CN115950517A - Configurable underwater acoustic signal feature extraction method and device - Google Patents

Configurable underwater acoustic signal feature extraction method and device

Info

Publication number
CN115950517A
Authority
CN
China
Prior art keywords
signal
frame
mel
underwater sound
power spectrum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310187600.2A
Other languages
Chinese (zh)
Inventor
Jun Lin (林军)
Ke Shi (史可)
Zhongfeng Wang (王中风)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202310187600.2A priority Critical patent/CN115950517A/en
Publication of CN115950517A publication Critical patent/CN115950517A/en
Pending legal-status Critical Current

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/70 - Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

The application provides a configurable underwater acoustic signal feature extraction method and device. The method comprises the following steps: acquiring a configuration file and an underwater sound sampling signal; preprocessing the underwater sound sampling signal to obtain a first signal frame set; performing a Fast Fourier Transform (FFT) on the first signal frame set according to the configuration file to obtain underwater acoustic signal features; and outputting the features according to the configuration file. The configuration file indicates which combination of one or more of the LOFAR spectrum, the STFT power spectrum, the Mel power spectrum and the MFCC is to be extracted. The device comprises a control module, an FPGA module and a power supply module; in the FPGA module, the four feature extraction methods share a preprocessing unit and an FFT unit, so that feature extraction with multiple kinds of results is realized, the appropriate underwater acoustic signal features can be selected as needed, and the use of hardware resources is reduced. In addition, the use of multipliers during operation is reduced, which improves the operational efficiency.

Description

Configurable underwater acoustic signal feature extraction method and device
Technical Field
The application relates to the technical field of voiceprint feature extraction, and in particular to a configurable underwater acoustic signal feature extraction method and device.
Background
Underwater acoustic target recognition is one of the research hotspots in the field of underwater acoustics, and such research requires extracting features from the underwater acoustic signal. The main feature extraction approaches are time-domain waveform structure analysis, frequency-domain spectrum estimation and time-frequency-domain analysis. Frequency-domain spectrum estimation can extract characteristics of the signal such as frequency, power and envelope, and can analyze the characteristics of non-Gaussian signals using higher-order spectra. The approach is simple in principle and easy to implement, and can be applied directly to the collected raw underwater acoustic signal; however, the extracted features require a certain amount of empirical knowledge for signal preprocessing, and generalization is weak in a time-varying marine environment. The spectral features obtained by frequency-domain spectrum estimation have been continuously enriched and extended, from the original Fourier spectrum and power spectrum to the Low Frequency Analysis and Recording (LOFAR) spectrum, auditory spectra, the Mel-frequency cepstrum and the like, which fit the auditory perception model of the human ear ever more closely.
Feature extraction for underwater acoustic signals is mostly implemented in software, usually with the librosa library. librosa is a Python module for analyzing general audio signals and a powerful third-party library for speech signal processing in Python. It is called to load the audio file and read the sampling rate, and it also provides feature extraction tools such as the Short Time Fourier Transform (STFT) and Mel Frequency Cepstral Coefficients (MFCC). However, the source code behind the library calls is not visible to the user, and one programmed invocation extracts only one underwater acoustic signal feature, so the extraction result is single.
Disclosure of Invention
The application provides a configurable underwater acoustic signal feature extraction method and device, to address the problem that the extraction result is single when underwater acoustic signal features are extracted by a software method.
The application provides a configurable underwater acoustic signal feature extraction method in a first aspect, and the method comprises the following steps:
acquiring a configuration file and an underwater sound sampling signal;
performing preprocessing on the underwater sound sampling signal to obtain a first signal frame set; the preprocessing comprises pre-emphasis processing, framing processing and windowing processing;
performing Fast Fourier Transform (FFT) on the first signal frame set according to the configuration file to obtain underwater sound signal characteristics; the configuration file is used for indicating one or more underwater sound signal feature combinations in a low-frequency analysis record LOFAR spectrum, a short-time Fourier transform (STFT) power spectrum, a Mel power spectrum and Mel Frequency Cepstrum Coefficients (MFCC) to be extracted;
and outputting the underwater sound signal characteristics.
Optionally, the performing the preprocessing on the underwater acoustic sampling signal includes:
utilizing a pre-emphasis filter to strengthen a high-frequency part in the underwater sound sampling signal so as to perform pre-emphasis processing on the underwater sound sampling signal;
dividing the underwater sound sampling signal into a plurality of time frames for framing;
and multiplying each frame of the underwater sound sampling signal subjected to framing processing by a window function so as to perform windowing processing.
Optionally, the step of performing fast fourier transform FFT on the first signal frame set according to the configuration file to obtain the characteristics of the underwater acoustic signal includes:
normalizing each frame of the first signal frame set to obtain a second signal frame set;
performing centralization processing on each frame of the second signal frame set to obtain a third signal frame set;
performing a Fast Fourier Transform (FFT) on the third set of signal frames to obtain the low frequency analysis record LOFAR spectrum.
Optionally, the step of performing normalization processing on each frame of the first signal frame set includes:
acquiring a first amplitude extreme value of each frame signal in the first signal frame set, wherein the first amplitude extreme value comprises a maximum value and a minimum value;
averaging each frame signal of the first signal frame set according to the first amplitude extreme value to obtain the second signal frame set.
Optionally, the step of performing centering processing on each frame of the second signal frame set includes:
acquiring a second amplitude extreme value of each frame signal in the second signal frame set, wherein the second amplitude extreme value comprises a maximum value and a minimum value;
averaging each frame signal of the second signal frame set according to the second amplitude extreme value;
subtracting the average of each frame signal amplitude from each frame signal amplitude in the second set of signal frames to obtain the third set of signal frames.
Optionally, the step of performing fast fourier transform FFT on the first signal frame set according to the configuration file to obtain the characteristics of the underwater acoustic signal further includes:
performing a Fast Fourier Transform (FFT) on each frame of the first set of signal frames;
and performing power spectrum calculation on each frame of the transformed first signal frame set to obtain the short-time Fourier transform (STFT) power spectrum.
Optionally, the step of performing fast fourier transform FFT on the first signal frame set according to the configuration file to obtain the characteristics of the underwater acoustic signal further includes:
performing Mel filtering on the short-time Fourier transform (STFT) power spectrum of each frame through a Mel filter bank to obtain a filtered power spectrum;
multiplying the filtered power spectrum with the mel filter bank to obtain a plurality of energy value results;
and respectively taking logarithms of a plurality of energy value results to obtain the Mel power spectrum.
Optionally, the step of Mel-filtering the short-time fourier transform STFT power spectrum of each frame by a Mel filter bank includes:
acquiring the lowest frequency and the highest frequency of the underwater sound sampling signal after the power spectrum calculation and the number of Mel filters;
respectively calculating Mel frequencies corresponding to the highest frequency and the lowest frequency;
calculating the center frequency distance between two adjacent Mel filters according to the number of the Mel filters;
and respectively calculating frequency values corresponding to the central frequency intervals and subscripts of Fast Fourier Transform (FFT) midpoints corresponding to the frequency values, wherein the midpoints are midpoints of triangular bottom edges of the Mel filter.
Optionally, the step of performing fast fourier transform FFT on the first signal frame set according to the configuration file to obtain the characteristics of the underwater sound signal further includes:
performing Discrete Cosine Transform (DCT) processing on the obtained Mel power spectrum to obtain the Mel Frequency Cepstrum Coefficients (MFCC); the number of points of the discrete cosine transform DCT is the same as the number of Mel filters.
Another aspect of the present application provides a configurable underwater acoustic signal feature extraction apparatus, comprising: a control module, a field-programmable gate array (FPGA) module and a power supply module; the power supply module is electrically connected with the control module and the FPGA module respectively; the control module is in communication connection with the FPGA module; a storage unit is arranged in the control module; the FPGA module comprises a preprocessing unit, a normalization centralization unit, a Fast Fourier Transform (FFT) unit, a short-time Fourier transform (STFT) power spectrum calculation unit, a Mel filter bank unit, a logarithm calculation unit, a Mel power spectrum unit, a Discrete Cosine Transform (DCT) unit and a Mel Frequency Cepstrum Coefficient (MFCC) unit; and the control module is configured to perform the configurable underwater acoustic signal feature extraction method of the first aspect.
As can be seen from the foregoing technical solutions, in one aspect the present application provides a configurable underwater acoustic signal feature extraction method, comprising: acquiring a configuration file and an underwater sound sampling signal; preprocessing the underwater sound sampling signal to obtain a first signal frame set; performing a Fast Fourier Transform (FFT) on the first signal frame set according to the configuration file to obtain the underwater sound signal characteristics; and outputting the characteristics according to the configuration file. The configuration file indicates which combination of one or more of the low frequency analysis record LOFAR spectrum, the Short Time Fourier Transform (STFT) power spectrum, the Mel power spectrum and the Mel Frequency Cepstrum Coefficients (MFCC) is to be extracted. The device comprises a control module, an FPGA module and a power supply module; in the FPGA module, the four feature extraction methods share the preprocessing unit and the FFT unit, so that underwater acoustic signal feature extraction with multiple kinds of results is realized while the use of hardware resources is reduced. By reducing the use of multipliers during operation, the complexity of the computation is lowered and the operational efficiency is improved.
Drawings
To explain the technical solution of the present application more clearly, the drawings needed in the embodiments are briefly described below. Those skilled in the art can obviously obtain other drawings from these drawings without creative effort.
FIG. 1 is a flow diagram of a configurable underwater acoustic signal feature extraction method;
FIG. 2 is a flow chart of the preprocessing;
FIG. 3 is a schematic framing diagram of an underwater acoustic sampling signal;
FIG. 4 is a schematic illustration of windowing an underwater acoustic sample signal after framing;
FIG. 5 is a path diagram of a configurable underwater acoustic signal feature extraction method;
FIG. 6 is a schematic diagram of an N = 8-point FFT operation;
FIG. 7 is a schematic structural diagram of the configurable underwater acoustic signal feature extraction device.
Detailed Description
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following examples do not represent all embodiments consistent with the present application; they are merely examples of systems and methods consistent with certain aspects of the application, as detailed in the claims.
When a software method is adopted to extract the features of the underwater acoustic signal, the librosa library can be used for the extraction. The library is called to load the audio file and read the sampling rate, and it also provides feature extraction tools such as the STFT and the MFCC. However, the source code behind the library calls is not visible to the user, and one programmed invocation extracts only one underwater acoustic signal feature, so the extraction result is single.
To solve the above problem, some embodiments of the present application provide a configurable underwater acoustic signal feature extraction method. The method is based on a Field-Programmable Gate Array (FPGA): extraction of underwater acoustic signal features in multiple modes is realized on the FPGA, and the required features are obtained by selecting among the different modes.
Fig. 1 is a flowchart of a configurable underwater acoustic signal feature extraction method, as shown in fig. 1, the configurable underwater acoustic signal feature extraction method includes:
S100: acquiring a configuration file and an underwater sound sampling signal.
The original underwater acoustic signal is acquired by an underwater acoustic signal acquisition device as an analog signal, which must be discretized into a digital signal for further calculation and processing. The discretized signal is stored in the storage unit of the control module; the FPGA module reads the processed signal, namely the underwater sound sampling signal, from the storage unit, and obtains the configuration file from the control module. The configuration file performs the logic configuration of the FPGA module.
S200: preprocessing is performed on the underwater sound sampling signal to obtain a first signal frame set.
Wherein the preprocessing comprises pre-emphasis processing, framing processing and windowing processing.
Fig. 2 is a flow diagram of preprocessing, and as shown in fig. 2, in some embodiments, preprocessing the underwater sound sampling signal further comprises:
S210: utilizing a pre-emphasis filter to strengthen the high-frequency part of the underwater sound sampling signal so as to perform pre-emphasis processing;
the pre-emphasis processing is to apply a pre-emphasis filter to the underwater sound sampling signal to amplify high frequency in the signal, so as to strengthen the high frequency part in the underwater sound sampling signal. The high frequency part of the signal mainly appears at the rising edge and the falling edge of the signal, and the pre-emphasis processing is to enhance the amplitude at the rising edge and the falling edge of the signal. The formula for implementing pre-emphasis with a difference equation is expressed as follows:
pre_emphasized(t) = signal(t) - α · signal(t - 1)

wherein pre_emphasized(t) is the pre-emphasized signal at time t; signal(t) is the underwater sound sampling signal at time t; signal(t - 1) is the underwater sound sampling signal at time t - 1; and α is the pre-emphasis filter coefficient. Taking the value of α as 0.97, which is approximately 31/32, the above formula becomes:

pre_emphasized(t) = signal(t) - signal(t - 1) + signal(t - 1) / 32
In the above formula, the division of signal(t - 1) by 32 can be realized by shifting right by 5 bits; the division is thus simplified into a shift operation, which reduces the computational complexity.
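As an illustration, the following Python sketch models this shift-based pre-emphasis on integer samples. It is a software model of the hardware trick described above, not the patent's FPGA implementation; the zero value assumed before the first sample is an assumption.

```python
import numpy as np

def pre_emphasize_fixed_point(signal: np.ndarray) -> np.ndarray:
    """Shift-based pre-emphasis: y(t) = x(t) - x(t-1) + x(t-1)/32.

    Approximates y(t) = x(t) - 0.97*x(t-1), since 1 - 1/32 = 0.96875;
    the division by 32 is a 5-bit right shift on integer samples.
    """
    x = signal.astype(np.int32)
    prev = np.concatenate(([0], x[:-1]))   # x(t-1), with 0 before the first sample
    return x - prev + (prev >> 5)          # >> 5 realizes the division by 32
```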
S220: dividing the underwater sound sampling signal into a plurality of time frames for framing processing;
Framing divides the continuous underwater sound sampling signal into a number of short time frames so that subsequent processing can proceed in units of one frame. The following parameters are set for framing: the frame length, that is, the number of sampling points contained in one frame; the signal length; and the frame set (frames), a two-dimensional array in which each row is one frame and the number of rows is the number of frames. As shown in fig. 3, a schematic diagram of framing an underwater sound sampling signal, signal[n] with n = 0 to 55807 is the sampled signal over its full length; framing yields the frame set frame[m][n], in which each row is one frame, for example samples 0 to 255 form one frame and samples 128 to 383 the next (adjacent frames overlap by half a frame length), and the row count m is the number of frames.
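A minimal framing sketch in Python follows; the frame length of 256 and hop of 128 are read off the figure's example (frames 0-255 and 128-383) and are illustrative defaults, not values mandated by the method.

```python
import numpy as np

def frame_signal(signal: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop: i * hop + frame_len]
                     for i in range(n_frames)])   # shape: (n_frames, frame_len)
```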
S230: multiplying each frame of the underwater sound sampling signal subjected to framing processing by a window function so as to perform windowing processing.
As shown in fig. 4, which is a schematic diagram of windowing the framed underwater sound sampling signal, after the signal is divided into frames, each frame is multiplied by a window function to improve the continuity between the left and right ends of the frame. In some embodiments a Hamming window may be used, which effectively suppresses the leakage phenomenon and has a smooth low-pass characteristic. The Hamming window function is as follows:
w(n) = 0.54 - 0.46 · cos(2πn / (N - 1))

wherein 0 ≤ n ≤ N - 1, N is the window length, and w(n) is the window value. In this embodiment, the windowing is implemented by a table lookup method: each value of the window function is stored in the storage unit and read out in sequence when used. Windowing the underwater sound sampling signal yields the first signal frame set.
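The table-lookup windowing can be modeled as follows; computing the Hamming values once and reusing them mirrors storing the window in the storage unit, and FRAME_LEN = 256 is carried over from the framing example above.

```python
import numpy as np

FRAME_LEN = 256
# Precomputed Hamming window: w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)).
HAMMING_TABLE = 0.54 - 0.46 * np.cos(
    2 * np.pi * np.arange(FRAME_LEN) / (FRAME_LEN - 1))

def window_frames(frames: np.ndarray) -> np.ndarray:
    """Multiply every frame (row) by the stored window values."""
    return frames * HAMMING_TABLE   # broadcasting applies the table per row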
Preprocessing the underwater sound sampling signal removes the influence of aliasing, higher-harmonic distortion, high-frequency artifacts and other factors introduced by the signal itself and by the equipment that acquires it. A more uniform and smoother sampled signal is obtained, which improves the quality of subsequent processing.
S300: performing a Fast Fourier Transform (FFT) on the first signal frame set according to the configuration file to obtain the underwater sound signal characteristics.
Wherein the configuration file is used for indicating one or more feature combinations in a low frequency analysis record LOFAR spectrum, a Short Time Fourier Transform (STFT) power spectrum, a Mel power spectrum and Mel Frequency Cepstrum Coefficients (MFCC) to be extracted.
After the first signal frame set is obtained, a Fast Fourier Transform (FFT) is performed on it according to the configuration file, and different underwater acoustic signal features are obtained through the different paths configured in the FPGA module. The obtainable features include the LOFAR spectrum, the STFT power spectrum, the Mel power spectrum, and the MFCC.
Fig. 5 is a path diagram of a configurable underwater acoustic signal feature extraction method, as shown in fig. 5, in some embodiments, the step of obtaining a low frequency analysis record LOFAR spectrum is:
S311: normalizing each frame of the first signal frame set to obtain a second signal frame set;
From the first signal frame set obtained after windowing, acquire the first amplitude extreme value of each frame signal, namely the maximum and minimum of each frame's amplitude. Average each frame signal of the first signal frame set according to the maximum and minimum of the first amplitude extreme value to obtain the second signal frame set. The formula of the normalization processing is as follows:
Normalization_frame = (frame - Mean_frame) / (Max_frame - Min_frame)

wherein Normalization_frame is the normalized frame, frame is the initial frame, Mean_frame is the mean of the frame, Max_frame is the maximum of the frame, and Min_frame is the minimum of the frame.
S312: performing centralization processing on each frame of the second signal frame set to obtain a third signal frame set;
Acquire the second amplitude extreme value of each frame signal in the second signal frame set obtained after the normalization processing, namely the maximum and minimum of each frame's amplitude. Average each frame signal of the second signal frame set according to the maximum and minimum of the second amplitude extreme value, and subtract each frame's mean amplitude from the amplitudes of that frame to obtain the third signal frame set. The formula of the centralization processing is as follows:
Centralization_frame = Normalization_frame - Mean_Normalization_frame

wherein Centralization_frame is the centralized frame, Normalization_frame is the normalized frame, and Mean_Normalization_frame is the mean of the normalized frame.
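Both steps can be sketched in a few lines; the per-frame mean and max/min formulas follow the reconstructions above, which are an interpretation of the patent text rather than its literal equations.

```python
import numpy as np

def normalize_frames(frames: np.ndarray) -> np.ndarray:
    """Per-frame mean normalization: (frame - mean) / (max - min)."""
    mean = frames.mean(axis=1, keepdims=True)
    span = frames.max(axis=1, keepdims=True) - frames.min(axis=1, keepdims=True)
    return (frames - mean) / span        # assumes no constant (zero-span) frame

def centralize_frames(norm_frames: np.ndarray) -> np.ndarray:
    """Per-frame centering: subtract each frame's mean amplitude."""
    return norm_frames - norm_frames.mean(axis=1, keepdims=True)
```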
S313: performing an FFT on the third set of signal frames to obtain the low frequency analysis record LOFAR spectrum.
In this embodiment, the FFT employs a radix-2 decimation-in-time algorithm, computed in fixed-point arithmetic. In the radix-2 FFT algorithm, the number of input points N must be a power of 2. During the calculation, the twiddle factors used are stored in the storage unit, and the real and imaginary parts are computed separately. Since the length of each frame in the first signal frame set is fixed, the number of FFT points coincides with the frame length. The number of FFT points is configured according to the window size; this embodiment supports FFTs of 256, 512 and 1024 points, selectable according to actual requirements. As shown in fig. 6, which is a schematic diagram of an N = 8-point FFT operation, the radix-2 decimation-in-time FFT has a symmetric structure, is efficient, and is easy to realize in hardware.
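For reference, a textbook radix-2 decimation-in-time FFT is sketched below in floating point; the patent's fixed-point, table-driven hardware variant shares this structure but stores its twiddle factors in the storage unit and computes real and imaginary parts separately.

```python
import numpy as np

def fft_radix2_dit(x: np.ndarray) -> np.ndarray:
    """Recursive radix-2 DIT FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return x.astype(complex)
    even = fft_radix2_dit(x[0::2])   # DFT of even-indexed samples
    odd = fft_radix2_dit(x[1::2])    # DFT of odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n)
    return np.concatenate([even + twiddle * odd,
                           even - twiddle * odd])
```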
As shown in fig. 5, in some embodiments, the step of obtaining the short-time fourier transform STFT power spectrum is:
S321: performing a Fast Fourier Transform (FFT) on each frame of the first signal frame set;
the first signal frame set in this step is a new frame set obtained after windowing in the preprocessing process.
S322: performing power spectrum calculation on each frame of the transformed first signal frame set to obtain the short-time Fourier transform (STFT) power spectrum.
The STFT power spectrum is calculated as follows:
P(k) = |X(k)|^2 / N

wherein P(k) is the power spectral density, N is the number of points of the underwater sound sampling signal, and X(k) is the Fourier transform of the sampled signal x(n). Because the number of points N is a power of 2, the division can be implemented by shifting during the operation, thereby reducing the complexity of the computation.
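A sketch of this per-frame periodogram follows; np.fft.rfft stands in for the hardware FFT unit, and the floating-point division by N models the right shift used in the fixed-point design.

```python
import numpy as np

def stft_power_spectrum(windowed_frames: np.ndarray, n_fft: int = 256) -> np.ndarray:
    """One-sided periodogram per frame: P(k) = |X(k)|^2 / N."""
    spectrum = np.fft.rfft(windowed_frames, n=n_fft, axis=1)
    return (np.abs(spectrum) ** 2) / n_fft   # shape: (n_frames, n_fft//2 + 1)
```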
As shown in fig. 5, in some embodiments, the step of obtaining the Mel-power spectrum is:
S331: performing Mel filtering on the short-time Fourier transform (STFT) power spectrum of each frame through a Mel filter bank to obtain a filtered power spectrum;
the STFT power spectrum of each frame obtained in step S322 is Mel-filtered through a set of Mel filter banks. The mel filter bank is a triangular filter bank of equal height, the starting point of each filter is at the midpoint of the last filter, and the corresponding frequency is linear on the mel scale, so the mel filter bank is called. The filter adopted by the Mel filter bank is a triangular filter. The frequency response of the triangular filter is expressed as follows:
H_m(k) = 0,                                   k < f(m - 1)
H_m(k) = (k - f(m - 1)) / (f(m) - f(m - 1)),  f(m - 1) ≤ k ≤ f(m)
H_m(k) = (f(m + 1) - k) / (f(m + 1) - f(m)),  f(m) ≤ k ≤ f(m + 1)
H_m(k) = 0,                                   k > f(m + 1)

wherein H_m(k) is the frequency response of the m-th filter at frequency bin k, m is the index of the filter, f(m) is the center frequency, f(m - 1) is the start frequency of the center frequency, and f(m + 1) is the end frequency of the center frequency.
The purpose of the triangular filter bank is to filter the STFT power spectrum of the underwater sound sampling signal. The coverage of each triangular filter approximates one critical band of the human ear, and the dense-at-low-frequencies, sparse-at-high-frequencies layout models the masking effect of human hearing well. In addition, the triangular filtering smooths the spectrum, eliminates the effect of harmonics, and highlights the formants of the original audio, while also reducing the data volume and the amount of computation.
The implementation process of carrying out Mel filtering on the STFT power spectrum of each frame through a Mel filter bank comprises the following steps:
First, determine the lowest frequency (fmin) and the highest frequency (fmax) of the underwater sound sampling signal after the power spectrum calculation, as well as the number (M) of Mel filters. In some embodiments, the number of Mel filters may be set to 8, i.e., M = 8.
Respectively calculate the Mel frequencies corresponding to the highest and lowest frequencies according to the formula for converting frequency to the Mel scale, which is as follows:
mel(f) = 2595 · log10(1 + f / 700)

wherein mel(f) is the converted Mel frequency and f is the actual frequency in Hz. Evidently, as f becomes large, the change of mel(f) tends to flatten out.
The center-frequency spacing between two adjacent Mel filters is calculated according to the number of filters (M); on the Mel scale, the center frequencies of adjacent Mel filters are equidistant, and the spacing is calculated as follows:
Δmel = (mel(fmax) - mel(fmin)) / (M + 1)

wherein Δmel is the center-frequency spacing of two adjacent Mel filters, mel(fmax) is the Mel frequency corresponding to the highest frequency of the filter bank, and mel(fmin) is the Mel frequency corresponding to the lowest frequency.
After the center-frequency spacing on the Mel scale is obtained, the frequency values f corresponding to the several center frequencies are calculated respectively, according to the formula for converting a Mel frequency back to frequency:
f = 700 · (10^(mel / 2595) - 1)
Respectively calculate the subscripts of the Fast Fourier Transform (FFT) bins corresponding to the several frequency values, where the midpoints are the midpoints of the triangular bottom edges of the Mel filters; the subscript of each midpoint is calculated as follows:
f(m) = floor((N_FFT + 1) · h(m) / fs)

wherein f(m) is the subscript of the midpoint, h(m) is the center frequency of the m-th Mel filter, N_FFT is the FFT region length, and fs is the sampling frequency.
Through the above steps, Mel filtering can be carried out on the STFT power spectrum to obtain the filtered power spectrum.
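The construction just described can be sketched as follows; fmin, fmax, fs and the FFT size are placeholder values chosen for the example, since the patent leaves them configurable, and distinct bin indices for adjacent triangle vertices are assumed.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(fmin=0.0, fmax=4000.0, n_filters=8, n_fft=256, fs=8000.0):
    """Equal-height triangular Mel filter bank, shape (n_filters, n_fft//2 + 1)."""
    # M + 2 points equally spaced on the Mel scale (two edges plus M centers).
    mel_points = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filters + 2)
    # Bin subscript of each triangle vertex: floor((N_FFT + 1) * f / fs).
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)

    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            bank[m - 1, k] = (k - left) / (center - left)     # rising edge
        for k in range(center, right):
            bank[m - 1, k] = (right - k) / (right - center)   # falling edge
    return bank
```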
S332: multiplying the filtered power spectrum with the Mel filter bank to obtain a plurality of energy value results;
It will be appreciated that each energy value result is the energy of one Mel filter applied to one frame of the power spectrum.
S333: respectively taking logarithms of the plurality of energy value results to obtain the Mel power spectrum.
The logarithm is taken by a table lookup method: the precomputed logarithm values are stored in the storage unit and read out as needed. In some embodiments, the logarithm step may instead use the Coordinate Rotation Digital Computer (CORDIC) algorithm, which replaces multiplication with elementary addition and shift operations, so that no multiplier is needed and the operation is simplified.
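One possible table-lookup logarithm is sketched below; the table resolution and the mantissa/exponent decomposition are assumptions, since the patent does not specify the stored table format.

```python
import numpy as np

TABLE_BITS = 10
# ln(m) for m in [1, 2), precomputed once and stored.
LOG_TABLE = np.log(np.linspace(1.0, 2.0, 2 ** TABLE_BITS, endpoint=False))

def log_lookup(x: float) -> float:
    """Approximate ln(x), x > 0, as ln(m) + e*ln(2) where x = m * 2**e."""
    m, e = np.frexp(x)            # x = m * 2**e with m in [0.5, 1)
    m, e = m * 2.0, e - 1         # renormalize so m lies in [1, 2)
    idx = int((m - 1.0) * (2 ** TABLE_BITS))
    return LOG_TABLE[idx] + e * np.log(2.0)
```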
As shown in fig. 5, in some embodiments, the step of obtaining mel-frequency cepstral coefficients MFCC is:
the Mel power spectrum obtained in step S333 is subjected to Discrete Cosine Transform (DCT) processing to obtain Mel-frequency cepstral coefficients MFCC. The number of points of the DCT is the same as the number M of the Mel filters. In this embodiment, the DCT algorithm adopted by the DCT transform has the following formula:
Steps 1 to 7: the seven step equations of the fast DCT algorithm, together with their constants, appear only as figures in the original filing. It should be noted that the DCT coefficients generated by this algorithm are scaled values rather than the true DCT coefficients.
The number M of Mel filters can be set during the Mel filtering; if M is set to 8, the DCT is an 8-point DCT. The fixed parameters required are all precomputed and stored in the storage unit, and are called out for calculation as needed. In some embodiments, the DCT may also be replaced by another linear orthogonal transform or another type of DCT.
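For comparison, the textbook DCT-II over the per-frame log Mel energies is sketched below. This is not the patent's seven-step fast algorithm, only the transform it computes up to scaling; the number of DCT points equals the number of Mel filters M.

```python
import numpy as np

def mfcc_from_mel_power(log_mel: np.ndarray, n_filters: int = 8) -> np.ndarray:
    """MFCC as an unnormalized DCT-II of each frame's log Mel energies."""
    n = np.arange(n_filters)
    # DCT-II basis: C[k, j] = cos(pi * k * (2j + 1) / (2 * M)).
    basis = np.cos(np.pi * np.outer(n, 2 * n + 1) / (2 * n_filters))
    return log_mel @ basis.T      # shape: (n_frames, M)
```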
S400: outputting the underwater sound signal characteristics.
In step S300, 4 different underwater acoustic signal features can be computed, and the FPGA module can select which feature needs to be output according to the actual requirement and pass it to the control module for extraction.
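Putting the pieces together, the configurable selection can be modeled in software as follows; the config dictionary mimics the configuration file, the function names reuse the sketches above, and the whole is an illustrative model of the FPGA data path rather than the patented implementation.

```python
import numpy as np

def extract_features(samples: np.ndarray, config: dict) -> dict:
    """One shared preprocessing/FFT path, up to four configurable outputs."""
    frames = window_frames(frame_signal(pre_emphasize_fixed_point(samples)))
    out = {}
    if config.get("lofar"):
        centered = centralize_frames(normalize_frames(frames))
        out["lofar"] = np.abs(np.fft.rfft(centered, axis=1))
    power = stft_power_spectrum(frames)
    if config.get("stft_power"):
        out["stft_power"] = power
    if config.get("mel_power") or config.get("mfcc"):
        mel_energy = power @ mel_filter_bank().T
        log_mel = np.log(mel_energy + 1e-10)   # epsilon guards log(0)
        if config.get("mel_power"):
            out["mel_power"] = log_mel
        if config.get("mfcc"):
            out["mfcc"] = mfcc_from_mel_power(log_mel)
    return out

# Example: request only the Mel power spectrum and the MFCC.
# features = extract_features(samples, {"mel_power": True, "mfcc": True})
```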
Some embodiments of the present application further provide a configurable underwater acoustic signal feature extraction device, and fig. 7 is a schematic structural diagram of the configurable underwater acoustic signal feature extraction device, as shown in fig. 7, the device includes: the device comprises a control module, an FPGA module and a power supply module. The power supply module is electrically connected with the control module and the FPGA module respectively; the control module is connected with the FPGA module.
The control module is configured to perform the configurable underwater acoustic signal feature extraction method provided in the above embodiments. A storage unit is arranged in the control module to store the underwater sound sampling signal obtained after discretization as well as the calculation results, and the control module controls the storage unit and the FPGA module. In some embodiments, the control module may be a combination of a host and a display, and the storage unit may be a Random Access Memory (RAM). The power supply module supplies power to the control module and the FPGA module; in this embodiment it may be a switching power supply, converting the input supply voltage into the working voltage output to the control module and the FPGA module.
The FPGA module executes digital logic functions, that is, the hardware realization of a digital circuit: it decides which logic to execute as the data stream passes through and completes complex sequential and combinational logic circuit functions. The basic components of the FPGA module are programmable input/output units and basic programmable logic units; the latter form the body of the programmable logic and can realize different logic functions when the FPGA module is programmed, which makes the FPGA well suited to the field of underwater acoustic signal feature extraction.
The FPGA module carries the complex operations such as the Fourier transform. It comprises a preprocessing unit, a normalization centralization unit, a fast Fourier transform FFT unit, a short-time Fourier transform STFT power spectrum calculation unit, a Mel filter bank unit, a logarithm calculation unit, a Mel power spectrum unit, a discrete cosine transform DCT unit and a Mel frequency cepstrum coefficient MFCC unit. With the output of the control module electrically connected to the input of the FPGA module, the underwater sound sampling signal can be transmitted to the FPGA module, where the above units perform the operations and the logic configuration on it. In this embodiment, the FPGA module can configure the underwater acoustic signal features in four modes; selecting a different mode selects the appropriate feature, which is finally output to the control module for extraction.
Through the above embodiments, the application provides a configurable underwater acoustic signal feature extraction method and device. The method comprises: acquiring a configuration file and an underwater sound sampling signal; preprocessing the underwater sound sampling signal to obtain a first signal frame set; performing a Fast Fourier Transform (FFT) on the first signal frame set according to the configuration file to obtain the underwater acoustic signal features; and outputting the features according to the configuration file. The configuration file indicates which combination of one or more of the LOFAR spectrum, the STFT power spectrum, the Mel power spectrum and the MFCC is to be extracted. The device comprises a control module, an FPGA module and a power supply module; in the FPGA module, the four feature extraction paths share the preprocessing unit and the FFT unit, so that feature extraction with multiple kinds of results is realized, the appropriate features can be selected as needed, and the use of hardware resources is reduced. In addition, the use of multipliers during operation is reduced, which lowers the computational complexity and improves the operational efficiency.
The embodiments provided in the present application are only a few examples of the general concept of the present application and do not limit its scope. For a person skilled in the art, any other embodiment extended from the scheme of the present application without inventive effort falls within its scope of protection.

Claims (10)

1. A configurable underwater acoustic signal feature extraction method is characterized by comprising the following steps:
acquiring a configuration file and an underwater sound sampling signal;
performing preprocessing on the underwater sound sampling signal to obtain a first signal frame set; the preprocessing comprises pre-emphasis processing, framing processing and windowing processing;
performing Fast Fourier Transform (FFT) on the first signal frame set according to the configuration file to obtain underwater sound signal characteristics; the configuration file is used for indicating one or more underwater sound signal feature combinations in a low-frequency analysis record LOFAR spectrum, a short-time Fourier transform (STFT) power spectrum, a Mel power spectrum and Mel Frequency Cepstrum Coefficients (MFCC) to be extracted;
and outputting the underwater sound signal characteristics.
2. The method of claim 1, wherein the pre-processing the underwater acoustic sampling signal comprises:
utilizing a pre-emphasis filter to strengthen a high-frequency part in the underwater sound sampling signal so as to perform pre-emphasis processing on the underwater sound sampling signal;
dividing the underwater sound sampling signal into a plurality of time frames for framing processing;
and multiplying each frame of the underwater sound sampling signal subjected to framing processing by a window function so as to perform windowing processing.
3. The method of claim 1, wherein the step of performing Fast Fourier Transform (FFT) on the first signal frame set according to the configuration file to obtain the features of the underwater acoustic signal comprises:
normalizing each frame of the first signal frame set to obtain a second signal frame set;
performing centralized processing on each frame of the second signal frame set to obtain a third signal frame set;
performing a Fast Fourier Transform (FFT) on the third set of signal frames to obtain the low frequency analysis record LOFAR spectrum.
4. The method according to claim 3, wherein the step of normalizing each frame of the first signal frame set comprises:
acquiring a first amplitude extreme value of each frame signal in the first signal frame set, wherein the first amplitude extreme value comprises a maximum value and a minimum value;
averaging each frame signal of the first signal frame set according to the first amplitude extreme value to obtain the second signal frame set.
5. The method of claim 4, wherein the step of centering each frame of the second signal frame set comprises:
acquiring a second amplitude extreme value of each frame signal in the second signal frame set, wherein the second amplitude extreme value comprises a maximum value and a minimum value;
averaging each frame signal of the second signal frame set according to the second amplitude extreme value;
subtracting the average of each frame signal amplitude from each frame signal amplitude in the second set of signal frames to obtain the third set of signal frames.
6. The method of claim 1, wherein the step of performing Fast Fourier Transform (FFT) on the first signal frame set according to the configuration file to obtain the features of the underwater acoustic signal further comprises:
performing a Fast Fourier Transform (FFT) on each frame of the first set of signal frames;
and performing power spectrum calculation on each frame of the transformed first signal frame set to obtain the short-time Fourier transform (STFT) power spectrum.
7. The method as claimed in claim 6, wherein the step of performing Fast Fourier Transform (FFT) on the first signal frame set according to the configuration file to obtain the features of the underwater acoustic signal further comprises:
performing Mel filtering on the short-time Fourier transform (STFT) power spectrum of each frame through a Mel filter bank to obtain a filtered power spectrum;
multiplying the filtered power spectrum with the mel filter bank to obtain a plurality of energy value results;
and respectively taking logarithms of a plurality of energy value results to obtain the Mel power spectrum.
8. The method of claim 7, wherein the step of Mel filtering the Short Time Fourier Transform (STFT) power spectrum of each frame through a Mel filter bank comprises:
acquiring the lowest frequency and the highest frequency of the underwater sound sampling signal after the power spectrum calculation and the number of Mel filters;
respectively calculating the Mel frequencies corresponding to the highest frequency and the lowest frequency;
calculating the center frequency distance between two adjacent Mel filters according to the number of the Mel filters;
and respectively calculating frequency values corresponding to the central frequency intervals and subscripts of Fast Fourier Transform (FFT) midpoints corresponding to the frequency values, wherein the midpoints are midpoints of triangular bottom edges of the Mel filter.
9. The method of claim 7, wherein the step of performing Fast Fourier Transform (FFT) on the first signal frame set according to the configuration file to obtain the features of the underwater acoustic signal further comprises:
performing Discrete Cosine Transform (DCT) processing on the obtained Mel power spectrum to obtain a Mel Frequency Cepstrum Coefficient (MFCC); the number of points of the Discrete Cosine Transform (DCT) is the same as the number of Mel filters.
10. A configurable underwater acoustic signal feature extraction apparatus, the apparatus comprising: a control module, a field-programmable gate array FPGA module and a power supply module; the power supply module is electrically connected with the control module and the FPGA module respectively; the control module is in communication connection with the FPGA module; a storage unit is arranged in the control module; the FPGA module comprises a preprocessing unit, a normalization centralization unit, a Fast Fourier Transform (FFT) unit, a short-time Fourier transform (STFT) power spectrum calculation unit, a Mel filter bank unit, a logarithm calculation unit, a Mel power spectrum unit, a Discrete Cosine Transform (DCT) unit and a Mel Frequency Cepstrum Coefficient (MFCC) unit; the control module is configured to perform the configurable underwater acoustic signal feature extraction method as claimed in any one of claims 1 to 9.
CN202310187600.2A 2023-03-02 2023-03-02 Configurable underwater acoustic signal feature extraction method and device Pending CN115950517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310187600.2A CN115950517A (en) 2023-03-02 2023-03-02 Configurable underwater acoustic signal feature extraction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310187600.2A CN115950517A (en) 2023-03-02 2023-03-02 Configurable underwater acoustic signal feature extraction method and device

Publications (1)

Publication Number Publication Date
CN115950517A true CN115950517A (en) 2023-04-11

Family

ID=87287935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310187600.2A Pending CN115950517A (en) 2023-03-02 2023-03-02 Configurable underwater acoustic signal feature extraction method and device

Country Status (1)

Country Link
CN (1) CN115950517A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150340027A1 (en) * 2013-03-29 2015-11-26 Boe Technology Group Co., Ltd. Voice recognition system
CN112364779A (en) * 2020-11-12 2021-02-12 中国电子科技集团公司第五十四研究所 Underwater sound target identification method based on signal processing and deep-shallow network multi-model fusion
CN113049080A (en) * 2021-03-08 2021-06-29 中国电子科技集团公司第三十六研究所 GDWC auditory feature extraction method for ship radiation noise
CN114842873A (en) * 2022-04-15 2022-08-02 南京大学 Ship classification method and system based on underwater acoustic radiation audio data

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
姜岩松 (Jiang Yansong): "Research on Underwater Acoustic Signal Recognition and Separation Based on Generative Adversarial Networks", China Master's Theses Full-text Database, Basic Sciences *
岳皓 (Yue Hao): "Research on Feature Extraction and Classification Recognition of Underwater Acoustic Signals Based on Deep Learning", China Master's Theses Full-text Database, Basic Sciences *
曾向阳 (Zeng Xiangyang): "Fundamentals of Acoustic Signal Processing", Northwestern Polytechnical University Press *
王鹏 (Wang Peng): "Research on Underwater Target Recognition Based on Deep Neural Networks", China Master's Theses Full-text Database, Engineering Science and Technology II *

Similar Documents

Publication Publication Date Title
CN110491407B (en) Voice noise reduction method and device, electronic equipment and storage medium
CN103999076B (en) System and method of processing a sound signal including transforming the sound signal into a frequency-chirp domain
CN103718242B (en) Adopt the system and method for the treatment of voice signal of spectrum motion transform
US20210193149A1 (en) Method, apparatus and device for voiceprint recognition, and medium
CN101023469B (en) Digital filtering method, digital filtering equipment
EP0632899B1 (en) Method and apparatus for time varying spectrum analysis
JP6334895B2 (en) Signal processing apparatus, control method therefor, and program
CN111341303A (en) Acoustic model training method and device and voice recognition method and device
JP2015118361A (en) Information processing apparatus, information processing method, and program
CN101404155A (en) Signal processing apparatus, signal processing method, and program therefor
CN113077806A (en) Audio processing method and device, model training method and device, medium and equipment
CN112185410A (en) Audio processing method and device
CN113782044A (en) Voice enhancement method and device
CN112151055B (en) Audio processing method and device
CN113744715A (en) Vocoder speech synthesis method, device, computer equipment and storage medium
CN113593604A (en) Method, device and storage medium for detecting audio quality
CN115950517A (en) Configurable underwater acoustic signal feature extraction method and device
Ernawan et al. Efficient discrete tchebichef on spectrum analysis of speech recognition
Kumar et al. Design and implementation of pipelined SDF FFT architecture for sustainable industrial noise suppression in Digital Hearing Aids
CN117558290A (en) Configurable multi-mode underwater sound signal feature extraction method
Prasanna Kumar et al. An unsupervised approach for co-channel speech separation using Hilbert–Huang transform and Fuzzy C-Means clustering
US11785409B1 (en) Multi-stage solver for acoustic wave decomposition
Zhang Designs, experiments, and applications of multichannel structures for hearing aids
US20240257822A1 (en) Spatio-temporal beamformer
Singh pyAudioProcessing: Audio Processing, Feature Extraction, and Machine Learning Modeling.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230411