CN112504970B - Gas photoacoustic spectrum enhanced voiceprint recognition method and device based on deep learning - Google Patents

Gas photoacoustic spectrum enhanced voiceprint recognition method and device based on deep learning

Info

Publication number
CN112504970B
Authority
CN
China
Prior art keywords
photoacoustic
spectrum
data set
noise
enhanced
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110168036.0A
Other languages
Chinese (zh)
Other versions
CN112504970A (en)
Inventor
杨军
易国华
李俊逸
代犇
姜勇
Current Assignee
Hubei Infotech Co ltd
Original Assignee
Hubei Infotech Co ltd
Priority date
Filing date
Publication date
Application filed by Hubei Infotech Co ltd filed Critical Hubei Infotech Co ltd
Priority to CN202110168036.0A priority Critical patent/CN112504970B/en
Publication of CN112504970A publication Critical patent/CN112504970A/en
Application granted granted Critical
Publication of CN112504970B publication Critical patent/CN112504970B/en

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/1702 Systems in which incident light is modified in accordance with the properties of the material investigated with opto-acoustic detection, e.g. for gases or analysing solids
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N29/00 Investigating or analysing materials by the use of ultrasonic, sonic or infrasonic waves; Visualisation of the interior of objects by transmitting ultrasonic or sonic waves through the object
    • G01N29/44 Processing the detected response signal, e.g. electronic circuits specially adapted therefor
    • G01N29/4454 Signal recognition, e.g. specific values or portions, signal events, signatures
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N29/00 Investigating or analysing materials by the use of ultrasonic, sonic or infrasonic waves; Visualisation of the interior of objects by transmitting ultrasonic or sonic waves through the object
    • G01N29/44 Processing the detected response signal, e.g. electronic circuits specially adapted therefor
    • G01N29/4481 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/1702 Systems in which incident light is modified in accordance with the properties of the material investigated with opto-acoustic detection, e.g. for gases or analysing solids
    • G01N2021/1704 Systems in which incident light is modified in accordance with the properties of the material investigated with opto-acoustic detection, e.g. for gases or analysing solids in gases

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Analytical Chemistry (AREA)
  • Pathology (AREA)
  • Immunology (AREA)
  • Biochemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Investigating Or Analyzing Materials By The Use Of Ultrasonic Waves (AREA)

Abstract

The invention relates to a deep-learning-based method and device for enhanced voiceprint recognition of gas photoacoustic spectra. The method comprises the following steps: acquiring a plurality of photoacoustic spectra of a target characteristic gas at different components and concentrations to obtain a photoacoustic spectrum data set; acquiring noise of the photoacoustic cell in different sound environments to obtain a noise data set; framing each photoacoustic spectrum in the photoacoustic spectrum data set and extracting its Mel-frequency cepstral coefficients, spectral peak, spectral skewness and spectral kurtosis; and training a recurrent neural network on these features, the noise in the enhanced noise data set and the concentration of the target characteristic gas, then using the trained network to identify the concentration of the target characteristic gas corresponding to a photoacoustic spectrum. By augmenting the data with the environmental noise of the photoacoustic cell, the method improves the generalization and robustness of the recognition model and is applicable to voiceprint recognition of gas photoacoustic spectra in different sound environments.

Description

Gas photoacoustic spectrum enhanced voiceprint recognition method and device based on deep learning
Technical Field
The invention belongs to the field of photoacoustic spectrum data processing and deep learning, and particularly relates to a deep-learning-based method and device for enhanced voiceprint recognition of gas photoacoustic spectra.
Background
Gas photoacoustic spectroscopy is a relatively new detection technology. It quantifies gas concentration by detecting the absorption of laser photon energy by gas molecules and thus belongs to the family of absorption-based gas analysis methods. Compared with detection methods that measure optical radiation energy directly, it adds a step that converts the absorbed heat into an acoustic signal. Applied, for example, to on-line monitoring of the gas content of transformer oil, photoacoustic detection offers higher sensitivity and a lower sample-gas demand, which greatly shortens the oil-gas separation time and the overall measurement period. The photoacoustic cell is the carrier in which the gas produces the photoacoustic effect, and its performance is an important factor in the detection sensitivity, signal-to-noise ratio and detection limit of a photoacoustic spectroscopy detection system.
Photoacoustic gas detection measures the acoustic signal generated when a sample absorbs light energy; the amount of light energy absorbed can be obtained from the intensity of that acoustic signal, which avoids the signal interference caused by reflection and scattering of light inside the sample. Operating environments of transformers differ, however: the sound environment near a city centre is complex, while a suburban one is comparatively simple. To maintain the measurement accuracy of photoacoustic spectroscopy, different photoacoustic cells are therefore used to improve the signal-to-noise ratio of the photoacoustic signal in different sound environments, which increases the cost and complexity of field measurement.
Disclosure of Invention
In order to improve the accuracy, stability and flexibility of recognition of the audio signal of a gas photoacoustic spectrum, the first aspect of the invention provides a deep-learning-based gas photoacoustic spectrum enhanced voiceprint recognition method comprising the following steps: acquiring a plurality of photoacoustic spectra of a target characteristic gas at different components and concentrations to obtain a photoacoustic spectrum data set; acquiring noise of the photoacoustic cell in different sound environments to obtain a noise data set; mixing the photoacoustic spectrum data set with the noise data set to obtain an enhanced noise data set; framing each photoacoustic spectrum in the photoacoustic spectrum data set and extracting its Mel-frequency cepstral coefficients, spectral peak, spectral skewness and spectral kurtosis; mapping and fusing the Mel-frequency cepstral coefficients, spectral peak, spectral skewness and spectral kurtosis of each photoacoustic spectrum into a multi-dimensional vector to obtain a multi-dimensional feature vector of each photoacoustic spectrum; taking the multi-dimensional feature vectors, the noise in the enhanced noise data set and the concentrations of the target characteristic gas respectively as positive samples, negative samples and labels to construct a sample data set; training a recurrent neural network on the sample data set until the error falls below a threshold and stabilizes, yielding the trained recurrent neural network; and inputting a photoacoustic spectrum to be identified into the trained recurrent neural network to identify the concentration of the target characteristic gas corresponding to that spectrum.
In some embodiments of the invention, mixing the photoacoustic spectrum data set with the noise data set to obtain an enhanced noise data set comprises: performing time stretching, pitch shifting, random matching, cyclic shifting or mixing on the noise signals in the noise data set and the audio signals of the photoacoustic spectra to obtain the enhanced noise data set.
Further, a generative adversarial network and the noise data set are used to improve the realism of the enhanced noise data set. Preferably, the generative adversarial network comprises a generator network and a discriminator network: the generator network generates reconstructed noise samples from the noise samples in the noise data set, and the discriminator network evaluates the similarity between the reconstructed noise samples and the real noise samples.
In some embodiments of the invention, the recurrent neural network comprises a plurality of deep convolutional neural network elements.
The target characteristic gas is one of hydrogen, methane, ethane, ethylene, acetylene, carbon monoxide, carbon dioxide, oxygen or nitrogen.
A second aspect of the invention provides a deep-learning-based gas photoacoustic spectrum enhanced voiceprint recognition device comprising an acquisition module, an extraction module, a fusion module, a training module and a recognition module. The acquisition module acquires a plurality of photoacoustic spectra of the target characteristic gas at different components and concentrations to obtain a photoacoustic spectrum data set, acquires noise of the photoacoustic cell in different sound environments to obtain a noise data set, and mixes the photoacoustic spectrum data set with the noise data set to obtain an enhanced noise data set. The extraction module frames each photoacoustic spectrum in the photoacoustic spectrum data set and extracts its Mel-frequency cepstral coefficients, spectral peak, spectral skewness and spectral kurtosis. The fusion module maps and fuses these features of each photoacoustic spectrum into a multi-dimensional vector to obtain a multi-dimensional feature vector of each photoacoustic spectrum. The training module takes the multi-dimensional feature vectors, the noise in the enhanced noise data set and the concentrations of the target characteristic gas respectively as positive samples, negative samples and labels to construct a sample data set, and trains a recurrent neural network on the sample data set until the error falls below a threshold and stabilizes. The recognition module inputs a photoacoustic spectrum to be identified into the trained recurrent neural network and identifies the concentration of the target characteristic gas corresponding to that spectrum.
Further, the acquiring module comprises a first acquiring module, a second acquiring module and a mixing module, wherein the first acquiring module is used for acquiring a plurality of photoacoustic sound spectrums of the target characteristic gas under different components and concentrations to obtain a photoacoustic sound spectrum data set; the second acquisition module is used for acquiring the noise of the photoacoustic cell in different sound environments to obtain a noise data set; the mixing module is configured to mix the audio signal of the photoacoustic sound spectrum with the noise data set to obtain an enhanced noise data set.
In a third aspect of the present invention, there is provided an electronic device comprising: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for gas photoacoustic spectroscopy-enhanced voiceprint recognition based on deep learning provided by the first aspect of the present invention.
In a fourth aspect of the present invention, a computer readable medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the deep learning based gas photoacoustic spectroscopy enhanced voiceprint recognition method provided by the first aspect of the present invention.
The invention has the following beneficial effects:
1. Because noise of the photoacoustic cell in different sound environments is collected at the sample-data-set construction stage, the adaptability of the recognition model is improved; the model can flexibly be used to recognize photoacoustic spectra in different scenes, and the dependence on a specific photoacoustic cell structure is reduced without loss of accuracy.
2. The diversity and realism of the reconstructed noise samples are further improved through the generative adversarial network, which improves the robustness of the model and allows the gas represented by the photoacoustic spectrum, its concentration and other information to be identified more accurately.
3. To better learn the acoustic characteristics of the gas photoacoustic spectrum, the recurrent neural network includes deep convolutional neural network layers; this hierarchical model expresses different features more distinctly and achieves higher accuracy than a model built from a single recurrent layer.
Drawings
FIG. 1 is a schematic flow diagram of a deep learning based gas photoacoustic spectroscopy enhanced voiceprint recognition method in some embodiments of the present invention;
FIG. 2 is a schematic diagram of a recurrent neural network in some embodiments of the present invention;
FIG. 3 is a schematic diagram of a deep learning based gas photoacoustic spectroscopy enhanced voiceprint recognition apparatus in some embodiments of the present invention;
FIG. 4 is a basic block diagram of an electronic device in some embodiments of the invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, the first aspect of the present invention provides a deep-learning-based gas photoacoustic spectrum enhanced voiceprint recognition method comprising the following steps. S101, acquiring a plurality of photoacoustic spectra of the target characteristic gas at different components and concentrations to obtain a photoacoustic spectrum data set; acquiring noise of the photoacoustic cell in different sound environments to obtain a noise data set; and mixing the photoacoustic spectrum data set with the noise data set to obtain an enhanced noise data set. S102, framing each photoacoustic spectrum in the photoacoustic spectrum data set and extracting its Mel-frequency cepstral coefficients, spectral peak, spectral skewness and spectral kurtosis. S103, mapping and fusing the Mel-frequency cepstral coefficients, spectral peak, spectral skewness and spectral kurtosis of each photoacoustic spectrum into a multi-dimensional vector to obtain a multi-dimensional feature vector of each photoacoustic spectrum. S104, taking the multi-dimensional feature vectors, the noise in the enhanced noise data set and the concentrations of the target characteristic gas respectively as positive samples, negative samples and labels to construct a sample data set, and training a recurrent neural network (RNN) on the sample data set until the error falls below a threshold and stabilizes, yielding the trained recurrent neural network. S105, inputting a photoacoustic spectrum to be identified into the trained recurrent neural network and identifying the concentration of the target characteristic gas corresponding to that spectrum.
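As an illustration of the framing and feature-extraction steps (S102 and S103), the following Python sketch frames a signal, computes a spectral peak, spectral skewness and spectral kurtosis per frame, and fuses them into one vector by averaging. All names, the frame length, the hop size and the Hann window are illustrative assumptions rather than the patent's implementation, and the Mel-frequency cepstral coefficients are omitted here (in practice they would come from a signal-processing library and be concatenated into the same vector):

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames, zero-padding the tail."""
    n_frames = 1 + max(0, int(np.ceil((len(x) - frame_len) / hop)))
    out = np.zeros((n_frames, frame_len))
    for i in range(n_frames):
        chunk = x[i * hop: i * hop + frame_len]
        out[i, :len(chunk)] = chunk
    return out

def spectral_stats(frame):
    """Peak magnitude, spectral skewness and spectral kurtosis of one frame."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    bins = np.arange(len(mag))                 # frequency-bin indices, not Hz
    p = mag / (mag.sum() + 1e-12)              # treat the spectrum as a distribution
    mean = (bins * p).sum()
    std = np.sqrt(((bins - mean) ** 2 * p).sum()) + 1e-12
    skew = (((bins - mean) / std) ** 3 * p).sum()
    kurt = (((bins - mean) / std) ** 4 * p).sum()
    return mag.max(), skew, kurt

def feature_vector(x, frame_len=256, hop=128):
    """Fuse per-frame features into one multi-dimensional vector (mean over frames)."""
    feats = np.array([spectral_stats(f) for f in frame_signal(x, frame_len, hop)])
    return feats.mean(axis=0)

# Example: a synthetic 1 kHz tone sampled at 8 kHz as a stand-in signal.
t = np.arange(8000) / 8000.0
vec = feature_vector(np.sin(2 * np.pi * 1000 * t))
print(vec.shape)   # (3,)
```

With MFCCs included, the fused vector would simply have more dimensions; the averaging over frames is one simple fusion choice among several.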
In some embodiments of the invention, mixing the photoacoustic spectrum data set with the noise data set to obtain an enhanced noise data set comprises: performing time stretching (time-stretch), pitch shifting (pitch-shift), random matching, cyclic shifting and mixing on the noise signals in the noise data set and the audio signals of the photoacoustic spectra to obtain the enhanced noise data set. It will be appreciated that this data augmentation is analogous to image augmentation; for example, after a deformation such as stretching, the boundaries of the audio signal must be stretched correspondingly.
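A minimal sketch of some of the augmentation operations named above (cyclic shifting, time stretching and mixing at a target signal-to-noise ratio), assuming plain NumPy. The signals and SNR values are illustrative stand-ins, and the interpolation-based stretch is deliberately naive; a production system would typically use a phase vocoder for time stretching and pitch shifting:

```python
import numpy as np

rng = np.random.default_rng(0)

def circular_shift(x, k):
    """Cyclically shift the signal by k samples."""
    return np.roll(x, k)

def time_stretch(x, rate):
    """Naive time stretch by linear-interpolation resampling (rate > 1 shortens)."""
    n_out = int(round(len(x) / rate))
    src = np.linspace(0, len(x) - 1, n_out)
    return np.interp(src, np.arange(len(x)), x)

def mix_at_snr(clean, noise, snr_db):
    """Add environmental noise to the signal at a target SNR in dB."""
    noise = np.resize(noise, len(clean))        # loop/crop the noise to length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

spectrum = np.sin(2 * np.pi * 0.05 * np.arange(1000))   # stand-in photoacoustic signal
noise = rng.normal(size=400)                            # stand-in photoacoustic-cell noise

augmented = [
    mix_at_snr(spectrum, noise, snr_db=10),
    mix_at_snr(circular_shift(spectrum, 100), noise, snr_db=5),
    mix_at_snr(time_stretch(spectrum, 1.2), noise, snr_db=0),
]
```

Random matching would pair each spectrum with a randomly drawn noise clip and SNR, growing the data set combinatorially.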
Further, the enhanced noise data set is improved using a Generative Adversarial Network (GAN) together with the noise data set. Preferably, the generative adversarial network comprises a generator network and a discriminator network: the generator network generates reconstructed noise samples from the noise samples in the noise data set, and the discriminator network evaluates the similarity between the reconstructed noise samples and the real noise samples.
Specifically, the loss function of the generative adversarial network is:

$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

where V(D, G) denotes the loss function of the generative adversarial network, D the discriminator network and G the generator network; $\mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$ is the loss term associated with the generator and $\mathbb{E}_{x \sim p_{data}(x)}[\log D(x)]$ the loss term associated with the discriminator; $p_{data}(\cdot)$ denotes the sample distribution; x denotes a real noise sample and z a non-real noise sample (including the non-real noise samples obtained, in the embodiments above, by mixing audio signals of the photoacoustic spectrum data set with noise signals of the noise data set); G(z) denotes a reconstructed noise sample generated by the G network, D(x) the estimated authenticity (probability of being real) of a noise sample, and D(G(z)) the evaluated similarity (probability of being real) between the reconstructed noise sample and a real noise sample. The similarity may be determined from the image similarity of the sound spectra, the image similarity of the corresponding frequency spectra, cross entropy, KL divergence, JS divergence, or the goodness of fit of the waveforms.
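As a numeric sanity check of the loss function above, the sketch below estimates V(D, G) from toy discriminator outputs; it is not the patent's network, only an illustration of the formula. At the theoretical equilibrium, where the discriminator outputs 0.5 for every real and reconstructed sample, the value is -2 log 2:

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real noise samples.
    d_fake: discriminator outputs D(G(z)) on reconstructed samples.
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A discriminator that cannot tell real from reconstructed noise outputs 0.5
# everywhere, giving the well-known equilibrium value -2 log 2.
v_eq = gan_value([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
print(round(v_eq, 3))   # -1.386
```

In training, D is updated to increase this value while G is updated to decrease it, which is what drives the reconstructed noise toward the real noise distribution.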
Referring to FIG. 2, in some embodiments of the invention the recurrent neural network includes a plurality of deep convolutional neural network units. From bottom to top, the bottommost layer is the input layer, the middle layers (1 to L) are hidden layers, and the top layer is the output layer. This hierarchical model expresses different features more distinctly than a model built from a single recurrent layer, while retaining the recurrent network's advantage in processing sequential data.
Optionally, the recurrent neural network is an LSTM (Long Short-Term Memory) neural network or a GRU (Gated Recurrent Unit) neural network.
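For illustration, the forward pass of a single GRU cell over a sequence of fused feature vectors can be sketched in NumPy as follows; the dimensions and random weights are arbitrary assumptions, and a real model would be built and trained (with backpropagation) in an established deep-learning framework:

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class GRUCell:
    """Minimal GRU cell forward pass, for illustration only."""
    def __init__(self, n_in, n_hidden):
        s = 0.1  # small random init; biases omitted for brevity
        self.Wz = rng.normal(0, s, (n_hidden, n_in + n_hidden))  # update gate
        self.Wr = rng.normal(0, s, (n_hidden, n_in + n_hidden))  # reset gate
        self.Wh = rng.normal(0, s, (n_hidden, n_in + n_hidden))  # candidate state

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)                                # update gate
        r = sigmoid(self.Wr @ xh)                                # reset gate
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))  # candidate
        return (1 - z) * h + z * h_tilde   # interpolate old and candidate state

cell = GRUCell(n_in=4, n_hidden=8)
h = np.zeros(8)
for x in rng.normal(size=(20, 4)):     # e.g. 20 frames of 4-D fused features
    h = cell.step(x, h)
print(h.shape)   # (8,)
```

The final hidden state h would then feed a small output layer that regresses the gas concentration.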
Optionally, the recognition method is applied to the identification of characteristic gases in transformer oil, in which case the target characteristic gas is one of hydrogen, methane, ethane, ethylene, acetylene, carbon monoxide, carbon dioxide, oxygen or nitrogen.
Referring to fig. 3, a second aspect of the present invention provides a deep-learning-based gas photoacoustic spectrum enhanced voiceprint recognition device 1 comprising an obtaining module 11, an extraction module 12, a fusion module 13, a training module 14 and a recognition module 15. The obtaining module 11 is configured to acquire a plurality of photoacoustic spectra of the target characteristic gas at different components and concentrations to obtain a photoacoustic spectrum data set, to acquire noise of the photoacoustic cell in different sound environments to obtain a noise data set, and to mix the photoacoustic spectrum data set with the noise data set to obtain an enhanced noise data set. The extraction module 12 is configured to frame each photoacoustic spectrum in the photoacoustic spectrum data set and extract its Mel-frequency cepstral coefficients, spectral peak, spectral skewness and spectral kurtosis. The fusion module 13 is configured to map and fuse these features of each photoacoustic spectrum into a multi-dimensional vector, obtaining a multi-dimensional feature vector for each photoacoustic spectrum. The training module 14 is configured to take the multi-dimensional feature vectors, the noise in the enhanced noise data set and the concentrations of the target characteristic gas respectively as positive samples, negative samples and labels to construct a sample data set, and to train a recurrent neural network on the sample data set until the error falls below a threshold and stabilizes. The recognition module 15 is configured to input a photoacoustic spectrum to be identified into the trained recurrent neural network and identify the concentration of the target characteristic gas corresponding to that spectrum.
Further, the acquiring module 11 includes a first acquiring module, a second acquiring module and a mixing module, where the first acquiring module is configured to acquire a plurality of photoacoustic spectra of the target characteristic gas under different components and concentrations to obtain a photoacoustic spectrum data set; the second acquisition module is used for acquiring the noise of the photoacoustic cell in different sound environments to obtain a noise data set; the mixing module is configured to mix the audio signal of the photoacoustic sound spectrum with the noise data set to obtain an enhanced noise data set.
In a third aspect of the present invention, there is provided an electronic device comprising: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for gas photoacoustic spectroscopy-enhanced voiceprint recognition based on deep learning provided by the first aspect of the present invention.
Referring to fig. 4, an electronic device 500 may include a processing means (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following devices may be connected to the I/O interface 505 in general: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; a storage device 508 including, for example, a hard disk; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 4 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 4 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program, when executed by the processing device 501, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more computer programs which, when executed by the electronic device, cause the electronic device to:
Computer program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, C++, and Python, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the present invention are intended to be included within its scope.

Claims (10)

1. A deep learning-based gas photoacoustic spectrum enhanced voiceprint recognition method, characterized by comprising the following steps:
acquiring a plurality of photoacoustic spectra of a target characteristic gas at different components and concentrations to obtain a photoacoustic spectrum data set; acquiring noise of a photoacoustic cell in different acoustic environments to obtain a noise data set; and mixing the photoacoustic spectrum data set with the noise data set to obtain an enhanced noise data set;
framing each photoacoustic spectrum in the photoacoustic spectrum data set, and extracting the Mel-frequency cepstral coefficients, spectral peak, spectral skewness, and spectral kurtosis of each photoacoustic spectrum;
mapping and fusing the Mel-frequency cepstral coefficients, spectral peak, spectral skewness, and spectral kurtosis of each photoacoustic spectrum into a multi-dimensional vector to obtain a multi-dimensional feature vector of each photoacoustic spectrum;
taking the multi-dimensional feature vectors, the noise in the enhanced noise data set, and the concentrations of the target characteristic gas as positive samples, negative samples, and labels, respectively, to construct a sample data set; and training a recurrent neural network on the sample data set until the error falls below a threshold and stabilizes, to obtain a trained recurrent neural network; and
inputting a photoacoustic spectrum to be identified into the trained recurrent neural network, and identifying the concentration of the target characteristic gas corresponding to that photoacoustic spectrum.
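The framing and feature-fusion steps recited in claim 1 can be sketched as follows. This is a minimal, illustrative numpy sketch only: the "MFCC" here is a plain DCT of the log power spectrum rather than a true mel-filterbank MFCC, and the frame size, hop, coefficient count, and mean-over-frames fusion are assumptions, not parameters disclosed by the patent.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D photoacoustic signal into overlapping frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def spectral_features(frame, n_mfcc=13):
    """Per-frame features: simplified MFCC-like coefficients, peak, skewness, kurtosis."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    # Simplified "MFCC": DCT of the log power spectrum. A real MFCC would
    # first apply a mel filterbank; it is omitted here for brevity.
    log_spec = np.log(spec + 1e-10)
    k = np.arange(len(log_spec))
    mfcc = np.array([np.sum(log_spec * np.cos(np.pi * m * (k + 0.5) / len(k)))
                     for m in range(n_mfcc)])
    peak = spec.max()
    mu, sd = spec.mean(), spec.std() + 1e-10
    skew = np.mean(((spec - mu) / sd) ** 3)   # spectral skewness
    kurt = np.mean(((spec - mu) / sd) ** 4)   # spectral kurtosis
    return np.concatenate([mfcc, [peak, skew, kurt]])

def feature_vector(signal):
    """Fuse per-frame features into one multi-dimensional vector (mean over frames)."""
    frames = frame_signal(signal)
    return np.mean([spectral_features(f) for f in frames], axis=0)

rng = np.random.default_rng(0)
vec = feature_vector(rng.standard_normal(4096))
print(vec.shape)  # (16,) — 13 MFCC-like coefficients + peak + skewness + kurtosis
```

In practice a library such as librosa would supply proper mel-filterbank MFCCs; the point here is only the claim's structure: frame, extract four feature families, fuse into one vector.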
2. The deep learning-based gas photoacoustic spectrum enhanced voiceprint recognition method of claim 1, wherein mixing the photoacoustic spectrum data set with the noise data set to obtain an enhanced noise data set comprises:
performing time stretching, pitch shifting, random matching, cyclic shifting, or mixing on the noise signals in the noise data set and the noise signals in the photoacoustic spectra, to obtain the enhanced noise data set.
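A minimal sketch of some of the augmentation operations named in claim 2, under stated simplifications: time stretching is done here by naive linear resampling (which also shifts pitch, unlike a phase-vocoder stretch), and the SNR-based mixing rule and all parameter values are illustrative assumptions, not the patent's method.

```python
import numpy as np

rng = np.random.default_rng(42)

def time_stretch(x, rate):
    """Naive time stretch by linear resampling; rate < 1 lengthens the signal."""
    idx = np.arange(0, len(x), rate)
    return np.interp(idx, np.arange(len(x)), x)

def cyclic_shift(x, shift):
    """Rotate the signal, wrapping samples from the end back to the front."""
    return np.roll(x, shift)

def mix_at_snr(clean, noise, snr_db):
    """Scale the noise to a target signal-to-noise ratio and add it to the signal."""
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

# Toy photoacoustic signal and recorded cell noise (both synthetic here).
spectrum = np.sin(2 * np.pi * 50 * np.linspace(0, 1, 8000))
noise = rng.standard_normal(8000)

augmented = [
    mix_at_snr(spectrum, cyclic_shift(noise, rng.integers(8000)), snr_db=10),
    mix_at_snr(time_stretch(spectrum, 0.9)[:8000], noise, snr_db=5),
]
print(len(augmented), augmented[0].shape)
```

Random matching (pairing random noise clips with random spectra) follows the same pattern: draw a noise clip at random before calling `mix_at_snr`.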
3. The deep learning-based gas photoacoustic spectrum enhanced voiceprint recognition method of claim 2, wherein a generative adversarial network and the noise data set are used to increase the authenticity of the enhanced noise data set.
4. The deep learning-based gas photoacoustic spectrum enhanced voiceprint recognition method of claim 3, wherein the generative adversarial network comprises a generation network and a discrimination network,
the generation network being configured to generate reconstructed noise samples from the noise samples in the noise data set; and
the discrimination network being configured to evaluate the similarity between the reconstructed noise samples and the noise samples.
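The two roles in claim 4 can be illustrated structurally. This sketch uses single linear layers, a forward pass, and the standard GAN losses only; no training loop is shown, and all dimensions, initial scales, and the loss formulation are assumptions for illustration, not details from the patent.

```python
import numpy as np

rng = np.random.default_rng(7)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Generator:
    """Maps a latent code to a reconstructed noise sample (one linear layer here)."""
    def __init__(self, latent_dim, out_dim):
        self.W = rng.standard_normal((latent_dim, out_dim)) * 0.1
    def __call__(self, z):
        return z @ self.W

class Discriminator:
    """Scores how similar a sample is to the real noise distribution (0..1)."""
    def __init__(self, in_dim):
        self.w = rng.standard_normal(in_dim) * 0.1
    def __call__(self, x):
        return sigmoid(x @ self.w)

latent_dim, sample_dim = 8, 64
G, D = Generator(latent_dim, sample_dim), Discriminator(sample_dim)

real = rng.standard_normal(sample_dim)      # a measured photoacoustic-cell noise sample
fake = G(rng.standard_normal(latent_dim))   # a reconstructed noise sample

# Standard adversarial objectives the two networks would alternately minimize:
d_loss = -np.log(D(real) + 1e-12) - np.log(1 - D(fake) + 1e-12)
g_loss = -np.log(D(fake) + 1e-12)
print(fake.shape, d_loss > 0, g_loss > 0)
```

In practice both networks would be deep models trained alternately with a framework such as PyTorch; the sketch only fixes the division of labor: the generator reconstructs noise samples, the discriminator scores their similarity to real ones.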
5. The deep learning-based gas photoacoustic spectrum enhanced voiceprint recognition method of claim 1, wherein the recurrent neural network comprises a plurality of deep convolutional neural network units.
6. The deep learning-based gas photoacoustic spectrum enhanced voiceprint recognition method of claim 1, wherein the target characteristic gas is one of hydrogen, methane, ethane, ethylene, acetylene, carbon monoxide, carbon dioxide, oxygen, and nitrogen.
7. A deep learning-based gas photoacoustic spectrum enhanced voiceprint recognition device, characterized by comprising an acquisition module, an extraction module, a fusion module, a training module, and a recognition module, wherein
the acquisition module is configured to acquire a plurality of photoacoustic spectra of a target characteristic gas at different components and concentrations to obtain a photoacoustic spectrum data set; acquire noise of a photoacoustic cell in different acoustic environments to obtain a noise data set; and mix the photoacoustic spectrum data set with the noise data set to obtain an enhanced noise data set;
the extraction module is configured to frame each photoacoustic spectrum in the photoacoustic spectrum data set and extract the Mel-frequency cepstral coefficients, spectral peak, spectral skewness, and spectral kurtosis of each photoacoustic spectrum;
the fusion module is configured to map and fuse the Mel-frequency cepstral coefficients, spectral peak, spectral skewness, and spectral kurtosis of each photoacoustic spectrum into a multi-dimensional vector to obtain a multi-dimensional feature vector of each photoacoustic spectrum;
the training module is configured to take the multi-dimensional feature vectors, the noise in the enhanced noise data set, and the concentrations of the target characteristic gas as positive samples, negative samples, and labels, respectively, to construct a sample data set, and to train a recurrent neural network on the sample data set until the error falls below a threshold and stabilizes, obtaining a trained recurrent neural network; and
the recognition module is configured to input a photoacoustic spectrum to be identified into the trained recurrent neural network and identify the concentration of the target characteristic gas corresponding to that photoacoustic spectrum.
8. The deep learning-based gas photoacoustic spectrum enhanced voiceprint recognition device of claim 7, wherein the acquisition module comprises a first acquisition module, a second acquisition module, and a mixing module, wherein
the first acquisition module is configured to acquire a plurality of photoacoustic spectra of the target characteristic gas at different components and concentrations to obtain the photoacoustic spectrum data set;
the second acquisition module is configured to acquire the noise of the photoacoustic cell in different acoustic environments to obtain the noise data set; and
the mixing module is configured to mix the audio signals of the photoacoustic spectra with the noise data set to obtain the enhanced noise data set.
9. An electronic device, comprising: one or more processors; and storage means storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the deep learning-based gas photoacoustic spectrum enhanced voiceprint recognition method of any one of claims 1 to 6.
10. A computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the deep learning-based gas photoacoustic spectrum enhanced voiceprint recognition method of any one of claims 1 to 6.
CN202110168036.0A 2021-02-07 2021-02-07 Gas photoacoustic spectrum enhanced voiceprint recognition method and device based on deep learning Active CN112504970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110168036.0A CN112504970B (en) 2021-02-07 2021-02-07 Gas photoacoustic spectrum enhanced voiceprint recognition method and device based on deep learning

Publications (2)

Publication Number Publication Date
CN112504970A CN112504970A (en) 2021-03-16
CN112504970B true CN112504970B (en) 2021-04-20

Family

ID=74952756

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110168036.0A Active CN112504970B (en) 2021-02-07 2021-02-07 Gas photoacoustic spectrum enhanced voiceprint recognition method and device based on deep learning

Country Status (1)

Country Link
CN (1) CN112504970B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111944B (en) * 2021-04-13 2022-05-31 湖北鑫英泰***技术股份有限公司 Photoacoustic spectrum identification method and device based on deep learning and gas photoacoustic effect

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
HU226448B1 (en) * 2005-06-29 2008-12-29 Univ Szegedi Measuring configuration for and method of detecting at least one component of gas-mixtures by photoacoustic principle
CN102800316A (en) * 2012-08-30 2012-11-28 重庆大学 Optimal codebook design method for voiceprint recognition system based on nerve network
CN102884413A (en) * 2010-03-02 2013-01-16 利康股份有限公司 Method and apparatus for the photo-acoustic identification and quantification of analyte species in a gaseous or liquid medium
CN105575394A (en) * 2016-01-04 2016-05-11 北京时代瑞朗科技有限公司 Voiceprint identification method based on global change space and deep learning hybrid modeling
CN106531174A (en) * 2016-11-27 2017-03-22 福州大学 Animal sound recognition method based on wavelet packet decomposition and spectrogram features
CN206074431U (en) * 2016-07-19 2017-04-05 安徽华电六安电厂有限公司 A kind of optoacoustic spectroscopy principle transformer online monitoring system based on elimination cross influence
CN112304869A (en) * 2019-07-26 2021-02-02 英飞凌科技股份有限公司 Gas sensing device for sensing gas in gas mixture and method for operating the same


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Identification and concentration measurements of atmospheric pollutants through a neural network analysis of photothermal signatures; K. Boccara et al.; Journal de Physique IV; 1994-12-31; pp. C7-107–110 *
Research on a multi-feature-fusion method for evaluating musical instrument sound quality; Chen Yanwen et al.; Journal of Test and Measurement Technology; 2019-12-31; Vol. 33, No. 5, pp. 421–427 *


Similar Documents

Publication Publication Date Title
US20210271826A1 (en) Speech translation method, electronic device and computer-readable storage medium
CN111429946A (en) Voice emotion recognition method, device, medium and electronic equipment
CN112432905B (en) Voiceprint recognition method and device based on photoacoustic spectrum of characteristic gas in transformer oil
CN110085210B (en) Interactive information testing method and device, computer equipment and storage medium
CN112595672B (en) Mixed gas photoacoustic spectrum identification method and device based on deep learning
CN111724770B (en) Audio keyword identification method for generating confrontation network based on deep convolution
CN113111944B (en) Photoacoustic spectrum identification method and device based on deep learning and gas photoacoustic effect
CN112259123B (en) Drum point detection method and device and electronic equipment
CN112504970B (en) Gas photoacoustic spectrum enhanced voiceprint recognition method and device based on deep learning
Han et al. A new audio steganalysis method based on linear prediction
Machado et al. Forensic speaker verification using ordinary least squares
Bhangale et al. Speech emotion recognition using the novel PEmoNet (Parallel Emotion Network)
Hu et al. A lightweight multi-sensory field-based dual-feature fusion residual network for bird song recognition
CN117058597B (en) Dimension emotion recognition method, system, equipment and medium based on audio and video
CN111914822B (en) Text image labeling method, device, computer readable storage medium and equipment
CN116978409A (en) Depression state evaluation method, device, terminal and medium based on voice signal
CN116153337A (en) Synthetic voice tracing evidence obtaining method and device, electronic equipment and storage medium
CN113642629B (en) Visualization method and device for improving reliability of spectroscopy analysis based on random forest
CN116110423A (en) Multi-mode audio-visual separation method and system integrating double-channel attention mechanism
CN114302301B (en) Frequency response correction method and related product
Mirza et al. Residual LSTM neural network for time dependent consecutive pitch string recognition from spectrograms: a study on Turkish classical music makams
Jadhav et al. Transfer Learning for Audio Waveform to Guitar Chord Spectrograms Using the Convolution Neural Network
CN112951274A (en) Voice similarity determination method and device, and program product
Wu et al. Audio-based expansion learning for aerial target recognition
Luo et al. Compression detection of audio waveforms based on stacked autoencoders

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Method and device of gas photoacoustic spectrum enhanced voiceprint recognition based on deep learning

Effective date of registration: 20220610

Granted publication date: 20210420

Pledgee: Guanggu Branch of Wuhan Rural Commercial Bank Co.,Ltd.

Pledgor: HUBEI INFOTECH CO.,LTD.

Registration number: Y2022420000153

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20230922

Granted publication date: 20210420

Pledgee: Guanggu Branch of Wuhan Rural Commercial Bank Co.,Ltd.

Pledgor: HUBEI INFOTECH CO.,LTD.

Registration number: Y2022420000153
