CN115662464A - Method and system for intelligently identifying environmental noise - Google Patents
- Publication number
- CN115662464A (CN202211704375.7A)
- Authority
- CN
- China
- Prior art keywords
- noise
- voiceprint
- environmental noise
- characteristic
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T90/00—Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation
Landscapes
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Abstract
The invention discloses a method and a system for intelligently identifying environmental noise. An environmental noise voiceprint frequency domain feature recognition channel in a noise recognition neural network model extracts frequency domain features of the noise voiceprint from environmental noise feature data, and an environmental noise voiceprint position feature recognition channel extracts relative position features of the noise voiceprint from the same data. A feature fusion channel fuses the extracted frequency domain features and relative position features to obtain fusion features carrying both, and a classification recognizer classifies the fusion features to determine the environmental noise type.
Description
Technical Field
The invention relates to the technical field of environmental noise processing, in particular to a method and a system for intelligently identifying environmental noise.
Background
Environmental noise refers to sound that is generated in industrial production, building construction, transportation and social life and that interferes with the surrounding living environment. Noise prevention and control requires noise pollution to be handled by category; that is, the type of environmental noise must be distinguished before supervision and law enforcement can be carried out. A traditional noise detection system is limited to obtaining the original audio recording and measuring the instantaneous sound intensity, or the average sound intensity over a period of time, to judge whether the noise exceeds the standard. Relying on this method alone, the type of environmental noise cannot be determined, which makes noise supervision hard to put into practice and law enforcement harder still. The prior art can classify and identify environmental noise through a neural network. For example, the Chinese patent with application number 2019166344.2, entitled "an environmental noise identification and classification method based on a convolutional neural network", adopts the following scheme: step 1, extracting natural environment noise and editing it into noise segments with a duration of 300 ms to 30 s and a frequency of 44.1 kHz; step 2, carrying out a short-time Fourier transform on each noise segment, converting the one-dimensional time domain signal into a two-dimensional frequency domain signal to obtain a spectrogram; step 3, extracting the Mel-frequency cepstral coefficients (MFCC) of the signal; step 4, taking 80% of all noise segments as a training set and the remaining 20% as a test set; step 5, carrying out noise classification with a convolutional neural network model; and step 6, training the classification model with the training set and verifying its accuracy with the test set, thereby completing the environmental noise identification and classification based on the convolutional neural network. However, this prior art classifies environmental noise using only features extracted in the frequency domain, so its classification accuracy is not high.
Disclosure of Invention
The invention aims to provide a method and a system for intelligently identifying environmental noise so as to improve the accuracy of environmental noise classification and identification.
In order to solve the technical problem, the invention adopts the following technical scheme:
In one aspect, a method for intelligently identifying environmental noise includes the following steps:
collecting an environmental noise signal;
extracting environmental noise characteristic data from the acquired environmental noise signals;
training a noise recognition neural network model;
inputting the extracted environmental noise characteristic data into a trained noise recognition neural network model, wherein the noise recognition neural network model comprises an environmental noise voiceprint frequency domain characteristic recognition channel, an environmental noise voiceprint position characteristic recognition channel, a characteristic fusion channel and a classification recognizer;
the environmental noise voiceprint frequency domain feature recognition channel in the noise recognition neural network model extracts frequency domain features of the noise voiceprint from the environmental noise feature data, the environmental noise voiceprint position feature recognition channel extracts relative position features of the noise voiceprint from the environmental noise feature data, the feature fusion channel fuses the extracted frequency domain features and relative position features to obtain fusion features carrying both, and the classification recognizer classifies the fusion features to determine the environmental noise type.
Wherein, extracting the environmental noise characteristic data from the collected noise signal specifically comprises:
carrying out data preprocessing on the acquired noise audio signal, and carrying out short-time Fourier transform on the signal subjected to data preprocessing;
calculating the energy spectrum of each frame of signal after short-time Fourier transform;
applying a Mel filter bank on the energy spectrum, and extracting the characteristics of the filter bank;
and drawing the filter bank characteristics into a noise voiceprint image as characteristic data.
The training of the noise recognition neural network model specifically comprises the following steps:
performing data preprocessing and feature extraction on the collected noise segments to obtain feature data;
performing data enhancement on the feature data to obtain a training data set expanded N times;
building the noise recognition neural network model;
inputting the training data set into the built noise recognition neural network model for training;
and after training, evaluating the performance indexes of the training result, readjusting the parameters of the noise recognition neural network model and retraining, and ending the training once the expected result is met.
Preferably, random cropping, speed adjustment, pitch adjustment and sound mixing are used to perform data enhancement on the feature data to obtain the N-times training data set.
The environmental noise voiceprint frequency domain feature recognition channel can adopt a convolutional neural network, the environmental noise voiceprint position feature recognition channel can adopt a long short-term memory (LSTM) neural network, and the classification recognizer can adopt a softmax (normalized exponential) classifier.
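As a small illustration of the classification recognizer, the following sketch shows how a softmax (normalized exponential) function turns raw class scores into probabilities. The label list `NOISE_TYPES`, the function names and the example scores are hypothetical and chosen only for illustration; the patent does not specify them.

```python
import math

# Hypothetical label order; the patent does not fix a class list or order.
NOISE_TYPES = ["traffic", "industrial", "construction", "social life"]

def softmax(scores):
    """Normalized exponential: map raw class scores to probabilities."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores):
    """Return the most probable noise type and the full probability list."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return NOISE_TYPES[best], probs

label, probs = classify([2.0, 0.5, 0.1, -1.0])
print(label, [round(p, 3) for p in probs])
```

In a full model the scores would come from a final linear layer applied to the fusion features; here they are simply given as input.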
In another aspect, a system for intelligently identifying environmental noise comprises:
the acquisition processing module is used for acquiring an environmental noise signal;
the environmental noise characteristic data extraction processing module is used for extracting environmental noise characteristic data from the acquired environmental noise signals;
the training processing module is used for training a noise recognition neural network model;
the input processing module is used for inputting the extracted environmental noise characteristic data into a trained noise recognition neural network model, and the noise recognition neural network model comprises an environmental noise voiceprint frequency domain characteristic recognition channel, an environmental noise voiceprint position characteristic recognition channel, a characteristic fusion channel and a classification recognizer;
the classification recognition module is used for extracting frequency domain features of the noise voiceprint from the environmental noise feature data through the environmental noise voiceprint frequency domain feature recognition channel in the noise recognition neural network model, extracting relative position features of the noise voiceprint from the environmental noise feature data through the environmental noise voiceprint position feature recognition channel, fusing the extracted frequency domain features and relative position features through the feature fusion channel to obtain fusion features carrying both, and classifying the fusion features through the classification recognizer to determine the environmental noise type.
The environmental noise characteristic data extraction processing module adopts the following method to extract:
carrying out data preprocessing on the acquired noise audio signals, and carrying out short-time Fourier transform on the signals subjected to data preprocessing;
calculating the energy spectrum of each frame of signal after short-time Fourier transform;
applying a Mel filter bank on the energy spectrum, and extracting the characteristics of the filter bank;
and drawing the filter bank characteristics into a noise voiceprint image as characteristic data.
The training processing module can train in the following manner:
performing data preprocessing and feature extraction on the collected noise segments to obtain feature data;
performing data enhancement on the feature data to obtain a training data set expanded N times;
building the noise recognition neural network model;
inputting the training data set into the built noise recognition neural network model for training;
and after training, evaluating the performance indexes of the training result, readjusting the parameters of the noise recognition neural network model and retraining, and ending the training once the expected result is met.
Preferably, the training processing module uses random cropping, speed adjustment, pitch adjustment and sound mixing to perform data enhancement on the feature data to obtain the N-times training data set.
The environmental noise voiceprint frequency domain feature recognition channel can adopt a convolutional neural network, the environmental noise voiceprint position feature recognition channel can adopt a long short-term memory (LSTM) neural network, and the classification recognizer can adopt a softmax (normalized exponential) classifier.
Compared with the prior art, the invention has the following beneficial effects:
in the method and the system, environmental noise feature data are extracted from the collected environmental noise signals; a noise recognition neural network model is trained; and the extracted environmental noise feature data are input into the trained model, which comprises an environmental noise voiceprint frequency domain feature recognition channel, an environmental noise voiceprint position feature recognition channel, a feature fusion channel and a classification recognizer. The frequency domain feature recognition channel extracts frequency domain features of the noise voiceprint from the environmental noise feature data, the position feature recognition channel extracts relative position features of the noise voiceprint from the same data, the feature fusion channel fuses the two to obtain fusion features carrying both, and the classification recognizer classifies the fusion features to determine the environmental noise type. Because classification is based on the fusion of the frequency domain features and the relative position features of the noise voiceprint, the accuracy of environmental noise classification is higher than in the prior art, which classifies by frequency domain features alone.
Drawings
FIG. 1 is a flow chart of one embodiment of a method for intelligently identifying ambient noise according to the present invention;
FIG. 2 is a schematic structural diagram of a noise-identifying neural network model trained in the method for intelligently identifying environmental noise according to an embodiment of the present invention;
FIG. 3 is a block diagram of an embodiment of a system for intelligently identifying ambient noise.
Detailed Description
Referring to FIG. 1, which is a flowchart of an embodiment of the method for intelligently identifying environmental noise of the present invention, the method mainly includes the following steps:
S101, collecting an environmental noise signal. In specific implementation, an audio signal of the environmental noise can be collected by a noise collection device. Environmental noise mainly comprises traffic noise, industrial noise, building construction noise and social life noise. Traffic noise is the noise generated when vehicles such as motor vehicles, airplanes, trains and ships are running. Industrial noise mainly refers to noise generated in industrial production and comes mainly from machines and high-speed running equipment. Building construction noise mainly refers to noise generated on building construction sites. Social life noise mainly refers to noise generated by people in social activities such as commercial transactions, sports competitions, tourist conferences and entertainment venues, and noise from household appliances such as radio recorders, televisions and washing machines; these are not described in detail herein;
S102, extracting the environmental noise feature data from the collected environmental noise signal, for example in the following manner:
carrying out data preprocessing on the acquired noise audio signal, and carrying out short-time Fourier transform on the signal subjected to data preprocessing;
calculating the energy spectrum of each frame of signal after short-time Fourier transform;
applying a Mel filter bank on the energy spectrum, and extracting the characteristics of the filter bank;
drawing the filter bank characteristics into a noise voiceprint image as characteristic data;
it should be noted that the data preprocessing mainly makes the data meet the requirements of the short-time Fourier transform; it mainly adopts pre-emphasis, framing and windowing, and may also adopt other manners, which are not specifically limited herein;
in addition, the data volume of raw audio is generally large, and using it directly as feature data would greatly increase the amount of calculation in the subsequent neural network feature classification; in this step, the sound is converted to the Mel domain through the Mel filter bank to obtain the Mel spectrum, which is more consistent with the auditory characteristics of the human ear, keeps the data volume under control, and makes the voiceprint features in the frequency domain more distinct and richer;
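The feature extraction of step S102 can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the 16 kHz sample rate, 25 ms frames with a 10 ms hop, the Hamming window, the 0.97 pre-emphasis coefficient, the 40 Mel filters and the final log scaling are all assumed values that the patent does not specify, and the function names are hypothetical. The returned two-dimensional array (frames by Mel bands) plays the role of the noise voiceprint image.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            bank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            bank[i - 1, k] = (right - k) / (right - center)
    return bank

def log_mel_features(signal, sample_rate=16000, frame_len=400, hop=160,
                     n_fft=512, n_filters=40, pre_emphasis=0.97):
    """Return a (frames x Mel bands) array; all default values are assumptions."""
    # Pre-emphasis: boost high frequencies.
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])
    # Framing and windowing (the signal must hold at least one full frame).
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Short-time Fourier transform, then the energy (power) spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Apply the Mel filter bank to obtain the filter bank features.
    fbank = power @ mel_filter_bank(n_filters, n_fft, sample_rate).T
    # Log scaling (an assumed, common choice) before use as an image.
    return 10.0 * np.log10(np.maximum(fbank, 1e-10))

t = np.arange(16000) / 16000.0
voiceprint = log_mel_features(np.sin(2 * np.pi * 440.0 * t))
print(voiceprint.shape)
```

Rendering the array with an image library is left out; the array itself can be fed to a network as a single-channel image.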
S103, training the noise recognition neural network model; in specific implementation, for example, the following method may be used for training:
performing data preprocessing and feature extraction on the collected noise segments to obtain feature data;
performing data enhancement on the feature data to obtain a training data set expanded N times;
building the noise recognition neural network model;
inputting the training data set into the built noise recognition neural network model for training;
after training, evaluating the performance indexes of the training result, readjusting the parameters of the noise recognition neural network model and retraining, and ending the training once the expected result is met;
as shown in FIG. 2, the architecture of the built noise recognition neural network model mainly includes an environmental noise voiceprint frequency domain feature recognition channel, an environmental noise voiceprint position feature recognition channel, a feature fusion channel and a classification recognizer; as a preferred embodiment, the environmental noise voiceprint frequency domain feature recognition channel may adopt a convolutional neural network, the environmental noise voiceprint position feature recognition channel may adopt a long short-term memory (LSTM) neural network, and the classification recognizer may adopt a softmax (normalized exponential) classifier, which is not specifically limited herein;
it should be noted that the learning of the neural network requires as many samples as possible, while the amount of collected data is limited; some speech signal processing can therefore be performed on the original data, that is, data enhancement is performed by random cropping, speed adjustment, pitch adjustment, sound mixing and the like to obtain a training data set expanded N times. For sound mixing, a batch of relatively pure target environmental noises is screened out and filtered to obtain purer target environmental noises, which are then weighted and fused with natural noises (wind, birdsong and the like) and subjected to speed adjustment and other operations to obtain environmental noise data of higher quality. With the N-times training data set obtained in this way, the model training result is remarkably improved;
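The data enhancement described above can be sketched on raw waveforms as follows. This is a hedged illustration: the function names, the speed range of 0.9 to 1.1 and the mixing weight range of 0.6 to 0.9 are assumptions, not values from the patent. Speed adjustment is done by plain linear-interpolation resampling, which also shifts the pitch; an independent pitch adjustment would need a dedicated method (for example a phase vocoder) and is not shown.

```python
import numpy as np

def random_crop(signal, length, rng):
    """Cut a random segment of the given length out of the signal."""
    start = rng.integers(0, len(signal) - length + 1)
    return signal[start:start + length]

def change_speed(signal, rate):
    """Resample by linear interpolation; rate > 1 gives a faster, shorter clip."""
    n_out = int(round(len(signal) / rate))
    positions = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(positions, np.arange(len(signal)), signal)

def mix(target, background, weight):
    """Weighted fusion of a target noise with a natural background noise."""
    n = min(len(target), len(background))
    return weight * target[:n] + (1.0 - weight) * background[:n]

def augment(target, background, n_copies, crop_len, seed=0):
    """Produce n_copies augmented clips from one target recording.

    The background must be longer than crop_len / 0.9 samples so that a
    matching background segment can always be cropped.
    """
    rng = np.random.default_rng(seed)
    clips = []
    for _ in range(n_copies):
        clip = random_crop(target, crop_len, rng)
        clip = change_speed(clip, rng.uniform(0.9, 1.1))   # assumed speed range
        noise = random_crop(background, len(clip), rng)
        clips.append(mix(clip, noise, rng.uniform(0.6, 0.9)))  # assumed weights
    return clips

target = np.sin(np.linspace(0.0, 100.0, 8000))
background = np.random.default_rng(1).normal(0.0, 0.1, 16000)
clips = augment(target, background, n_copies=5, crop_len=4000)
print(len(clips), [len(c) for c in clips])
```

Each source recording yields `n_copies` new clips, which is one way to read the "N times" expansion of the training set.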
in addition, after the model training is finished, an objective performance evaluation needs to be carried out on the training result, where the performance indexes include, but are not limited to, accuracy, F1 score and recall; if the performance indexes do not meet the requirements, the data set is checked for unknown problems, and the corresponding parameters of the noise recognition neural network model are readjusted, for example the number of network layers, the depth of the convolution and pooling layers in the convolutional neural network, and the number of neurons in the long short-term memory neural network.
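The three performance indexes named above can be computed as in the following sketch. Macro averaging over classes is an assumption made here for illustration; the patent does not say how per-class values are combined, and the function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return accuracy plus macro-averaged recall and F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recalls, f1s = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        recalls.append(recall)
        f1s.append(f1)
    return {
        "accuracy": accuracy,
        "recall": sum(recalls) / len(labels),
        "f1": sum(f1s) / len(labels),
    }

scores = evaluate(
    ["traffic", "traffic", "industrial", "industrial"],
    ["traffic", "industrial", "industrial", "industrial"],
)
print(scores)
```

If any of these values falls below the target chosen for the application, the data set is inspected and the model parameters are adjusted before retraining, as the description states.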
S104, inputting the extracted environmental noise characteristic data into a trained noise recognition neural network model, wherein the noise recognition neural network model comprises an environmental noise voiceprint frequency domain characteristic recognition channel, an environmental noise voiceprint position characteristic recognition channel, a characteristic fusion channel and a classification recognizer;
S105, the environmental noise voiceprint frequency domain feature recognition channel in the noise recognition neural network model extracts frequency domain features of the noise voiceprint from the environmental noise feature data, and the environmental noise voiceprint position feature recognition channel extracts relative position features of the noise voiceprint from the environmental noise feature data; in specific implementation, the extracted relative position features can take various forms, for example the time sequence features of the corresponding noise voiceprint or other relative position features, which are not specifically limited herein;
in addition, in step S105, the feature fusion channel fuses the extracted frequency domain features and relative position features of the noise voiceprint to obtain fusion features carrying both, and the classification recognizer then classifies the fusion features to determine the environmental noise type;
it should be noted that, in the prior art, feature fusion may use a weighted fusion manner, that is, the extracted frequency domain features and relative position features of the noise voiceprint are fused with different weights. However, if an extracted feature is abnormal and the weight of that type of feature is too large, the abnormal feature portion greatly reduces the accuracy of subsequent noise identification. To solve this technical problem, as a preferred embodiment of the present invention, the feature fusion channel in this embodiment performs feature fusion in the following manner:
firstly, combining the extracted frequency domain features and relative position features of the noise voiceprint into a candidate feature set;
secondly, dividing the candidate feature set into a normal feature vector set and an abnormal feature vector set, where the normal feature vector set is the set of feature vectors in which all features are normal and the abnormal feature vector set is the set of feature vectors in which some features are abnormal; in specific implementation, whether a feature is abnormal can be determined in this embodiment according to whether feature data is missing or falls outside a predetermined range, which is not specifically limited herein;
thirdly, grouping the normal feature vector set to obtain normal feature subsets, where the features within each normal feature subset are similar; in specific implementation, various existing algorithms can be used for the grouping, for example a mean shift algorithm or other algorithms, which is not specifically limited herein;
fourthly, correcting the abnormal feature data of the abnormal feature vector set; in specific implementation, for example, the dissimilarity between the abnormal feature and each feature in the abnormal feature vector set can be calculated, and the feature data value with the minimum dissimilarity to the abnormal feature is selected to correct the abnormal feature data;
fifthly, calculating the distance between each feature in the corrected abnormal feature vector set and each normal feature subset, and adding the feature to the nearest normal feature subset; in specific implementation, one feature in the abnormal feature vector set is selected and its distance to each normal feature subset is calculated, where the distance may be the Mahalanobis distance or another distance capable of expressing similarity, which is not specifically limited; the nearest normal feature subset is determined from the calculation result and the feature is added to it; the remaining features in the abnormal feature vector set are then processed in the same way until all of them have been added to their corresponding normal feature subsets;
and finally, cascading the determined normal feature subsets to obtain the final fusion feature; in specific implementation, after the features of the abnormal feature vector set have been added to the normal feature subsets, each finally determined normal feature subset is normalized separately, and the normalized subsets are then concatenated along the dimension direction to obtain the final fusion feature.
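The six fusion steps above can be sketched in NumPy as follows. This is one possible reading under loudly stated assumptions, not the patented implementation: each feature is taken to be a fixed-length vector; "abnormal" means a NaN or a value outside the hypothetical range `[low, high]`; grouping uses a small flat-kernel mean shift with an assumed `bandwidth`; the repair donor is drawn from the normal vectors, which simplifies the description; Euclidean distance to the subset mean stands in for the Mahalanobis distance; normalization is min-max per subset; and the cascade stacks the subsets row by row. At least one normal vector is required.

```python
import numpy as np

def bad_mask(x, low, high):
    """True where a value is missing (NaN) or outside [low, high]."""
    with np.errstate(invalid="ignore"):
        return np.isnan(x) | (x < low) | (x > high)

def mean_shift_labels(x, bandwidth, n_iter=30):
    """Flat-kernel mean shift: move each point to its local mean, then merge modes."""
    modes = x.copy()
    for _ in range(n_iter):
        for i in range(len(modes)):
            near = x[np.linalg.norm(x - modes[i], axis=1) <= bandwidth]
            modes[i] = near.mean(axis=0)
    centers, labels = [], np.zeros(len(x), dtype=int)
    for i, mode in enumerate(modes):
        for j, center in enumerate(centers):
            if np.linalg.norm(mode - center) <= bandwidth / 2:
                labels[i] = j
                break
        else:
            centers.append(mode)
            labels[i] = len(centers) - 1
    return labels

def fuse(freq_feats, pos_feats, low=-10.0, high=10.0, bandwidth=1.0):
    # Step 1: merge both feature groups into one candidate set (same vector length).
    candidates = np.vstack([freq_feats, pos_feats]).astype(float)
    # Step 2: split into normal and abnormal feature vectors.
    bad = bad_mask(candidates, low, high)
    abnormal = bad.any(axis=1)
    normal = candidates[~abnormal]
    # Step 3: group the normal vectors into subsets of similar features.
    labels = mean_shift_labels(normal, bandwidth)
    subsets = [list(normal[labels == k]) for k in range(labels.max() + 1)]
    for vec, mask in zip(candidates[abnormal], bad[abnormal]):
        # Step 4: repair invalid entries from the least dissimilar normal vector.
        dissim = ((normal[:, ~mask] - vec[~mask]) ** 2).sum(axis=1)
        fixed = np.where(mask, normal[np.argmin(dissim)], vec)
        # Step 5: add the repaired vector to the nearest subset.
        dists = [np.linalg.norm(fixed - np.mean(s, axis=0)) for s in subsets]
        subsets[int(np.argmin(dists))].append(fixed)
    # Step 6: normalize each subset separately, then concatenate.
    fused = []
    for s in subsets:
        s = np.asarray(s)
        span = s.max() - s.min()
        fused.append((s - s.min()) / span if span > 0 else np.zeros_like(s))
    return np.concatenate(fused, axis=0)

freq = np.array([[0.0, 0.0, 0.1], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0]])
pos = np.array([[5.1, 5.0, 5.0], [0.0, np.nan, 0.05], [5.0, 5.0, 99.0]])
print(fuse(freq, pos).shape)
```

In this toy input the vector with a missing value and the vector with an out-of-range value are both repaired and absorbed into the subset they resemble, so no abnormal value reaches the fused output.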
It should be noted that, in the above embodiment, the extracted frequency domain features and relative position features of the noise voiceprint are merged into the candidate feature set; the features in the candidate feature set are divided into the normal feature vector set and the abnormal feature vector set; the normal feature vector set is grouped to obtain the normal feature subsets; the abnormal features in the abnormal feature vector set are corrected; each corrected feature is added to the nearest normal feature subset; and the finally determined normal feature subsets are cascaded to obtain the final fusion feature. This avoids the technical problem that abnormal extracted features reduce the accuracy of subsequent environmental noise identification, and ultimately improves the accuracy of environmental noise identification.
The noise recognition neural network model in this embodiment extracts both the voiceprint frequency domain features and the relative position features of the environmental noise and then fuses them, so that the two kinds of features complement each other in the fusion feature. Classification based on the fusion feature can greatly improve the accuracy of environmental noise classification, and even if the extracted features are abnormal, the noise recognition neural network model can still identify the environmental noise accurately.
Referring to FIG. 3, which is a block diagram of an embodiment of the system for intelligently identifying environmental noise of the present invention, the system of this embodiment mainly includes an acquisition processing module 101, an environmental noise feature data extraction processing module 102, a training processing module 103, an input processing module 104 and a classification recognition module 105, wherein:
the acquisition processing module 101 in this embodiment is mainly configured to collect an environmental noise signal; in specific implementation, the audio signal of the environmental noise may be collected by a noise collection device, where the environmental noise mainly includes traffic noise, industrial noise, building construction noise and social life noise, which are not described herein again;
the environmental noise feature data extraction processing module 102 in this embodiment is mainly used for extracting environmental noise feature data from the collected environmental noise signal; in specific implementation, the environmental noise feature data extraction processing module 102 may extract the environmental noise feature data in the following manner:
carrying out data preprocessing on the acquired noise audio signals, and carrying out short-time Fourier transform on the signals subjected to data preprocessing;
calculating the energy spectrum of each frame of signal after short-time Fourier transform;
applying a Mel filter bank on the energy spectrum, and extracting the characteristics of the filter bank;
and drawing the filter bank characteristics into a noise voiceprint image as characteristic data.
As mentioned above, the data preprocessing mainly makes the data meet the requirements of the short-time Fourier transform; it mainly adopts pre-emphasis, framing and windowing, and can also adopt other manners, which are not specifically limited herein;
in addition, the data volume of raw audio is generally large, and using it directly as feature data would greatly increase the amount of calculation in the subsequent neural network feature classification; in this step, the sound is converted to the Mel domain through the Mel filter bank to obtain the Mel spectrum, which is more consistent with the auditory characteristics of the human ear, keeps the data volume under control, and makes the voiceprint features in the frequency domain more distinct and richer;
the training processing module 103 in this embodiment is mainly used for training the noise recognition neural network model; in specific implementation, the training processing module 103 may perform training in the following manner:
performing data preprocessing and feature extraction on the collected noise segments to obtain feature data;
performing data enhancement on the feature data to obtain a training data set expanded N times;
building the noise recognition neural network model;
inputting the training data set into the built noise recognition neural network model for training;
and after training, evaluating the performance indexes of the training result, readjusting the parameters of the noise recognition neural network model and retraining, and ending the training once the expected result is met.
As shown in fig. 2, the above-mentioned built noise recognition neural network model architecture mainly includes an environmental noise voiceprint frequency domain feature recognition channel, an environmental noise voiceprint position feature recognition channel, a feature fusion channel and a classification recognizer, as a preferred embodiment, the environmental noise voiceprint frequency domain feature recognition channel may adopt a convolutional neural network, the environmental noise voiceprint position feature recognition channel may adopt a long-time and short-time memory neural network, and the classification recognizer may adopt an exponential normalization classifier, which is not specifically limited herein;
an input processing module 104. In this embodiment, the input processing module 104 is mainly configured to input the extracted environmental noise feature data into the trained noise recognition neural network model, where the noise recognition neural network model includes an environmental noise voiceprint frequency domain feature recognition channel, an environmental noise voiceprint position feature recognition channel, a feature fusion channel, and a classification recognizer;
a classification recognition module 105. In this embodiment, the classification recognition module 105 is mainly configured to extract the frequency domain features of the noise voiceprint from the environmental noise feature data through the environmental noise voiceprint frequency domain feature recognition channel in the noise recognition neural network model, extract the relative position features of the noise voiceprint from the environmental noise feature data through the environmental noise voiceprint position feature recognition channel, fuse the extracted frequency domain features and relative position features through the feature fusion channel to obtain a fusion feature that carries both, and classify the fusion feature through the classification recognizer to determine the environmental noise type.
It should be noted that when features are fused by weighting, an abnormal extracted feature that carries too large a weight will greatly reduce the accuracy of the subsequent noise identification. As a preferred embodiment of the present invention, the feature fusion channel of the classification recognition module 105 in this embodiment may therefore perform feature fusion in the following manner:
firstly, combining the extracted frequency domain features and relative position features of the noise voiceprint into a candidate feature set;
secondly, dividing the candidate feature set into a normal feature vector set and an abnormal feature vector set, wherein the normal feature vector set is the set of feature vectors in which all features are normal, and the abnormal feature vector set is the set of feature vectors in which some features are abnormal; in a specific implementation, a feature may be judged abnormal when its feature data is missing or falls outside a predetermined range, which is not specifically limited herein;
thirdly, clustering the normal feature vector set to obtain normal feature subsets, wherein the features within each normal feature subset are similar; in a specific implementation, any of various existing algorithms may be used, for example the mean shift algorithm, which is not specifically limited herein;
fourthly, correcting the abnormal feature data in the abnormal feature vector set; in a specific implementation, for example, the degree of dissimilarity between the abnormal feature and each feature in the abnormal feature vector set can be calculated, and the feature data value with the minimum dissimilarity to the abnormal feature is selected to correct the abnormal feature data;
fifthly, calculating the distance between each feature in the corrected abnormal feature vector set and each normal feature subset, and adding the feature to the nearest normal feature subset; in a specific implementation, one feature in the abnormal feature vector set is selected and its distance to each normal feature subset is calculated, where the distance may be the Mahalanobis distance or any other distance capable of expressing similarity, which is not specifically limited herein; the nearest normal feature subset is determined from the result and the feature is added to it; the remaining features in the abnormal feature vector set are then processed in the same way until all of them have been added to their corresponding normal feature subsets;
and finally, concatenating the determined normal feature subsets to obtain the final fusion feature; in a specific implementation, after the features of the abnormal feature vector set have been added to the normal feature subsets, each finally determined normal feature subset is normalized separately, and the normalized subsets are then concatenated along the feature dimension to obtain the final fusion feature.
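The fusion steps above can be sketched end to end as follows. This is a minimal NumPy illustration under several loudly stated assumptions, not the patented implementation: every feature is a vector of the same length; a value is abnormal when it is NaN or lies outside `[low, high]`; at least one vector is fully normal; grouping uses a small flat-kernel mean shift with a hypothetical `bandwidth`; abnormal values are corrected by copying from the most similar normal vector, which is only one possible reading of the correction step; the distance to a subset is the Euclidean distance to its mean, used here as a stand-in for the Mahalanobis distance; and normalization is a per-subset z-score.

```python
import numpy as np

def mean_shift_labels(X, bandwidth, n_iter=30):
    """Flat-kernel mean shift: move each point to the mean of its neighbours."""
    modes = X.copy()
    for _ in range(n_iter):
        for i in range(len(modes)):
            near = X[np.linalg.norm(X - modes[i], axis=1) <= bandwidth]
            modes[i] = near.mean(axis=0)
    labels, centers = np.zeros(len(X), dtype=int), []
    for i, m in enumerate(modes):  # merge modes that ended up close together
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) <= bandwidth / 2:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels

def fuse_features(freq_feats, pos_feats, low, high, bandwidth):
    # Step 1: combine both feature groups into one candidate set.
    candidates = np.vstack([freq_feats, pos_feats]).astype(float)
    # Step 2: split into normal and abnormal vectors (missing or out of range).
    with np.errstate(invalid="ignore"):
        invalid = np.isnan(candidates) | (candidates < low) | (candidates > high)
    abnormal_rows = invalid.any(axis=1)
    normal = candidates[~abnormal_rows]
    # Step 3: group the normal vectors into subsets of similar features.
    labels = mean_shift_labels(normal, bandwidth)
    subsets = [normal[labels == k] for k in range(labels.max() + 1)]
    centers = [s.mean(axis=0) for s in subsets]
    for row, bad in zip(candidates[abnormal_rows], invalid[abnormal_rows]):
        # Step 4: correct abnormal values from the most similar normal vector.
        diffs = normal[:, ~bad] - row[~bad]
        donor = normal[np.argmin(np.linalg.norm(diffs, axis=1))]
        fixed = np.where(bad, donor, row)
        # Step 5: add the corrected vector to the nearest normal subset.
        k = int(np.argmin([np.linalg.norm(fixed - c) for c in centers]))
        subsets[k] = np.vstack([subsets[k], fixed])
    # Step 6: normalize each subset separately, then concatenate.
    normalized = [(s - s.mean()) / (s.std() + 1e-8) for s in subsets]
    return np.concatenate([s.ravel() for s in normalized])
```

Because no per-feature weights are involved, a single abnormal value cannot dominate the result: it is replaced before fusion and then normalized together with the subset it joins.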
The noise recognition neural network model in this embodiment simultaneously extracts the voiceprint frequency domain features and the relative position features of the environmental noise and then fuses them, so that the two kinds of features complement each other in the fusion feature. Classifying on the fusion feature can greatly improve the accuracy of environmental noise classification and recognition, and the model can still identify the environmental noise accurately even if some extracted features are abnormal.
The above description only illustrates the preferred embodiments of the present invention and should not be taken as limiting its scope; any modifications, equivalents, and improvements made within the spirit and principles of the present invention are intended to fall within its scope of protection.
Claims (10)
1. A method for intelligently identifying environmental noise, characterized by comprising the following steps:
collecting an environmental noise signal;
extracting environmental noise feature data from the collected environmental noise signal;
training a noise recognition neural network model;
inputting the extracted environmental noise feature data into the trained noise recognition neural network model, wherein the noise recognition neural network model comprises an environmental noise voiceprint frequency domain feature recognition channel, an environmental noise voiceprint position feature recognition channel, a feature fusion channel, and a classification recognizer;
wherein the environmental noise voiceprint frequency domain feature recognition channel in the noise recognition neural network model extracts frequency domain features of the noise voiceprint from the environmental noise feature data, the environmental noise voiceprint position feature recognition channel extracts relative position features of the noise voiceprint from the environmental noise feature data, the feature fusion channel fuses the extracted frequency domain features and relative position features of the noise voiceprint to obtain a fusion feature carrying both, and the classification recognizer classifies the fusion feature to determine the type of the environmental noise.
2. The method according to claim 1, wherein extracting the environmental noise feature data from the collected environmental noise signal specifically comprises:
performing data preprocessing on the collected noise audio signal, and performing a short-time Fourier transform on the preprocessed signal;
calculating the energy spectrum of each frame of the signal after the short-time Fourier transform;
applying a Mel filter bank to the energy spectrum and extracting filter bank features;
and rendering the filter bank features as a noise voiceprint image, which serves as the feature data.
3. The method of claim 1, wherein training the noise recognition neural network model specifically comprises:
preprocessing the collected noise segments and extracting feature data from them;
performing data enhancement processing on the feature data to obtain a training data set N times the original size;
building the noise recognition neural network model;
inputting the training data set into the built noise recognition neural network model for training;
and after training, evaluating the performance indices of the training result, adjusting the parameters of the noise recognition neural network model and training again, and ending the training once the expected result is met.
4. The method of claim 3, wherein data enhancement is performed on the feature data by random cropping, sound speed adjustment, pitch adjustment, and sound fusion to obtain the training data set N times the original size.
5. The method according to claim 1, wherein the environmental noise voiceprint frequency domain feature recognition channel adopts a convolutional neural network, the environmental noise voiceprint position feature recognition channel adopts a long short-term memory (LSTM) neural network, and the classification recognizer adopts an exponential normalization (softmax) classifier.
6. A system for intelligently identifying environmental noise, comprising:
an acquisition processing module, used for collecting an environmental noise signal;
an environmental noise feature data extraction processing module, used for extracting environmental noise feature data from the collected environmental noise signal;
a training processing module, used for training a noise recognition neural network model;
an input processing module, used for inputting the extracted environmental noise feature data into the trained noise recognition neural network model, wherein the noise recognition neural network model comprises an environmental noise voiceprint frequency domain feature recognition channel, an environmental noise voiceprint position feature recognition channel, a feature fusion channel, and a classification recognizer;
a classification recognition module, used for extracting frequency domain features of the noise voiceprint from the environmental noise feature data through the environmental noise voiceprint frequency domain feature recognition channel in the noise recognition neural network model, extracting relative position features of the noise voiceprint from the environmental noise feature data through the environmental noise voiceprint position feature recognition channel, fusing the extracted frequency domain features and relative position features of the noise voiceprint through the feature fusion channel to obtain a fusion feature carrying both, and classifying the fusion feature through the classification recognizer to determine the type of the environmental noise.
7. The system of claim 6, wherein the environmental noise feature data extraction processing module performs extraction in the following manner:
performing data preprocessing on the collected noise audio signal, and performing a short-time Fourier transform on the preprocessed signal;
calculating the energy spectrum of each frame of the signal after the short-time Fourier transform;
applying a Mel filter bank to the energy spectrum and extracting filter bank features;
and rendering the filter bank features as a noise voiceprint image, which serves as the feature data.
8. The system of claim 6, wherein the training processing module performs training in the following manner:
preprocessing the collected noise segments and extracting feature data from them;
performing data enhancement processing on the feature data to obtain a training data set N times the original size;
building the noise recognition neural network model;
inputting the training data set into the built noise recognition neural network model for training;
and after training, evaluating the performance indices of the training result, adjusting the parameters of the noise recognition neural network model and training again, and ending the training once the expected result is met.
9. The system of claim 8, wherein the training processing module performs data enhancement on the feature data by random cropping, sound speed adjustment, pitch adjustment, and sound fusion to obtain the training data set N times the original size.
10. The system according to claim 6, wherein the environmental noise voiceprint frequency domain feature recognition channel adopts a convolutional neural network, the environmental noise voiceprint position feature recognition channel adopts a long short-term memory (LSTM) neural network, and the classification recognizer adopts an exponential normalization (softmax) classifier.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211704375.7A CN115662464B (en) | 2022-12-29 | 2022-12-29 | Method and system for intelligently identifying environmental noise |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211704375.7A CN115662464B (en) | 2022-12-29 | 2022-12-29 | Method and system for intelligently identifying environmental noise |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115662464A (en) | 2023-01-31 |
CN115662464B (en) | 2023-06-27 |
Family
ID=85022962
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211704375.7A Active CN115662464B (en) | 2022-12-29 | 2022-12-29 | Method and system for intelligently identifying environmental noise |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115662464B (en) |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106251049A (en) * | 2016-07-25 | 2016-12-21 | 国网浙江省电力公司宁波供电公司 | A kind of electricity charge risk model construction method of big data |
US20180260656A1 (en) * | 2017-03-08 | 2018-09-13 | Hitachi, Ltd. | Abnormal waveform sensing system, abnormal waveform sensing method, and waveform analysis device |
CN109767785A (en) * | 2019-03-06 | 2019-05-17 | 河北工业大学 | Ambient noise method for identifying and classifying based on convolutional neural networks |
CN110457550A (en) * | 2019-07-05 | 2019-11-15 | 中国地质大学(武汉) | The bearing calibration of misoperation data in a kind of sintering process |
CN110458170A (en) * | 2019-08-06 | 2019-11-15 | 汕头大学 | Chinese character positioning and recognition methods in a kind of very noisy complex background image |
CN112043271A (en) * | 2020-09-21 | 2020-12-08 | 北京华睿博视医学影像技术有限公司 | Electrical impedance measurement data correction method and device |
CN112101765A (en) * | 2020-09-08 | 2020-12-18 | 国网山东省电力公司菏泽供电公司 | Abnormal data processing method and system for operation index data of power distribution network |
CN112686104A (en) * | 2020-12-19 | 2021-04-20 | 北京工业大学 | Deep learning-based multi-vocal music score identification method |
US20210264938A1 (en) * | 2018-06-05 | 2021-08-26 | Anker Innovations Technology Co. Ltd. | Deep learning based method and system for processing sound quality characteristics |
CN113658607A (en) * | 2021-07-23 | 2021-11-16 | 南京理工大学 | Environmental sound classification method based on data enhancement and convolution cyclic neural network |
CN114372513A (en) * | 2021-12-20 | 2022-04-19 | 广州大学 | Training method, classification method, equipment and medium of bird sound recognition model |
CN114693043A (en) * | 2020-12-31 | 2022-07-01 | 奥动新能源汽车科技有限公司 | Method, system, electronic device, and medium for evaluating health condition of vehicle battery |
CN114882906A (en) * | 2022-06-30 | 2022-08-09 | 广州伏羲智能科技有限公司 | Novel environmental noise identification method and system |
CN115081473A (en) * | 2022-05-31 | 2022-09-20 | 同济大学 | Multi-feature fusion brake noise classification and identification method |
CN115496285A (en) * | 2022-09-26 | 2022-12-20 | 上海玫克生储能科技有限公司 | Power load prediction method and device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN115662464B (en) | 2023-06-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105976809B (en) | Identification method and system based on speech and facial expression bimodal emotion fusion | |
CN108281146B (en) | Short voice speaker identification method and device | |
CN109036382B (en) | Audio feature extraction method based on KL divergence | |
CN108231067A (en) | Sound scenery recognition methods based on convolutional neural networks and random forest classification | |
CN111429935B (en) | Voice caller separation method and device | |
CN110600054B (en) | Sound scene classification method based on network model fusion | |
CN108922541A (en) | Multidimensional characteristic parameter method for recognizing sound-groove based on DTW and GMM model | |
CN103646649A (en) | High-efficiency voice detecting method | |
CN110120230B (en) | Acoustic event detection method and device | |
CN103985381A (en) | Voice frequency indexing method based on parameter fusion optimized decision | |
CN110992985A (en) | Identification model determining method, identification method and identification system for identifying abnormal sounds of treadmill | |
CN111081223B (en) | Voice recognition method, device, equipment and storage medium | |
CN109961794A (en) | A kind of layering method for distinguishing speek person of model-based clustering | |
CN113823293B (en) | Speaker recognition method and system based on voice enhancement | |
CN107945793A (en) | Voice activation detection method and device | |
CN110570870A (en) | Text-independent voiceprint recognition method, device and equipment | |
CN112397074A (en) | Voiceprint recognition method based on MFCC (Mel frequency cepstrum coefficient) and vector element learning | |
CN115101076B (en) | Speaker clustering method based on multi-scale channel separation convolution feature extraction | |
CN105916090A (en) | Hearing aid system based on intelligent speech recognition technology | |
CN111489763A (en) | Adaptive method for speaker recognition in complex environment based on GMM model | |
CN111785262B (en) | Speaker age and gender classification method based on residual error network and fusion characteristics | |
CN110415707B (en) | Speaker recognition method based on voice feature fusion and GMM | |
CN112420056A (en) | Speaker identity authentication method and system based on variational self-encoder and unmanned aerial vehicle | |
CN115662464B (en) | Method and system for intelligently identifying environmental noise | |
CN114038469B (en) | Speaker identification method based on multi-class spectrogram characteristic attention fusion network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||