CN115662464A - Method and system for intelligently identifying environmental noise - Google Patents


Publication number
CN115662464A
Authority
CN
China
Prior art keywords
noise
voiceprint
environmental noise
characteristic
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211704375.7A
Other languages
Chinese (zh)
Other versions
CN115662464B (en
Inventor
李毓勤
李余琨
周当
李晓斌
何玉龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Skyland Information Technology Co ltd
Original Assignee
Guangzhou Skyland Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Skyland Information Technology Co ltd
Priority to CN202211704375.7A (patent CN115662464B)
Publication of CN115662464A
Application granted
Publication of CN115662464B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00: Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves

Abstract

The invention discloses a method and a system for intelligently identifying environmental noise. Within a noise recognition neural network model, an environmental noise voiceprint frequency domain feature recognition channel extracts frequency domain features of the noise voiceprint from environmental noise feature data, and an environmental noise voiceprint position feature recognition channel extracts relative position features of the noise voiceprint from the same data. A feature fusion channel fuses the two to obtain fusion features carrying both the frequency domain features and the relative position features of the noise voiceprint, and a classification recognizer classifies the fusion features to determine the environmental noise type.

Description

Method and system for intelligently identifying environmental noise
Technical Field
The invention relates to the technical field of environmental noise processing, in particular to a method and a system for intelligently identifying environmental noise.
Background
Environmental noise refers to sound that is generated in industrial production, building construction, transportation and social life and that interferes with the surrounding living environment. Noise prevention and control requires classified prevention and control of noise pollution; that is, the type of environmental noise must be distinguished before supervision and law enforcement can be carried out. A traditional noise detection system is limited to obtaining the original audio recording and measuring the instantaneous sound intensity, or the average sound intensity over a period of time, to judge whether the noise exceeds the standard. Relying on this method alone, the type of environmental noise cannot be determined, which hinders the practical implementation of noise supervision and makes law enforcement considerably more difficult. The prior art can classify and identify environmental noise through a neural network. For example, the Chinese patent application No. 2019166344.2, entitled "An environmental noise identification and classification method based on a convolutional neural network", adopts the following scheme: step 1, extracting natural environment noise and editing it into noise segments with a duration of 300 ms to 30 s and a frequency of 44.1 kHz; step 2, performing a short-time Fourier transform on each noise segment, converting the one-dimensional time domain signal into a two-dimensional frequency domain signal to obtain a spectrogram; step 3, extracting the Mel-frequency cepstral coefficients (MFCC) of the signal; step 4, taking 80% of all noise segments as a training set and the remaining 20% as a test set; step 5, performing noise classification with a convolutional neural network model; and step 6, training the classification model with the training set and verifying the accuracy of the model with the test set, thereby completing the environmental noise identification and classification based on the convolutional neural network. However, this prior art classifies environmental noise only by features extracted in the frequency domain, so the classification and identification accuracy is not high.
Disclosure of Invention
The invention aims to provide a method and a system for intelligently identifying environmental noise so as to improve the accuracy of environmental noise classification and identification.
In order to solve the technical problem, the invention adopts the following technical scheme:
in one aspect, a method for intelligently identifying ambient noise includes the following steps:
collecting an environmental noise signal;
extracting environmental noise characteristic data from the acquired environmental noise signals;
training a noise recognition neural network model;
inputting the extracted environmental noise characteristic data into a trained noise recognition neural network model, wherein the noise recognition neural network model comprises an environmental noise voiceprint frequency domain characteristic recognition channel, an environmental noise voiceprint position characteristic recognition channel, a characteristic fusion channel and a classification recognizer;
the environmental noise voiceprint frequency domain feature recognition channel in the noise recognition neural network model extracts frequency domain features of the noise voiceprint from the environmental noise feature data, the environmental noise voiceprint position feature recognition channel extracts relative position features of the noise voiceprint from the environmental noise feature data, the feature fusion channel fuses the extracted frequency domain features and relative position features of the noise voiceprint to obtain fusion features carrying both, and the classification recognizer classifies the fusion features to determine the environmental noise type.
Wherein, extracting the environmental noise characteristic data from the collected noise signal specifically comprises:
carrying out data preprocessing on the acquired noise audio signal, and carrying out short-time Fourier transform on the signal subjected to data preprocessing;
calculating the energy spectrum of each frame of signal after short-time Fourier transform;
applying a Mel filter bank on the energy spectrum, and extracting the characteristics of the filter bank;
and drawing the filter bank characteristics into a noise voiceprint image as characteristic data.
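The energy spectrum and Mel filter bank steps above can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the naive DFT, the common 2595·log10(1 + f/700) Mel formula, the triangular filter shape and the log compression are all assumptions chosen for clarity, and the input frame is assumed to be already preprocessed. Stacking the per-frame outputs row by row would give the two-dimensional array that can be drawn as the noise voiceprint image.

```python
import cmath
import math

def power_spectrum(frame):
    """Energy spectrum |X[k]|^2 / N for k = 0..N/2 via a naive DFT (O(N^2), for clarity only)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(x_k) ** 2 / n)
    return spectrum

def hz_to_mel(hz):
    # A widely used Mel-scale formula (assumption; the source does not specify one).
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale; one row of weights per filter."""
    n_bins = n_fft // 2 + 1
    top_mel = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies in Hz, evenly spaced in Mel.
    edges = [mel_to_hz(top_mel * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            f = k * sample_rate / n_fft  # centre frequency of DFT bin k
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel_features(frame, bank, eps=1e-10):
    """Filter bank features of one frame: log of the Mel-weighted energy in each band."""
    spec = power_spectrum(frame)
    return [math.log(sum(w * p for w, p in zip(row, spec)) + eps) for row in bank]
```

In practice a fast Fourier transform would replace the naive DFT; the quadratic version is used here only so that the sketch needs nothing beyond the standard library.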
The training of the noise recognition neural network model specifically comprises the following steps:
performing data preprocessing and feature data extraction on the collected noise segments to obtain feature data;
performing data enhancement on the feature data to obtain an N-fold training data set;
building a noise recognition neural network model;
inputting a training data set into a built noise recognition neural network model for training;
and after training, evaluating the performance indexes of the training result, readjusting and optimizing the parameters in the noise recognition neural network model and retraining, and ending the training once the expected result is met.
Preferably, random cropping, sound speed adjustment, pitch adjustment and sound fusion are used to perform data enhancement on the feature data to obtain the N-fold training data set.
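As a rough illustration of these enhancement operations, the sketch below applies random cropping, speed adjustment and weighted sound fusion to a clip held as a plain list of samples. Everything in it is an assumption made for illustration: the function names, operating on raw samples rather than on extracted feature data, the 0.9 to 1.1 speed range and the 0.8 fusion weight. Pitch adjustment is omitted, and the simple resampling used for speed adjustment also shifts pitch.

```python
import random

def random_crop(samples, crop_len, rng=random):
    """Return a random contiguous window of crop_len samples."""
    if crop_len >= len(samples):
        return list(samples)
    start = rng.randrange(len(samples) - crop_len + 1)
    return list(samples[start:start + crop_len])

def change_speed(samples, rate):
    """Resample by linear interpolation; rate > 1 shortens (speeds up) the clip."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not samples:
        return []
    n_out = max(1, int(len(samples) / rate))
    out = []
    for i in range(n_out):
        pos = i * rate
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out

def mix(target, background, weight=0.8):
    """Weighted fusion of a target noise clip with a background clip (for example wind)."""
    n = min(len(target), len(background))
    return [weight * target[i] + (1.0 - weight) * background[i] for i in range(n)]

def augment(samples, background, n_copies, crop_len, seed=0):
    """Produce n_copies augmented variants of one clip (a hypothetical N-fold expansion)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_copies):
        clip = random_crop(samples, crop_len, rng)
        clip = change_speed(clip, rng.uniform(0.9, 1.1))
        out.append(mix(clip, background, weight=0.8))
    return out
```

Calling `augment(clip, wind, n_copies=5, crop_len=16000)` on each original clip would yield five variants per clip, one simple way to reach an N-fold data set.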
The environmental noise voiceprint frequency domain feature recognition channel may adopt a convolutional neural network (CNN), the environmental noise voiceprint position feature recognition channel may adopt a long short-term memory (LSTM) neural network, and the classification recognizer may adopt a normalized exponential (softmax) classifier.
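The normalized exponential (softmax) classifier named above can be sketched in a few lines. The four labels come from the noise categories listed in this document; the function names and the example scores are hypothetical, and the scores are assumed to come from a final layer applied to the fusion feature.

```python
import math

# Noise categories named in the description; the order is an arbitrary assumption.
NOISE_TYPES = ["traffic", "industrial", "construction", "social life"]

def softmax(logits):
    """Normalized exponential: exp(z_i) / sum_j exp(z_j), shifted by max(z) for stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, labels):
    """Return the most probable label together with its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

Subtracting the maximum score before exponentiating leaves the result unchanged mathematically but avoids overflow for large scores.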
In another aspect, a system for intelligently identifying ambient noise, comprising:
the acquisition processing module is used for acquiring an environmental noise signal;
the environmental noise characteristic data extraction processing module is used for extracting environmental noise characteristic data from the acquired environmental noise signals;
the training processing module is used for training a noise recognition neural network model;
the input processing module is used for inputting the extracted environmental noise characteristic data into a trained noise recognition neural network model, and the noise recognition neural network model comprises an environmental noise voiceprint frequency domain characteristic recognition channel, an environmental noise voiceprint position characteristic recognition channel, a characteristic fusion channel and a classification recognizer;
the classification identification module is used for extracting frequency domain features of the noise voiceprint from the environmental noise feature data through the environmental noise voiceprint frequency domain feature recognition channel in the noise recognition neural network model, extracting relative position features of the noise voiceprint from the environmental noise feature data through the environmental noise voiceprint position feature recognition channel, fusing the extracted frequency domain features and relative position features of the noise voiceprint through the feature fusion channel to obtain fusion features carrying both, and classifying the fusion features through the classification recognizer to determine the environmental noise type.
The environmental noise characteristic data extraction processing module adopts the following method to extract:
carrying out data preprocessing on the acquired noise audio signals, and carrying out short-time Fourier transform on the signals subjected to data preprocessing;
calculating the energy spectrum of each frame of signal after short-time Fourier transform;
applying a Mel filter bank on the energy spectrum, and extracting the characteristics of the filter bank;
and drawing the filter bank characteristics into a noise voiceprint image as characteristic data.
Wherein, the training processing module can adopt the following modes to train:
performing data preprocessing and feature data extraction on the collected noise segments to obtain feature data;
performing data enhancement on the feature data to obtain an N-fold training data set;
building a noise recognition neural network model;
inputting a training data set into a built noise recognition neural network model for training;
and after training, evaluating the performance indexes of the training result, readjusting and optimizing the parameters in the noise recognition neural network model and retraining, and ending the training once the expected result is met.
Preferably, the training processing module performs data enhancement on the feature data by random cropping, sound speed adjustment, pitch adjustment and sound fusion to obtain the N-fold training data set.
The environmental noise voiceprint frequency domain feature recognition channel may adopt a convolutional neural network (CNN), the environmental noise voiceprint position feature recognition channel may adopt a long short-term memory (LSTM) neural network, and the classification recognizer may adopt a normalized exponential (softmax) classifier.
Compared with the prior art, the invention has the following beneficial effects:
In the method and the system, environmental noise feature data is extracted from the collected environmental noise signal; a noise recognition neural network model is trained; and the extracted environmental noise feature data is input into the trained noise recognition neural network model, which comprises an environmental noise voiceprint frequency domain feature recognition channel, an environmental noise voiceprint position feature recognition channel, a feature fusion channel and a classification recognizer. The frequency domain feature recognition channel extracts frequency domain features of the noise voiceprint from the environmental noise feature data, the position feature recognition channel extracts relative position features of the noise voiceprint from the same data, the feature fusion channel fuses the two to obtain fusion features carrying both, and the classification recognizer classifies the fusion features to determine the environmental noise type. Because classification is based on the fusion of the frequency domain features and the relative position features of the noise voiceprint, the accuracy of environmental noise classification is higher than in the prior art, which classifies by frequency domain features alone.
Drawings
FIG. 1 is a flow chart of one embodiment of a method for intelligently identifying ambient noise according to the present invention;
FIG. 2 is a schematic structural diagram of a noise-identifying neural network model trained in the method for intelligently identifying environmental noise according to an embodiment of the present invention;
FIG. 3 is a block diagram of an embodiment of a system for intelligently identifying ambient noise.
Detailed Description
Referring to fig. 1, the figure is a flowchart of an embodiment of the method for intelligently identifying environmental noise of the present invention, and mainly includes the following steps:
S101, collecting an environmental noise signal. In specific implementation, the audio signal of the environmental noise can be collected by a noise collection device. Environmental noise mainly comprises traffic noise, industrial noise, building construction noise and social life noise. Traffic noise is the noise generated when vehicles such as motor vehicles, airplanes, trains and ships are running. Industrial noise mainly refers to noise generated in industrial production and mainly comes from machines and high-speed running equipment. Building construction noise mainly refers to noise generated on building construction sites. Social life noise mainly refers to noise generated by people in various social activities such as commercial transactions, sports competitions, tourist conferences and entertainment venues, as well as the noise of household appliances such as radio recorders, televisions and washing machines; these are not described further herein;
S102, extracting environmental noise feature data from the collected environmental noise signal, for example in the following manner:
carrying out data preprocessing on the acquired noise audio signal, and carrying out short-time Fourier transform on the signal subjected to data preprocessing;
calculating the energy spectrum of each frame of signal after short-time Fourier transform;
applying a Mel filter bank on the energy spectrum, and extracting the characteristics of the filter bank;
drawing the filter bank characteristics into a noise voiceprint image as characteristic data;
it should be noted that the data preprocessing mainly serves to make the data meet the requirements of the short-time Fourier transform; it mainly adopts pre-emphasis, framing and windowing, and other manners may also be adopted, which are not specifically limited herein;
in addition, the data volume of raw audio is generally large, and using it directly as feature data would greatly increase the amount of calculation in the subsequent neural network feature classification; in this step, the sound is therefore converted to the Mel domain through the Mel filter bank to obtain a Mel spectrum, which better matches the auditory characteristics of the human ear, keeps the data volume under control, and makes the voiceprint features in the frequency domain more distinct and richer;
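The pre-emphasis, framing and windowing mentioned for data preprocessing can be sketched as follows. The coefficient 0.97, the Hamming window and the 400-sample frame with a 160-sample hop are common defaults assumed here for illustration, not values stated in this document; the function names are likewise hypothetical.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; boosts high frequencies. The first sample is kept as-is."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]

def frame_signal(signal, frame_len, hop_len):
    """Split the signal into overlapping frames, dropping any incomplete tail."""
    if len(signal) < frame_len:
        return []
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    return [signal[i * hop_len:i * hop_len + frame_len] for i in range(n_frames)]

def hamming(frame_len):
    """Hamming window coefficients (assumed window type)."""
    if frame_len == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
            for n in range(frame_len)]

def preprocess(signal, frame_len=400, hop_len=160, alpha=0.97):
    """Pre-emphasize, frame and window a signal so each frame is ready for a Fourier transform."""
    emphasized = pre_emphasis(signal, alpha)
    window = hamming(frame_len)
    return [[s * w for s, w in zip(frame, window)]
            for frame in frame_signal(emphasized, frame_len, hop_len)]
```

At a 16 kHz sampling rate, which is also an assumption, the defaults correspond to 25 ms frames with a 10 ms hop.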
S103, training the noise recognition neural network model; in specific implementation, for example, the following manner may be used for training:
performing data preprocessing and feature data extraction on the collected noise segments to obtain feature data;
performing data enhancement on the feature data to obtain an N-fold training data set;
building a noise recognition neural network model;
inputting a training data set into a built noise recognition neural network model for training;
after training, evaluating the performance indexes of the training result, readjusting and optimizing the parameters in the noise recognition neural network model and retraining, and ending the training once the expected result is met;
as shown in fig. 2, the architecture of the built noise recognition neural network model mainly includes an environmental noise voiceprint frequency domain feature recognition channel, an environmental noise voiceprint position feature recognition channel, a feature fusion channel and a classification recognizer; as a preferred embodiment, the environmental noise voiceprint frequency domain feature recognition channel may adopt a convolutional neural network (CNN), the environmental noise voiceprint position feature recognition channel may adopt a long short-term memory (LSTM) neural network, and the classification recognizer may adopt a normalized exponential (softmax) classifier, which is not specifically limited herein;
it should be noted that the amount of collected data is limited, whereas training a neural network requires as many samples as possible. Some speech signal processing can therefore be performed on the original data, that is, data enhancement by random cropping, sound speed adjustment, pitch adjustment, sound fusion and the like, to obtain an N-fold training data set. For sound fusion, a batch of relatively pure target environmental noises is screened out and filtered to obtain purer target environmental noises, which are then weighted and fused with natural noises (wind, birdsong and the like) and subjected to speed adjustment and other operations to obtain environmental noise data of higher quality. The N-fold training data set obtained in this way remarkably improves the model training result;
in addition, after the model training is finished, the training result needs to be evaluated objectively, wherein the performance indexes include, but are not limited to, accuracy, F1 score and recall. If the performance indexes do not meet the requirements, the data set is checked for unknown problems, and the corresponding parameters in the noise recognition neural network model are readjusted, such as the number of network layers, the depth of convolution and pooling in the convolutional neural network, the number of neurons in the long short-term memory neural network, and the like.
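The three performance indexes named above can be computed as in the sketch below. Macro averaging over classes is an assumption, since the document does not say how per-class recall and F1 score are combined, and the function name is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged recall and F1 score for a multi-class classifier."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recalls, f1s = [], []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # missed instances of c
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1s.append(f1)
    return {
        "accuracy": accuracy,
        "macro_recall": sum(recalls) / len(recalls),
        "macro_f1": sum(f1s) / len(f1s),
    }
```

Comparing these values against target thresholds after each training run gives a concrete stopping rule for the retraining loop described above.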
S104, inputting the extracted environmental noise characteristic data into a trained noise recognition neural network model, wherein the noise recognition neural network model comprises an environmental noise voiceprint frequency domain characteristic recognition channel, an environmental noise voiceprint position characteristic recognition channel, a characteristic fusion channel and a classification recognizer;
S105, the environmental noise voiceprint frequency domain feature recognition channel in the noise recognition neural network model extracts frequency domain features of the noise voiceprint from the environmental noise feature data, and the environmental noise voiceprint position feature recognition channel extracts relative position features of the noise voiceprint from the environmental noise feature data. In specific implementation, the extracted relative position features may take various forms; for example, they may be time sequence features of the corresponding noise voiceprint or other relative position features, which are not specifically limited;
in addition, in step S105, the feature fusion channel fuses the extracted frequency domain features and relative position features of the noise voiceprint to obtain fusion features carrying both, and the classification recognizer then classifies the fusion features to determine the environmental noise type;
it should be noted that, in specific implementation, feature fusion may use the feature weighted fusion of the prior art, that is, the extracted frequency domain features and relative position features of the noise voiceprint are weighted with different weights and fused. However, if an extracted feature is abnormal and the weight assigned to that type of feature is too large, the abnormal feature portion greatly reduces the accuracy of subsequent noise identification. To solve this technical problem, as a preferred embodiment of the present invention, the feature fusion channel in this embodiment performs feature fusion in the following manner:
first, combining the extracted frequency domain features and relative position features of the noise voiceprint into a candidate feature set;
second, dividing the candidate feature set into a normal feature vector set and an abnormal feature vector set, wherein the normal feature vector set is the set of feature vectors in which all features are normal and the abnormal feature vector set is the set of feature vectors in which some features are abnormal; in specific implementation, whether a feature is abnormal can be determined in this embodiment according to whether feature data is missing or falls outside a predetermined range, which is not specifically limited herein;
third, grouping the normal feature vector set to obtain normal feature subsets, wherein the features within each normal feature subset are similar; in specific implementation, various existing algorithms can be used for the grouping, for example a mean shift clustering algorithm or other algorithms, which is not specifically limited herein;
fourth, correcting the abnormal feature data in the abnormal feature vector set; in specific implementation, for example, the degree of dissimilarity between the abnormal feature and each feature in the abnormal feature vector set can be calculated, and the feature data value with the smallest dissimilarity to the abnormal feature is selected to correct the abnormal feature data;
fifth, calculating the distance between each feature in the corrected abnormal feature vector set and each normal feature subset, and adding the feature to the nearest normal feature subset; in specific implementation, one feature in the abnormal feature vector set is selected and its distance to each normal feature subset is calculated, where the distance may be the Mahalanobis distance or another distance capable of expressing similarity, without specific limitation; the normal feature subset nearest to the feature is determined from the calculation result and the feature is added to it; the remaining features in the abnormal feature vector set are then processed in the same way until all of them have been added to their corresponding normal feature subsets;
and finally, cascading the determined normal feature subsets to obtain the final fusion feature; in specific implementation, after the features of the abnormal feature vector set have been added to the normal feature subsets, each finally determined normal feature subset is normalized separately, and the normalized normal feature subsets are then concatenated along the feature dimension to obtain the final fusion feature.
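The fusion procedure above can be sketched end to end under several simplifying assumptions, each of which departs from or narrows the text and is labeled in the code: all feature vectors have the same length; a greedy radius-based grouping stands in for mean shift clustering; Euclidean distance to a subset centroid stands in for the Mahalanobis distance; donors for correcting an abnormal vector are taken from the normal set so that the filled-in values are always valid; and normalization is min-max over each subset. All names and default thresholds are hypothetical.

```python
import math

def valid(v, lo, hi):
    """A value is normal if it is present, not NaN, and inside the predetermined range."""
    return v is not None and not math.isnan(v) and lo <= v <= hi

def dist(a, b):
    # Euclidean distance (assumption: stand-in for the Mahalanobis distance).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def group_similar(vectors, radius):
    """Greedy grouping (assumption: stand-in for mean shift): join the first subset
    whose centroid lies within radius, otherwise open a new subset."""
    subsets = []
    for vec in vectors:
        for subset in subsets:
            if dist(vec, centroid(subset)) <= radius:
                subset.append(vec)
                break
        else:
            subsets.append([vec])
    return subsets

def correct(vec, donors, lo, hi):
    """Replace invalid entries with those of the least dissimilar donor vector."""
    ok = [i for i, v in enumerate(vec) if valid(v, lo, hi)]
    best = min(donors, key=lambda d: sum((vec[i] - d[i]) ** 2 for i in ok))
    return [vec[i] if i in ok else best[i] for i in range(len(vec))]

def normalize(subset):
    """Min-max normalize all values of one subset and flatten them into one list."""
    flat = [v for vec in subset for v in vec]
    low, high = min(flat), max(flat)
    span = (high - low) or 1.0
    return [(v - low) / span for v in flat]

def fuse(freq_feats, pos_feats, lo=-1e3, hi=1e3, radius=1.0):
    candidates = list(freq_feats) + list(pos_feats)            # candidate feature set
    normal = [v for v in candidates if all(valid(x, lo, hi) for x in v)]
    abnormal = [v for v in candidates if not all(valid(x, lo, hi) for x in v)]
    if not normal:
        raise ValueError("need at least one normal feature vector")
    subsets = group_similar(normal, radius)                    # normal feature subsets
    for vec in abnormal:
        fixed = correct(vec, normal, lo, hi)                   # correct abnormal data
        nearest = min(subsets, key=lambda s: dist(fixed, centroid(s)))
        nearest.append(fixed)                                  # add to nearest subset
    fused = []
    for subset in subsets:                                     # cascade the subsets
        fused.extend(normalize(subset))
    return fused
```

With two frequency domain vectors near the origin and two position vectors near (5, 5), one of which has a missing first entry, `fuse` forms two subsets, repairs the missing entry from its closest normal neighbour, and returns one concatenated vector of normalized values.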
It should be noted that, in the above embodiment, the extracted frequency domain features and relative position features of the noise voiceprint are combined into a candidate feature set; the candidate feature set is divided into a normal feature vector set and an abnormal feature vector set; the normal feature vector set is grouped into normal feature subsets; the abnormal features in the abnormal feature vector set are corrected; each corrected feature is added to the nearest normal feature subset; and the finally determined normal feature subsets are cascaded to obtain the final fusion feature. This avoids the technical problem that abnormal extracted features reduce the accuracy of subsequent environmental noise identification, and ultimately improves that accuracy.
The noise recognition neural network model in this embodiment simultaneously extracts the voiceprint frequency domain features and the relative position features of the environmental noise and fuses them; the two complement each other in the fusion feature, and classification is carried out on the fusion feature. This can greatly improve the accuracy of environmental noise classification, and the model can still identify the environmental noise accurately even if some extracted features are abnormal.
Referring to fig. 3, which is a block diagram of an embodiment of the system for intelligently identifying environmental noise of the present invention, the system of this embodiment mainly includes an acquisition processing module 101, an environmental noise feature data extraction processing module 102, a training processing module 103, an input processing module 104 and a classification identification module 105, wherein:
the acquisition processing module 101 is mainly configured to collect an environmental noise signal; in specific implementation, the audio signal of the environmental noise may be collected by a noise collection device, where the environmental noise mainly includes traffic noise, industrial noise, building construction noise and social life noise, which are not described again herein;
the environmental noise feature data extraction processing module 102 is mainly used for extracting environmental noise feature data from the collected environmental noise signal; in specific implementation, it may extract the environmental noise feature data in the following manner:
carrying out data preprocessing on the acquired noise audio signals, and carrying out short-time Fourier transform on the signals subjected to data preprocessing;
calculating the energy spectrum of each frame of signal after short-time Fourier transform;
applying a Mel filter bank on the energy spectrum, and extracting the characteristics of the filter bank;
and drawing the filter bank characteristics into a noise voiceprint image as characteristic data.
As mentioned above, the data preprocessing mainly serves to make the data meet the requirements of the short-time Fourier transform; it mainly adopts pre-emphasis, framing and windowing, and other manners may also be adopted, which are not specifically limited herein;
in addition, the data volume of raw audio is generally large, and using it directly as feature data would greatly increase the amount of calculation in the subsequent neural network feature classification; the sound is therefore converted to the Mel domain through the Mel filter bank to obtain a Mel spectrum, which better matches the auditory characteristics of the human ear, keeps the data volume under control, and makes the voiceprint features in the frequency domain more distinct and richer;
a training processing module 103, in this embodiment, the training processing module 103 is mainly used for training a noise recognition neural network model; in specific implementation, the training processing module 103 may perform training in the following manner:
performing data preprocessing and feature data extraction on the collected noise segments to obtain feature data;
performing data enhancement (augmentation) processing on the feature data to obtain a training data set N times the original size;
building a noise recognition neural network model;
inputting the training data set into the built noise recognition neural network model for training;
and after training, evaluating the performance indexes of the training result, readjusting and optimizing the parameters of the noise recognition neural network model and training again, and ending the training once the expected result is met.
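As a rough illustration of how data enhancement can yield a training set N times the original size, the sketch below applies three of the augmentation methods named in the claims (random cropping, speed adjustment, and mixing of two sounds). It is a sketch under stated assumptions: it operates on raw waveforms for simplicity, whereas the text applies enhancement to the feature data; pitch adjustment is omitted; and the function names, parameter ranges, and the linear-interpolation resampler are hypothetical choices, not taken from the patent.

```python
import numpy as np

def random_crop(x, length, rng):
    """Cut a random window of `length` samples out of x."""
    start = rng.integers(0, len(x) - length + 1)
    return x[start:start + length]

def change_speed(x, rate):
    """Resample by linear interpolation; rate > 1 shortens the clip.
    Note: this naive resampling shifts pitch together with speed."""
    n_out = int(round(len(x) / rate))
    return np.interp(np.linspace(0, len(x) - 1, n_out), np.arange(len(x)), x)

def mix(x, y, weight):
    """Weighted sum of two clips, truncated to the shorter one."""
    n = min(len(x), len(y))
    return weight * x[:n] + (1.0 - weight) * y[:n]

def augment(clips, n_times, crop_len, seed=0):
    """Return an array with n_times augmented variants per input clip."""
    rng = np.random.default_rng(seed)
    out = []
    for clip in clips:
        for _ in range(n_times):
            kind = rng.integers(0, 3)
            if kind == 0:                      # cropping only
                y = clip
            elif kind == 1:                    # speed change within +/-10 % (assumed range)
                y = change_speed(clip, rng.uniform(0.9, 1.1))
            else:                              # mix with a randomly chosen clip
                other = clips[rng.integers(0, len(clips))]
                y = mix(clip, other, rng.uniform(0.5, 0.9))
            out.append(random_crop(y, crop_len, rng))
    return np.stack(out)
```

With three input clips and `n_times=4`, the function returns twelve fixed-length segments, that is, a set four times the original size.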
As shown in fig. 2, the architecture of the built noise recognition neural network model mainly includes an environmental noise voiceprint frequency domain feature recognition channel, an environmental noise voiceprint position feature recognition channel, a feature fusion channel, and a classification recognizer. As a preferred embodiment, the environmental noise voiceprint frequency domain feature recognition channel may adopt a convolutional neural network, the environmental noise voiceprint position feature recognition channel may adopt a long short-term memory (LSTM) neural network, and the classification recognizer may adopt an exponential normalization (softmax) classifier, which is not specifically limited herein;
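The exponential normalization classifier mentioned above is what is commonly called a softmax classifier. The following minimal NumPy sketch shows only this final stage: a fused feature vector is mapped to class scores by a linear layer and normalized into probabilities. The linear layer, its `weights` and `bias` parameters, and the function names are assumptions for illustration; the four class labels follow the noise categories listed for the acquisition processing module. The convolutional and recurrent channels are not reproduced here.

```python
import numpy as np

# Class labels taken from the noise categories named in the description.
NOISE_TYPES = ["traffic", "industrial", "construction", "social life"]

def softmax(logits):
    """Exponential normalization: exp(z_i) / sum_j exp(z_j), computed stably."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify(fused, weights, bias):
    """Map a fused feature vector to a noise type and class probabilities.
    `weights` (d x 4) and `bias` (4,) stand in for trained parameters."""
    probs = softmax(fused @ weights + bias)
    return NOISE_TYPES[int(np.argmax(probs))], probs
```

Subtracting the maximum score before exponentiation does not change the result but avoids overflow for large scores.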
an input processing module 104. In this embodiment, the input processing module 104 is mainly configured to input the extracted environmental noise feature data into the trained noise recognition neural network model, where the noise recognition neural network model includes an environmental noise voiceprint frequency domain feature recognition channel, an environmental noise voiceprint position feature recognition channel, a feature fusion channel, and a classification recognizer;
a classification identification module 105. In this embodiment, the classification identification module 105 is mainly configured to: extract frequency domain features of the noise voiceprint from the environmental noise feature data through the environmental noise voiceprint frequency domain feature recognition channel in the noise recognition neural network model; extract relative position features of the noise voiceprint from the environmental noise feature data through the environmental noise voiceprint position feature recognition channel; fuse the extracted frequency domain features and relative position features of the noise voiceprint through the feature fusion channel to obtain a fusion feature carrying both; and classify the fusion feature through the classification recognizer to determine the environmental noise type.
It should be noted that, when weighted fusion is used for feature fusion, an abnormal extracted feature that carries too large a weight will greatly reduce the accuracy of subsequent noise identification. Therefore, as a preferred embodiment of the present invention, the feature fusion channel of the classification identification module 105 in this embodiment may perform feature fusion in the following manner:
first, combining the extracted frequency domain features and relative position features of the noise voiceprint into a candidate feature set;
second, dividing the candidate feature set into a normal feature vector set and an abnormal feature vector set, where the normal feature vector set is the set of feature vectors in which all features are normal, and the abnormal feature vector set is the set of feature vectors in which some features are abnormal. In a specific implementation, a feature may be judged abnormal when feature data is missing or when the feature data falls outside a predetermined range, which is not specifically limited herein;
third, classifying the normal feature vector set to obtain normal feature subsets, where the features within each normal feature subset are similar. In a specific implementation, various existing classification algorithms may be used, for example a mean shift algorithm or other classification algorithms, which is not specifically limited herein;
fourth, correcting the abnormal feature data of the abnormal feature vector set. In a specific implementation, for example, the dissimilarity between the abnormal feature and each feature in the abnormal feature vector set can be calculated, and the feature data value with the minimum dissimilarity to the abnormal feature is selected to correct the abnormal feature data;
fifth, calculating the distance between each feature in the corrected abnormal feature vector set and each normal feature subset, and adding the feature to the nearest normal feature subset. In a specific implementation, one feature in the abnormal feature vector set is selected and its distance to each normal feature subset is calculated, where the distance may be the Mahalanobis distance or another distance capable of measuring similarity, which is not specifically limited herein. The nearest normal feature subset is determined from the calculation result and the feature is added to it. The remaining features in the abnormal feature vector set are then processed in the same way, one by one, until every feature in the abnormal feature vector set has been added to its corresponding normal feature subset;
and finally, cascading (concatenating) the features of each determined normal feature subset to obtain the final fusion feature. In a specific implementation, after the features of the abnormal feature vector set have been added to the normal feature subsets, each finally determined normal feature subset is normalized separately, and the normalized normal feature subsets are then spliced along the feature dimension to obtain the final fusion feature.
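The fusion procedure described above can be sketched end to end in NumPy as follows. This is an illustrative reading, not the patented implementation, and it rests on several assumptions that the text leaves open: each row of `candidates` is one feature vector of the candidate (alternative) feature set; a value is abnormal when it is missing (NaN) or outside the hypothetical range `[lo, hi]`; grouping uses a tiny flat-kernel mean shift with an assumed `bandwidth`; repair values are borrowed from normal vectors (the text draws them from the abnormal feature vector set); the distance is Euclidean distance to the subset mean rather than the Mahalanobis distance; normalization is min-max per subset; and at least one normal vector exists.

```python
import numpy as np

def mean_shift_labels(x, bandwidth, n_iter=30):
    """Tiny flat-kernel mean shift; returns one cluster label per row of x."""
    modes = x.copy()
    for _ in range(n_iter):
        for i in range(len(modes)):
            near = x[np.linalg.norm(x - modes[i], axis=1) <= bandwidth]
            modes[i] = near.mean(axis=0)
    centers, labels = [], []
    for m in modes:                      # merge modes that ended up close together
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) <= bandwidth / 2:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return np.array(labels)

def fuse_features(candidates, lo, hi, bandwidth):
    """candidates: (n, d) array, one candidate feature vector per row."""
    # Split into normal and abnormal vectors (missing or out-of-range values).
    bad = np.isnan(candidates) | (candidates < lo) | (candidates > hi)
    abnormal_rows = bad.any(axis=1)
    normal = candidates[~abnormal_rows]
    abnormal = candidates[abnormal_rows].copy()
    bad_abn = bad[abnormal_rows]

    # Group similar normal vectors into normal feature subsets.
    labels = mean_shift_labels(normal, bandwidth)
    subsets = [list(normal[labels == k]) for k in range(labels.max() + 1)]

    # Correct abnormal entries with values from the least dissimilar donor,
    # comparing only the entries that are valid in the abnormal vector.
    for row, mask in zip(abnormal, bad_abn):
        dissim = np.linalg.norm(normal[:, ~mask] - row[~mask], axis=1)
        row[mask] = normal[np.argmin(dissim)][mask]

    # Add each corrected vector to the nearest normal feature subset.
    for row in abnormal:
        dists = [np.linalg.norm(row - np.mean(s, axis=0)) for s in subsets]
        subsets[int(np.argmin(dists))].append(row)

    # Normalize each subset separately, then concatenate into one fused vector.
    fused = []
    for s in subsets:
        s = np.asarray(s)
        span = s.max() - s.min()
        scaled = (s - s.min()) / span if span > 0 else np.zeros_like(s)
        fused.append(scaled.ravel())
    return np.concatenate(fused)
```

Because every abnormal vector is repaired and then absorbed into a subset of similar normal vectors before normalization, a single corrupted value cannot dominate the fused feature, which is the concern raised about plain weighted fusion.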
The noise recognition neural network model in this embodiment can simultaneously extract the voiceprint frequency domain features and the relative position features of the environmental noise and then fuse them. The two kinds of features complement each other to form a fusion feature, and classification and identification based on this fusion feature can greatly improve the accuracy of environmental noise classification. The model can also identify the environmental noise accurately even when some of the extracted features are abnormal.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.

Claims (10)

1. A method for intelligently identifying environmental noise is characterized by comprising the following steps:
collecting an environmental noise signal;
extracting environmental noise characteristic data from the acquired environmental noise signals;
training a noise recognition neural network model;
inputting the extracted environmental noise characteristic data into a trained noise recognition neural network model, wherein the noise recognition neural network model comprises an environmental noise voiceprint frequency domain characteristic recognition channel, an environmental noise voiceprint position characteristic recognition channel, a characteristic fusion channel and a classification recognizer;
the environmental noise voiceprint frequency domain feature recognition channel in the noise recognition neural network model extracts frequency domain features of the noise voiceprint from the environmental noise feature data, the environmental noise voiceprint position feature recognition channel extracts relative position features of the noise voiceprint from the environmental noise feature data, the feature fusion channel fuses the extracted frequency domain features and relative position features of the noise voiceprint to obtain a fusion feature carrying both the noise voiceprint frequency domain features and the relative position features, and the classification recognizer classifies the fusion feature to determine the type of the environmental noise.
2. The method according to claim 1, wherein extracting the ambient noise characteristic data from the acquired noise signal comprises in particular:
carrying out data preprocessing on the acquired noise audio signals, and carrying out short-time Fourier transform on the signals subjected to data preprocessing;
calculating the energy spectrum of each frame of signal after short-time Fourier transform;
applying a Mel filter bank on the energy spectrum, and extracting the characteristics of the filter bank;
and drawing the filter bank characteristics into a noise voiceprint image as characteristic data.
3. The method of claim 1, wherein training the noise recognition neural network model specifically comprises:
performing data preprocessing and feature data extraction on the collected noise segments to obtain feature data;
performing data enhancement processing on the feature data to obtain a training data set N times the original size;
building a noise recognition neural network model;
inputting the training data set into the built noise recognition neural network model for training;
and after training, evaluating the performance indexes of the training result, readjusting and optimizing the parameters of the noise recognition neural network model and training again, and ending the training once the expected result is met.
4. The method of claim 3, wherein the data enhancement processing is performed on the feature data by random cropping, sound speed adjustment, pitch adjustment, and sound mixing to obtain the training data set N times the original size.
5. The method according to claim 1, wherein the environmental noise voiceprint frequency domain feature recognition channel adopts a convolutional neural network, the environmental noise voiceprint position feature recognition channel adopts a long short-term memory neural network, and the classification recognizer adopts an exponential normalization classifier.
6. A system for intelligently identifying ambient noise, comprising:
the acquisition processing module is used for acquiring an environmental noise signal;
the environmental noise characteristic data extraction processing module is used for extracting environmental noise characteristic data from the acquired environmental noise signals;
the training processing module is used for training a noise recognition neural network model;
the input processing module is used for inputting the extracted environmental noise characteristic data into a trained noise recognition neural network model, and the noise recognition neural network model comprises an environmental noise voiceprint frequency domain characteristic recognition channel, an environmental noise voiceprint position characteristic recognition channel, a characteristic fusion channel and a classification recognizer;
the classification identification module is used for extracting frequency domain features of the noise voiceprint from the environmental noise feature data through the environmental noise voiceprint frequency domain feature recognition channel in the noise recognition neural network model, extracting relative position features of the noise voiceprint from the environmental noise feature data through the environmental noise voiceprint position feature recognition channel, fusing the extracted frequency domain features and relative position features of the noise voiceprint through the feature fusion channel to obtain a fusion feature carrying both the noise voiceprint frequency domain features and the relative position features, and classifying the fusion feature through the classification recognizer to determine the type of the environmental noise.
7. The system of claim 6, wherein the ambient noise feature data extraction processing module extracts by:
carrying out data preprocessing on the acquired noise audio signals, and carrying out short-time Fourier transform on the signals subjected to data preprocessing;
calculating the energy spectrum of each frame of signal after short-time Fourier transform;
applying a Mel filter bank on the energy spectrum, and extracting the characteristics of the filter bank;
and drawing the filter bank characteristics into a noise voiceprint image as characteristic data.
8. The system of claim 6, wherein the training processing module trains in the following manner:
performing data preprocessing and feature data extraction on the collected noise segments to obtain feature data;
performing data enhancement processing on the feature data to obtain a training data set N times the original size;
building a noise recognition neural network model;
inputting the training data set into the built noise recognition neural network model for training;
and after training, evaluating the performance indexes of the training result, readjusting and optimizing the parameters of the noise recognition neural network model and training again, and ending the training once the expected result is met.
9. The system of claim 8, wherein the training processing module performs the data enhancement processing on the feature data by random cropping, sound speed adjustment, pitch adjustment, and sound mixing to obtain the training data set N times the original size.
10. The system according to claim 6, wherein the environmental noise voiceprint frequency domain feature recognition channel adopts a convolutional neural network, the environmental noise voiceprint position feature recognition channel adopts a long short-term memory neural network, and the classification recognizer adopts an exponential normalization classifier.
Application CN202211704375.7A, priority date 2022-12-29, filing date 2022-12-29: Method and system for intelligently identifying environmental noise. Status: Active. Publication: CN115662464B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211704375.7A CN115662464B (en) 2022-12-29 2022-12-29 Method and system for intelligently identifying environmental noise

Publications (2)

Publication Number Publication Date
CN115662464A true CN115662464A (en) 2023-01-31
CN115662464B CN115662464B (en) 2023-06-27

Family

ID=85022962

Country Status (1)

Country Link
CN (1) CN115662464B (en)

Also Published As

Publication number Publication date
CN115662464B (en) 2023-06-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant