CN115662464A - Method and system for intelligently identifying environmental noise

Info

Publication number: CN115662464A
Application number: CN202211704375.7A
Authority: CN
Inventors: 李毓勤; 李余琨; 周当; 李晓斌; 何玉龙
Original assignee: Guangzhou Skyland Information Technology Co ltd
Current assignee: Guangzhou Skyland Information Technology Co ltd
Priority date: 2022-12-29
Filing date: 2022-12-29
Publication date: 2023-01-31
Anticipated expiration: 2042-12-29
Also published as: CN115662464B

Abstract

The invention discloses a method and a system for intelligently identifying environmental noise. An environmental noise voiceprint frequency-domain feature recognition channel in a noise recognition neural network model extracts frequency-domain features of the noise voiceprint from environmental noise feature data, and an environmental noise voiceprint position feature recognition channel extracts relative position features of the noise voiceprint from the same data. A feature fusion channel fuses the extracted frequency-domain features and relative position features to obtain fusion features carrying both, and a classification recognizer classifies the fusion features to determine the environmental noise type.

Description

Method and system for intelligently identifying environmental noise

Technical Field

The invention relates to the technical field of environmental noise processing, in particular to a method and a system for intelligently identifying environmental noise.

Background

Environmental noise refers to sound that is generated in industrial production, building construction, transportation and social life and that interferes with the surrounding living environment. Noise prevention and control requires classified prevention and control of noise pollution, that is, the types of environmental noise need to be distinguished so that supervision and law enforcement can be carried out. A traditional noise detection system is limited to obtaining the original audio recording and the measured instantaneous sound intensity, or the average sound intensity over a period of time, to judge whether the noise exceeds the standard. Relying on this method alone, the type of environmental noise cannot be determined, which hinders the practical implementation of noise supervision and makes law enforcement even more difficult. The prior art can classify and identify environmental noise through a neural network. For example, the Chinese patent application number is 2019166344.2, the name is an environmental noise identification and classification method based on a convolutional neural network, and the following scheme is adopted: step 1, extracting natural environment noise and editing it into noise segments with a duration of 300 ms to 30 s and a frequency of 44.1 kHz; step 2, carrying out a short-time Fourier transform on each noise segment, converting the one-dimensional time-domain signal into a two-dimensional frequency-domain signal to obtain a spectrogram; step 3, extracting the Mel-frequency cepstral coefficients (MFCC) of the signal; step 4, taking 80% of all noise segments as a training set and the remaining 20% as a test set; step 5, carrying out noise classification using a convolutional neural network model; and step 6, training the classification model with the training set and verifying its accuracy with the test set, completing the environmental noise identification and classification based on the convolutional neural network. However, this prior art classifies environmental noise using only features extracted in the frequency domain, so the accuracy of classification and identification is not high.

Disclosure of Invention

The invention aims to provide a method and a system for intelligently identifying environmental noise so as to improve the accuracy of environmental noise classification and identification.

In order to solve the technical problem, the invention adopts the following technical scheme:

In one aspect, a method for intelligently identifying environmental noise includes the following steps:

collecting an environmental noise signal;

extracting environmental noise characteristic data from the acquired environmental noise signals;

training a noise recognition neural network model;

inputting the extracted environmental noise characteristic data into a trained noise recognition neural network model, wherein the noise recognition neural network model comprises an environmental noise voiceprint frequency domain characteristic recognition channel, an environmental noise voiceprint position characteristic recognition channel, a characteristic fusion channel and a classification recognizer;

the environmental noise voiceprint frequency-domain feature recognition channel in the noise recognition neural network model extracts frequency-domain features of the noise voiceprint from the environmental noise feature data, the environmental noise voiceprint position feature recognition channel extracts relative position features of the noise voiceprint from the environmental noise feature data, the feature fusion channel fuses the extracted frequency-domain features and relative position features to obtain fusion features carrying both, and the classification recognizer classifies the fusion features to determine the environmental noise type.

Wherein, extracting the environmental noise characteristic data from the collected noise signal specifically comprises:

carrying out data preprocessing on the acquired noise audio signal, and carrying out short-time Fourier transform on the signal subjected to data preprocessing;

calculating the energy spectrum of each frame of signal after short-time Fourier transform;

applying a Mel filter bank on the energy spectrum, and extracting the characteristics of the filter bank;

and rendering the filter bank features as a noise voiceprint image, which serves as the feature data.

The training of the noise recognition neural network model specifically comprises the following steps:

preprocessing the acquired noise segments and extracting feature data from them;

performing data augmentation on the feature data to obtain a training data set N times the original size;

building a noise recognition neural network model;

inputting a training data set into a built noise recognition neural network model for training;

and after training, evaluating the performance indexes of the training result, readjusting the parameters of the noise recognition neural network model and retraining, and ending the training once the expected result is met.

Preferably, random cropping, sound speed adjustment, pitch adjustment and sound fusion are used to augment the feature data and obtain a training data set N times the original size.

The environmental noise voiceprint frequency-domain feature recognition channel can adopt a convolutional neural network, the environmental noise voiceprint position feature recognition channel can adopt a long short-term memory (LSTM) neural network, and the classification recognizer can adopt a softmax (normalized exponential) classifier.

In another aspect, a system for intelligently identifying ambient noise, comprising:

the acquisition processing module is used for acquiring an environmental noise signal;

the environmental noise characteristic data extraction processing module is used for extracting environmental noise characteristic data from the acquired environmental noise signals;

the training processing module is used for training a noise recognition neural network model;

the input processing module is used for inputting the extracted environmental noise characteristic data into a trained noise recognition neural network model, and the noise recognition neural network model comprises an environmental noise voiceprint frequency domain characteristic recognition channel, an environmental noise voiceprint position characteristic recognition channel, a characteristic fusion channel and a classification recognizer;

the classification recognition module is used for extracting frequency-domain features of the noise voiceprint from the environmental noise feature data through the environmental noise voiceprint frequency-domain feature recognition channel in the noise recognition neural network model, extracting relative position features of the noise voiceprint from the environmental noise feature data through the environmental noise voiceprint position feature recognition channel, fusing the extracted frequency-domain features and relative position features through the feature fusion channel to obtain fusion features carrying both, and classifying the fusion features through the classification recognizer to determine the environmental noise type.

The environmental noise characteristic data extraction processing module adopts the following method to extract:

carrying out data preprocessing on the acquired noise audio signals, and carrying out short-time Fourier transform on the signals subjected to data preprocessing;

Wherein, the training processing module can adopt the following modes to train:

the collected noise fragments are subjected to data preprocessing and characteristic data extraction to obtain characteristic data;

building a noise recognition neural network model;

Preferably, the training processing module augments the feature data by random cropping, sound speed adjustment, pitch adjustment and sound fusion to obtain a training data set N times the original size.

Compared with the prior art, the invention has the following beneficial effects:

In the method and the system, environmental noise feature data are extracted from the acquired environmental noise signals; a noise recognition neural network model is trained; and the extracted environmental noise feature data are input into the trained noise recognition neural network model, which comprises an environmental noise voiceprint frequency-domain feature recognition channel, an environmental noise voiceprint position feature recognition channel, a feature fusion channel and a classification recognizer. The frequency-domain feature recognition channel extracts frequency-domain features of the noise voiceprint from the environmental noise feature data, the position feature recognition channel extracts relative position features of the noise voiceprint from the same data, the feature fusion channel fuses the two kinds of extracted features to obtain fusion features carrying both, and the classification recognizer classifies the fusion features to determine the environmental noise type. Because classification is carried out on the fused frequency-domain and relative position features, the accuracy of environmental noise classification is higher than in the prior art, which relies on frequency-domain features alone.

Drawings

FIG. 1 is a flow chart of one embodiment of a method for intelligently identifying ambient noise according to the present invention;

FIG. 2 is a schematic structural diagram of a noise-identifying neural network model trained in the method for intelligently identifying environmental noise according to an embodiment of the present invention;

FIG. 3 is a block diagram of an embodiment of a system for intelligently identifying ambient noise.

Detailed Description

Referring to FIG. 1, which is a flowchart of an embodiment of the method for intelligently identifying environmental noise of the present invention, the method mainly includes the following steps:

S101, collecting an environmental noise signal. In a specific implementation, an audio signal of the environmental noise can be collected by a noise collection device. Environmental noise mainly comprises traffic noise, industrial noise, building construction noise and social life noise. Traffic noise is noise generated when vehicles such as motor vehicles, airplanes, trains and ships are running. Industrial noise mainly refers to noise generated in industrial production, and mainly comes from machines and high-speed running equipment. Building construction noise mainly refers to noise generated on a building construction site. Social life noise mainly refers to noise generated by people in various social activities, such as commercial transactions, sports competitions, tourist conferences and entertainment venues, as well as noise from household appliances such as radio recorders, televisions and washing machines; these are not described further herein;

S102, extracting the environmental noise feature data from the acquired environmental noise signal, for example, in the following manner:

drawing the filter bank characteristics into a noise voiceprint image as characteristic data;

it should be noted that the purpose of the data preprocessing is to make the data meet the requirements of the short-time Fourier transform; it mainly consists of pre-emphasis, framing and windowing, although other approaches may also be adopted, which are not specifically limited herein;

in addition, the data volume of raw audio is generally large, and using it directly as feature data would greatly increase the amount of computation in the subsequent neural network feature classification; in this step, the sound is converted to the Mel domain through the Mel filter bank to obtain the Mel spectrum, which is more consistent with the auditory characteristics of the human ear, keeps the data volume under control, and makes the voiceprint features in the frequency domain more distinct and richer;
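The extraction chain described for S102 (pre-emphasis, framing, windowing, short-time Fourier transform, energy spectrum, Mel filter bank) can be sketched with NumPy as follows. This is only an illustrative sketch: the 16 kHz sample rate, the 25 ms frame with 10 ms hop, the 512-point FFT, the 40 Mel bands, the pre-emphasis coefficient 0.97 and the names `mel_filter_bank` and `log_mel_features` are all assumptions and are not specified in this document.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            bank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            bank[m - 1, k] = (right - k) / max(right - center, 1)
    return bank

def log_mel_features(signal, sample_rate=16000, frame_len=400, hop=160,
                     n_fft=512, n_mels=40, pre_emphasis=0.97):
    """Return a (frames, n_mels) log filter bank matrix, i.e. the voiceprint image."""
    # Pre-emphasis boosts high frequencies before framing.
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])
    # Framing and Hamming windowing.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Short-time Fourier transform and per-frame energy (power) spectrum.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Apply the Mel filter bank and convert to decibels.
    fbank = power @ mel_filter_bank(n_mels, n_fft, sample_rate).T
    return 10.0 * np.log10(fbank + 1e-10)

# Usage: one second of a 1 kHz tone and of a 4 kHz tone (synthetic test signals).
t = np.arange(16000) / 16000.0
features_1k = log_mel_features(np.sin(2 * np.pi * 1000.0 * t))
features_4k = log_mel_features(np.sin(2 * np.pi * 4000.0 * t))
peak_1k = int(features_1k.mean(axis=0).argmax())
peak_4k = int(features_4k.mean(axis=0).argmax())
```

Each row of the returned matrix is one frame and each column one Mel band, so the matrix can be plotted or fed to a network as a two-dimensional voiceprint image.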

S103, training the noise recognition neural network model; in a specific implementation, the training may, for example, proceed as follows:

building a noise recognition neural network model;

after training, evaluating the performance indexes of the training result, readjusting the parameters of the noise recognition neural network model and retraining, and ending the training once the expected result is met;

as shown in FIG. 2, the architecture of the noise recognition neural network model built above mainly includes an environmental noise voiceprint frequency-domain feature recognition channel, an environmental noise voiceprint position feature recognition channel, a feature fusion channel and a classification recognizer; as a preferred embodiment, the frequency-domain feature recognition channel may adopt a convolutional neural network, the position feature recognition channel may adopt a long short-term memory (LSTM) neural network, and the classification recognizer may adopt a softmax (normalized exponential) classifier, which is not specifically limited herein;
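The two-channel architecture of FIG. 2 can be illustrated by a forward pass written in plain NumPy. This is a sketch under stated assumptions, not the model of the invention: the weights are random and untrained, the layer sizes are arbitrary, the fusion step is a plain concatenation (a simple stand-in for the feature fusion channel), and the names `conv_channel`, `lstm_channel` and `classify` are hypothetical. The four output classes mirror the four noise categories named in S101.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_channel(image, kernels):
    """Frequency-domain channel: valid 2-D convolution, ReLU, global average pooling."""
    kh, kw = kernels.shape[1:]
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros(len(kernels))
    for i, kernel in enumerate(kernels):
        fmap = np.zeros((h, w))
        for r in range(kh):
            for c in range(kw):
                fmap += kernel[r, c] * image[r:r + h, c:c + w]
        out[i] = np.maximum(fmap, 0.0).mean()
    return out

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_channel(frames, w, u, b):
    """Position channel: one LSTM layer over the time axis, returning the last hidden state."""
    hidden = w.shape[0] // 4
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in frames:
        gates = w @ x + u @ h + b
        i, f, o = (sigmoid(gates[k * hidden:(k + 1) * hidden]) for k in range(3))
        g = np.tanh(gates[3 * hidden:])
        c = f * c + i * g          # update the cell state
        h = o * np.tanh(c)         # update the hidden state
    return h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(image, params):
    freq_feat = conv_channel(image, params["kernels"])
    pos_feat = lstm_channel(image, params["w"], params["u"], params["b"])
    fused = np.concatenate([freq_feat, pos_feat])   # stand-in for the fusion channel
    return softmax(params["w_out"] @ fused + params["b_out"])

# Assumed sizes: 40 Mel bands, 16 LSTM units, 8 kernels, 4 noise classes.
n_mels, hidden, n_kernels, n_classes = 40, 16, 8, 4
params = {
    "kernels": rng.normal(0, 0.1, (n_kernels, 3, 3)),
    "w": rng.normal(0, 0.1, (4 * hidden, n_mels)),
    "u": rng.normal(0, 0.1, (4 * hidden, hidden)),
    "b": np.zeros(4 * hidden),
    "w_out": rng.normal(0, 0.1, (n_classes, n_kernels + hidden)),
    "b_out": np.zeros(n_classes),
}
voiceprint = rng.normal(size=(98, n_mels))   # placeholder for a real voiceprint image
probs = classify(voiceprint, params)         # one probability per noise class
```

In practice such a model would be built and trained in a deep learning framework; the sketch only shows how the same voiceprint image feeds both channels before fusion and classification.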

it should be noted that, because the amount of collected data is limited while neural network learning needs as many samples as possible, some speech-signal-processing operations can be applied to the original data, that is, the feature data is augmented by random cropping, sound speed adjustment, pitch adjustment, sound fusion and the like to obtain a training data set N times the original size. For sound fusion, a batch of relatively pure target environmental noise is screened out and filtered to obtain purer target environmental noise, which is then fused by weighting with natural noise (wind, birdsong and the like) and subjected to speed adjustment and other operations to obtain higher-quality environmental noise data. The N-times training data set obtained in this way remarkably improves the model training result;
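The augmentation operations above can be sketched on raw waveforms as follows. The parameter ranges (speed factor 0.9 to 1.1, mixing weight 0.7 to 0.95, crop length 8000 samples) and the function names are assumptions chosen for illustration. Note that the simple resampling used here changes speed and pitch together; an independent pitch adjustment would need a dedicated pitch-shifting method, which is not shown.

```python
import numpy as np

def random_crop(signal, crop_len, rng):
    """Cut a random window of crop_len samples out of the signal."""
    start = rng.integers(0, len(signal) - crop_len + 1)
    return signal[start:start + crop_len]

def change_speed(signal, rate):
    """Resample by linear interpolation; rate > 1 shortens (speeds up) the clip."""
    n_out = int(round(len(signal) / rate))
    positions = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(positions, np.arange(len(signal)), signal)

def mix(target, background, weight):
    """Weighted fusion of target noise with natural background noise."""
    n = min(len(target), len(background))
    return weight * target[:n] + (1.0 - weight) * background[:n]

def augment(signal, background, n_copies, rng, crop_len=8000):
    """Produce n_copies augmented variants of one recording."""
    out = []
    for _ in range(n_copies):
        clip = random_crop(signal, crop_len, rng)
        clip = change_speed(clip, rng.uniform(0.9, 1.1))
        clip = mix(clip, background, rng.uniform(0.7, 0.95))
        out.append(clip)
    return out

# Usage with synthetic stand-ins for a target noise clip and a natural noise clip.
rng = np.random.default_rng(0)
target_noise = np.sin(2 * np.pi * 440.0 * np.arange(16000) / 16000.0)
natural_noise = rng.normal(0.0, 0.1, 16000)
clips = augment(target_noise, natural_noise, n_copies=5, rng=rng)
halved = change_speed(np.arange(10.0), 2.0)
```

Calling `augment` with `n_copies=N` on every recording yields a data set N times the original size, matching the N-times training set described above.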

in addition, after model training is finished, the training result needs an objective performance evaluation, where the performance indexes include, but are not limited to, accuracy, F1 score and recall. If the performance indexes do not meet the requirements, the data set is checked for unknown problems, and the corresponding parameters of the noise recognition neural network model are readjusted, such as the number of network layers, the depth of convolution and pooling in the convolutional neural network, and the number of neurons in the long short-term memory neural network.
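The three performance indexes named above can be computed from true and predicted labels as in the following sketch. Macro-averaging over classes is an assumption made here; the document does not state how per-class scores are combined. The labels and predictions are made-up illustration data, not results from the invention.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged recall and F1 over the classes in y_true."""
    labels = sorted(set(y_true))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    scores = [per_class_metrics(y_true, y_pred, label) for label in labels]
    return {
        "accuracy": accuracy,
        "recall": sum(s[1] for s in scores) / len(labels),
        "f1": sum(s[2] for s in scores) / len(labels),
    }

# Made-up labels for six clips across the four noise categories.
y_true = ["traffic", "traffic", "industrial", "construction", "social", "social"]
y_pred = ["traffic", "industrial", "industrial", "construction", "social", "traffic"]
report = evaluate(y_true, y_pred)
```

If any index in `report` falls below the expected level, the data set is rechecked and the model parameters are adjusted before retraining, as described above.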

S104, inputting the extracted environmental noise characteristic data into a trained noise recognition neural network model, wherein the noise recognition neural network model comprises an environmental noise voiceprint frequency domain characteristic recognition channel, an environmental noise voiceprint position characteristic recognition channel, a characteristic fusion channel and a classification recognizer;

S105, the environmental noise voiceprint frequency-domain feature recognition channel in the noise recognition neural network model extracts frequency-domain features of the noise voiceprint from the environmental noise feature data, and the environmental noise voiceprint position feature recognition channel extracts relative position features of the noise voiceprint from the environmental noise feature data; in a specific implementation, the extracted relative position features can take various forms, for example, they can be time-sequence features of the corresponding noise voiceprint or other relative position features, and are not specifically limited;

in addition, in step S105, the feature fusion channel fuses the extracted frequency-domain features and relative position features of the noise voiceprint to obtain fusion features carrying both, and the classification recognizer then classifies the fusion features to determine the environmental noise type;

It should be noted that, in the prior art, feature fusion may be performed by weighted fusion, that is, the extracted frequency-domain features and relative position features of the noise voiceprint are weighted with different weights and fused. However, if an extracted feature is abnormal and the weight assigned to that type of feature is too large, weighted fusion lets the abnormal part greatly reduce the accuracy of subsequent noise recognition. To solve this technical problem, as a preferred embodiment of the present invention, the feature fusion channel in this embodiment performs feature fusion in the following manner, that is:

firstly, combining the extracted frequency-domain features and relative position features of the noise voiceprint into a candidate feature set;

then, dividing the candidate feature set into a normal feature vector set and an abnormal feature vector set, wherein the normal feature vector set refers to a feature vector set in which all features are normal, and the abnormal feature vector set refers to a feature vector set in which some features are abnormal; in a specific implementation, a feature in this embodiment may be judged abnormal when its feature data is missing or falls outside a predetermined range, and no specific limitation is made here;

next, classifying the normal feature vector set to obtain the normal feature subsets, wherein the features within each normal feature subset are similar; in a specific implementation, the classification can adopt any of various existing classification algorithms, for example, a mean shift algorithm or another classification algorithm, which is not specifically limited herein;

then, correcting the abnormal feature data of the abnormal feature vector set; in a specific implementation, for example, the degree of dissimilarity between the abnormal feature and each feature in the abnormal feature vector set can be calculated, and the feature data value with the minimum dissimilarity to the abnormal feature is selected to correct the abnormal feature data;

after that, calculating the distance between each feature in the corrected abnormal feature vector set and each normal feature subset, and adding the feature to the nearest normal feature subset; in a specific implementation, one feature is selected from the abnormal feature vector set and its distance to each normal feature subset is calculated, where the distance may be the Mahalanobis distance or any other distance capable of measuring similarity, without specific limitation; the nearest normal feature subset is determined from the result and the feature is added to it; the remaining features in the abnormal feature vector set are then processed in the same way until every feature has been added to its corresponding normal feature subset;

and finally, cascading the resulting normal feature subsets to obtain the final fusion feature; in a specific implementation, once the features of the abnormal feature vector set have been added to the normal feature subsets, each resulting normal feature subset is normalized separately, and the normalized subsets are then concatenated along the dimension direction to obtain the final fusion feature.
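The fusion procedure above can be sketched as follows, under several loudly stated assumptions. Each feature is represented as a row vector of equal length; a greedy radius-based grouping stands in for mean shift; Euclidean distance stands in for the Mahalanobis distance; abnormal entries are replaced using the most similar normal vector; min-max scaling is used for normalization; and at least one normal vector is assumed to exist. The names `is_abnormal`, `group_similar`, `correct` and `fuse` and all thresholds are hypothetical.

```python
import numpy as np

def bad_entries(vec, lo, hi):
    """Mark entries that are missing (NaN) or outside the predetermined range."""
    return np.isnan(vec) | (vec < lo) | (vec > hi)

def is_abnormal(vec, lo, hi):
    return bool(bad_entries(vec, lo, hi).any())

def group_similar(vectors, radius):
    """Greedy grouping by distance to each subset's centroid (stand-in for mean shift)."""
    subsets = []
    for v in vectors:
        for s in subsets:
            if np.linalg.norm(v - np.mean(s, axis=0)) <= radius:
                s.append(v)
                break
        else:
            subsets.append([v])
    return subsets

def correct(vec, references, lo, hi):
    """Replace bad entries with those of the reference most similar on the valid entries."""
    bad = bad_entries(vec, lo, hi)
    nearest = min(references, key=lambda r: np.linalg.norm(r[~bad] - vec[~bad]))
    fixed = vec.copy()
    fixed[bad] = nearest[bad]
    return fixed

def fuse(freq_feats, pos_feats, lo=-10.0, hi=10.0, radius=2.0):
    candidates = np.vstack([freq_feats, pos_feats])            # candidate feature set
    normal = [v for v in candidates if not is_abnormal(v, lo, hi)]
    abnormal = [v for v in candidates if is_abnormal(v, lo, hi)]
    subsets = group_similar(normal, radius)                    # normal feature subsets
    for v in abnormal:
        fixed = correct(v, normal, lo, hi)
        # Add the corrected vector to the subset with the nearest centroid.
        target = min(subsets, key=lambda s: np.linalg.norm(fixed - np.mean(s, axis=0)))
        target.append(fixed)
    blocks = []
    for s in subsets:
        block = np.asarray(s)
        span = block.max() - block.min()
        scaled = (block - block.min()) / span if span > 0 else np.zeros_like(block)
        blocks.append(scaled.ravel())                          # normalize each subset
    return np.concatenate(blocks)                              # cascade into one vector

# Toy example: the last position feature has a missing entry.
freq_feats = np.array([[0.0, 0.0, 0.0], [0.1, 0.1, 0.1]])
pos_feats = np.array([[5.0, 5.0, 5.0], [5.2, np.nan, 5.1]])
fused = fuse(freq_feats, pos_feats)
```

In the toy example the vector with the missing entry is repaired from its closest normal neighbour and joins that neighbour's subset, so the fused vector contains no missing values and no single abnormal entry dominates the result.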

It should be noted that, in the above embodiment, the extracted frequency-domain features and relative position features of the noise voiceprint are merged into the candidate feature set, the candidate feature set is divided into the normal feature vector set and the abnormal feature vector set, the normal feature vector set is classified to obtain the normal feature subsets, the abnormal features in the abnormal feature vector set are corrected, and each corrected feature is added to the nearest normal feature subset. Once the features of the abnormal feature vector set have been added to the normal feature subsets, the resulting subsets are cascaded to obtain the final fusion feature. This avoids the technical problem that abnormal extracted features reduce the accuracy of subsequent environmental noise recognition, and ultimately improves the accuracy of environmental noise recognition.

The noise recognition neural network model in this embodiment simultaneously extracts the voiceprint frequency-domain features and the relative position features of the environmental noise and then fuses them. The two kinds of features complement each other in the fusion feature, and classification is carried out on the fusion feature, which can greatly improve the accuracy of environmental noise classification and recognition; even if the extracted features are abnormal, the model can still identify the environmental noise accurately.

Referring to fig. 3, which is a block diagram of an embodiment of the system for intelligently recognizing environmental noise of the present invention, the system for intelligently recognizing environmental noise of the present embodiment mainly includes: an acquisition processing module 101, an environmental noise characteristic data extraction processing module 102, a training processing module 103, an input processing module 104 and a classification recognition module 105, wherein

The acquisition processing module 101 is mainly configured to acquire an environmental noise signal; in a specific implementation, the audio signal of the environmental noise may be acquired by a noise collection device, where the environmental noise mainly includes traffic noise, industrial noise, building construction noise and social life noise, which are not described again here;

The environmental noise feature data extraction processing module 102 is mainly configured to extract environmental noise feature data from the acquired environmental noise signal; in a specific implementation, the module 102 may extract the environmental noise feature data in the following manner:

As mentioned above, the data preprocessing mainly makes the data meet the requirement of short-time fourier transform, mainly adopts pre-emphasis, framing and windowing, and can also adopt other modes, which are not specifically limited herein;

The training processing module 103 is mainly configured to train the noise recognition neural network model; in a specific implementation, the training processing module 103 may perform training in the following manner:

building a noise recognition neural network model;

The input processing module 104 is mainly configured to input the extracted environmental noise feature data into the trained noise recognition neural network model, where the noise recognition neural network model includes an environmental noise voiceprint frequency-domain feature recognition channel, an environmental noise voiceprint position feature recognition channel, a feature fusion channel and a classification recognizer;

The classification recognition module 105 is mainly configured to extract frequency-domain features of the noise voiceprint from the environmental noise feature data through the environmental noise voiceprint frequency-domain feature recognition channel in the noise recognition neural network model, extract relative position features of the noise voiceprint from the environmental noise feature data through the environmental noise voiceprint position feature recognition channel, fuse the extracted frequency-domain features and relative position features through the feature fusion channel to obtain a fusion feature carrying both, and classify the fusion feature through the classification recognizer to determine the environmental noise type.

It should be noted that, if an extracted feature is abnormal and weighted fusion is used for feature fusion, an overly large weight on that type of feature will greatly reduce the accuracy of subsequent noise recognition. As a preferred embodiment of the present invention, the feature fusion channel of the classification recognition module 105 in this embodiment may therefore perform feature fusion in the following manner, that is:

then, dividing the candidate feature set into a normal feature vector set and an abnormal feature vector set, wherein the normal feature vector set refers to a feature vector set in which all features are normal, and the abnormal feature vector set refers to a feature vector set in which some features are abnormal; in a specific implementation, a feature in this embodiment may be judged abnormal when its feature data is missing or falls outside a predetermined range, and no specific limitation is made herein;

and finally, cascading the resulting normal feature subsets to obtain the final fusion feature; in a specific implementation, once the features of the abnormal feature vector set have been added to the normal feature subsets, each resulting normal feature subset is normalized separately, and the normalized subsets are then concatenated along the dimension direction to obtain the final fusion feature.

The noise recognition neural network model in this embodiment simultaneously extracts the voiceprint frequency-domain features and the relative position features of the environmental noise and then fuses them. The two kinds of features complement each other in the fusion feature, and classification is carried out on the fusion feature, which can greatly improve the accuracy of environmental noise classification and recognition; even if the extracted features are abnormal, the model can still identify the environmental noise accurately.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.

Claims

1. A method for intelligently identifying environmental noise is characterized by comprising the following steps:

collecting an environmental noise signal;

training a noise recognition neural network model;

an environmental noise voiceprint frequency-domain feature recognition channel in the noise recognition neural network model extracts frequency-domain features of a noise voiceprint from environmental noise feature data, an environmental noise voiceprint position feature recognition channel extracts relative position features of the noise voiceprint from the environmental noise feature data, a feature fusion channel fuses the extracted frequency-domain features and relative position features of the noise voiceprint to obtain fusion features carrying both, and a classification recognizer classifies the fusion features to determine the type of the environmental noise.

2. The method according to claim 1, wherein extracting the ambient noise characteristic data from the acquired noise signal comprises in particular:

3. The method of claim 1, wherein training the noise recognition neural network model specifically comprises:

building a noise recognition neural network model;

4. The method of claim 3, wherein the feature data is augmented by random cropping, sound speed adjustment, pitch adjustment and sound fusion to obtain a training data set N times the original size.

5. The method according to claim 1, wherein the environmental noise voiceprint frequency-domain feature recognition channel adopts a convolutional neural network, the environmental noise voiceprint position feature recognition channel adopts a long short-term memory (LSTM) neural network, and the classification recognizer adopts a softmax (normalized exponential) classifier.

6. A system for intelligently identifying ambient noise, comprising:

7. The system of claim 6, wherein the ambient noise feature data extraction processing module extracts by:

8. The system of claim 6, wherein the training processing module trains in the following manner:

building a noise recognition neural network model;

9. The system of claim 8, wherein the training processing module augments the feature data by random cropping, sound speed adjustment, pitch adjustment and sound fusion to obtain a training data set N times the original size.

10. The system according to claim 6, wherein the environmental noise voiceprint frequency-domain feature recognition channel adopts a convolutional neural network, the environmental noise voiceprint position feature recognition channel adopts a long short-term memory (LSTM) neural network, and the classification recognizer adopts a softmax (normalized exponential) classifier.