CN110335616A - Voice data noise-reduction method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN110335616A CN110335616A CN201910650447.6A CN201910650447A CN110335616A CN 110335616 A CN110335616 A CN 110335616A CN 201910650447 A CN201910650447 A CN 201910650447A CN 110335616 A CN110335616 A CN 110335616A
- Authority
- CN
- China
- Prior art keywords
- feature
- audio data
- combination
- noise reduction
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
Abstract
The present application relates to a voice data noise-reduction method, apparatus, computer device and storage medium based on artificial intelligence. The method includes: receiving a noise-reduction request sent by a terminal, and obtaining the feature combinations corresponding to the to-be-processed audio data together with the association relations among the features within each combination; calculating the discrimination of each feature combination according to the features and their association relations; screening the feature combinations against a preset discrimination threshold to obtain initial feature combinations; screening the initial feature combinations with preset evaluation indexes to obtain available feature combinations, and obtaining the to-be-processed audio data corresponding to the available feature combinations to generate first initial audio data based on discrimination; and performing noise reduction on the first initial audio data based on a deep-learning noise-reduction model to generate the denoised voice data. By applying a deep-learning noise-reduction model to voice data selected by discrimination, the method improves the noise-reduction effect on voice data.
Description
Technical field
The present application relates to the field of voice processing technology, and in particular to a voice data noise-reduction method, apparatus, computer device and storage medium.
Background technique
With the continuous development of voice processing technology, voice data is widely used in daily life. Different users place different demands on voice quality, yet in everyday usage scenarios voice quality suffers from interference by various kinds of noise data and device signals and cannot meet users' needs. Voice noise-reduction technology has therefore emerged.
A commonly used voice noise-reduction method determines the signal-to-noise-ratio curve of a voice signal, uses that curve to separate the signal into speech frames and noise frames, and applies noise reduction only to the noise frames. This method is relatively crude: the accuracy with which speech frames and noise frames are distinguished leaves room for improvement, and misclassification degrades voice quality and weakens the noise-reduction effect on voice data.
Summary of the invention
In view of the above technical problems, it is necessary to provide a voice data noise-reduction method, apparatus, computer device and storage medium capable of improving the voice noise-reduction effect.
A voice data noise-reduction method, the method comprising:
receiving a noise-reduction request for to-be-processed audio data sent by a terminal, and obtaining the to-be-processed audio data;
obtaining the feature combinations corresponding to the to-be-processed audio data and the association relations among the features within each combination, and calculating the discrimination of each feature combination according to the features and their association relations;
screening the feature combinations against the discrimination threshold to obtain initial feature combinations;
screening the initial feature combinations with preset evaluation indexes to obtain available feature combinations that satisfy the preset evaluation indexes;
obtaining the to-be-processed audio data corresponding to the available feature combinations, and generating first initial audio data based on discrimination;
performing noise reduction on the first initial audio data based on a deep-learning noise-reduction model to generate the denoised voice data.
In one of the embodiments, the deep-learning noise-reduction model is obtained by:
obtaining, from training samples, the effective audio data corresponding to the feature combinations that do not exceed the discrimination threshold, together with the corresponding second initial audio data;
slicing the effective audio data and the second initial audio data respectively according to a predetermined length;
generating the first voiceprint map of the effective audio data from the sliced effective audio data, and extracting the first vocal-print parameters of the effective audio data from the first voiceprint map;
generating the second voiceprint map of the second initial audio data from the sliced second initial audio data, and extracting the second vocal-print parameters of the second initial audio data from the second voiceprint map;
training a deep-learning model with the second vocal-print parameters of the second initial audio data as its input and the first vocal-print parameters of the effective audio data at the corresponding moments as its output, to obtain the deep-learning noise-reduction model.
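The training setup described in this embodiment (noisy vocal-print parameters as input, clean parameters at the corresponding moments as target) can be sketched as follows. The patent does not specify the network architecture, so a single linear layer fitted by gradient descent stands in for the deep-learning model here; the function and variable names are illustrative, not from the patent.

```python
import numpy as np

def train_denoise_model(noisy_params, clean_params, lr=0.01, epochs=500):
    """Fit a single linear layer mapping noisy vocal-print parameter
    vectors to clean ones -- a minimal stand-in for the (unspecified)
    deep-learning noise-reduction model."""
    n_features = noisy_params.shape[1]
    W = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    for _ in range(epochs):
        pred = noisy_params @ W + b        # forward pass
        grad = pred - clean_params         # gradient of 0.5 * MSE w.r.t. pred
        W -= lr * noisy_params.T @ grad / len(noisy_params)
        b -= lr * grad.mean(axis=0)
    return W, b

# Toy training pairs: clean parameters plus additive noise.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 4))          # "first" (clean) parameters
noisy = clean + rng.normal(scale=0.5, size=clean.shape)  # "second" (noisy)
W, b = train_denoise_model(noisy, clean)
recovered = noisy @ W + b                  # denoised parameters
```

A real implementation would replace the linear map with a deep network, but the input/output pairing is the same as in the embodiment above.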
In one of the embodiments, obtaining the feature combinations corresponding to the to-be-processed audio data and calculating the discrimination of each feature combination comprises:
obtaining the features corresponding to the to-be-processed audio data and the association relations among the features;
generating the feature combinations corresponding to the to-be-processed audio data according to the features and their association relations;
calculating the discrimination of each feature combination separately, according to the features of each combination and the association relations among them.
In one of the embodiments, screening the feature combinations against the preset discrimination threshold to obtain initial feature combinations comprises:
comparing the discrimination of each feature combination with the discrimination threshold respectively;
obtaining the feature combinations whose discrimination exceeds the discrimination threshold, and generating the initial feature combinations.
In one of the embodiments, screening the initial feature combinations with preset evaluation indexes to obtain available feature combinations comprises:
obtaining the preset evaluation indexes, the preset evaluation indexes including the AUC value, the precision rate and the recall rate;
screening the initial feature combinations according to the AUC value, precision rate and recall rate;
obtaining the initial feature combinations that meet the requirements, and generating the available feature combinations.
In one of the embodiments, performing noise reduction on the first initial audio data based on the deep-learning noise-reduction model to generate the denoised voice data comprises:
slicing the first initial audio data according to a predetermined length;
generating the to-be-processed voiceprint map of the first initial audio data from the sliced first initial audio data, and extracting the to-be-processed vocal-print parameters of the first initial audio data from the to-be-processed voiceprint map;
inputting the to-be-processed vocal-print parameters into the deep-learning noise-reduction model to obtain the denoised voice data.
In one of the embodiments, before the step of calculating the discrimination of each feature combination, the method further comprises:
obtaining the data type corresponding to each feature combination according to the correspondence between feature combinations and data types, the data types including numeric, byte and text;
obtaining the data processing method corresponding to each data type according to the correspondence between data types and data processing methods, the data processing methods including judgement processing, assignment processing and statement processing;
performing data processing on the to-be-processed audio data corresponding to each feature combination according to the respective data processing method.
A voice data noise-reduction apparatus, the apparatus comprising:
a receiving module, configured to receive the noise-reduction request for to-be-processed audio data sent by a terminal, and to obtain the to-be-processed audio data;
a discrimination calculating module, configured to obtain the feature combinations corresponding to the to-be-processed audio data and the association relations among the features within each combination, and to calculate the discrimination of each feature combination according to the features and their association relations;
an initial-feature-combination obtaining module, configured to screen the feature combinations against a preset discrimination threshold to obtain initial feature combinations;
an available-feature-combination obtaining module, configured to screen the initial feature combinations with preset evaluation indexes to obtain the available feature combinations that satisfy the preset evaluation indexes;
an initial-audio-data generating module, configured to obtain the to-be-processed audio data corresponding to the available feature combinations and to generate first initial audio data based on discrimination;
a noise-reduction module, configured to perform noise reduction on the first initial audio data based on a deep-learning noise-reduction model, to generate the denoised voice data.
A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the following steps:
receiving a noise-reduction request for to-be-processed audio data sent by a terminal, and obtaining the to-be-processed audio data;
obtaining the feature combinations corresponding to the to-be-processed audio data and the association relations among the features within each combination, and calculating the discrimination of each feature combination according to the features and their association relations;
screening the feature combinations against the discrimination threshold to obtain initial feature combinations;
screening the initial feature combinations with preset evaluation indexes to obtain available feature combinations that satisfy the preset evaluation indexes;
obtaining the to-be-processed audio data corresponding to the available feature combinations, and generating first initial audio data based on discrimination;
performing noise reduction on the first initial audio data based on a deep-learning noise-reduction model to generate the denoised voice data.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the following steps:
receiving a noise-reduction request for to-be-processed audio data sent by a terminal, and obtaining the to-be-processed audio data;
obtaining the feature combinations corresponding to the to-be-processed audio data and the association relations among the features within each combination, and calculating the discrimination of each feature combination according to the features and their association relations;
screening the feature combinations against the discrimination threshold to obtain initial feature combinations;
screening the initial feature combinations with preset evaluation indexes to obtain available feature combinations that satisfy the preset evaluation indexes;
obtaining the to-be-processed audio data corresponding to the available feature combinations, and generating first initial audio data based on discrimination;
performing noise reduction on the first initial audio data based on a deep-learning noise-reduction model to generate the denoised voice data.
In the above voice data noise-reduction method, apparatus, computer device and storage medium, the feature combinations corresponding to the to-be-processed audio data are traversed against a preset discrimination threshold to obtain the feature combinations that meet the threshold, which are then screened with preset evaluation indexes to obtain the available feature combinations, strengthening the reliability with which voice data is distinguished from noise data. A deep-learning noise-reduction model then performs noise reduction on the first initial audio data generated based on discrimination, yielding the denoised voice data. By applying a trained deep-learning noise-reduction model on top of the improved discrimination between voice data and noise data, noise reduction of voice data is realized quickly and efficiently, further improving the noise-reduction effect.
Brief description of the drawings
Fig. 1 is an application scenario diagram of the voice data noise-reduction method in one embodiment;
Fig. 2 is a flow diagram of the voice data noise-reduction method in one embodiment;
Fig. 3 is a flow diagram of the step of obtaining the deep-learning noise-reduction model in one embodiment;
Fig. 4 is a structural block diagram of the voice data noise-reduction apparatus in one embodiment;
Fig. 5 is an internal structure diagram of the computer device in one embodiment.
Specific embodiment
To make the objects, technical solutions and advantages of the present application clearer, the application is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the application, not to limit it.
The voice data noise-reduction method provided by the present application can be applied in the application environment shown in Fig. 1, in which a terminal 102 communicates with a server 104 over a network. The server 104 receives the noise-reduction request for to-be-processed audio data sent by the terminal 102, obtains the to-be-processed audio data, obtains the corresponding feature combinations and the association relations among the features within each combination, and calculates the discrimination of each feature combination according to the features and their association relations. The server 104 screens the feature combinations against a preset discrimination threshold to obtain initial feature combinations, screens the initial feature combinations with preset evaluation indexes to obtain the available feature combinations that satisfy the indexes, obtains the to-be-processed audio data corresponding to the available feature combinations, generates the first initial audio data based on discrimination, and performs noise reduction on the first initial audio data based on a deep-learning noise-reduction model to generate the denoised voice data. The terminal 102 may be, but is not limited to, a personal computer, a laptop, a smartphone or a tablet computer; the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, a voice data noise-reduction method is provided. Taking its application to the server in Fig. 1 as an example, the method comprises the following steps:
S202: receive the noise-reduction request for to-be-processed audio data sent by the terminal, and obtain the to-be-processed audio data.
S204: obtain the feature combinations corresponding to the to-be-processed audio data and the association relations among the features within each combination, and calculate the discrimination of each feature combination according to the features and their association relations.
Specifically, the to-be-processed audio data corresponding to the noise-reduction request corresponds to multiple features; by obtaining the association relations among these features, the corresponding feature combinations are generated from the features and their association relations. The server obtains the features corresponding to the to-be-processed audio data and the association relations among them, generates the feature combinations accordingly, and then calculates the discrimination of each feature combination separately from the features of each combination and their association relations.
In the present solution, the features of the audio data include the sampling frequency, bit rate, channel count, frame rate, zero-crossing rate and short-time energy. The sampling frequency is the number of sample points taken from the analog signal per unit time; assigning a number to each sampled point converts the signal into a digital one. The bit rate reflects how finely the amplitude of the analog signal (which determines the loudness of the sound) is divided into levels. The channel count is the number of audio channels, and the frame rate is the number of sound frames per unit time, where one frame may contain multiple sound samples. The zero-crossing rate is the number of times the signal crosses zero within each frame and reflects the frequency characteristics of the audio. The short-time energy reflects the strength of the audio signal at different moments. Because different audio data take different values on these features, the feature combinations generated from the features of different audio data also differ.
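Two of the frame-level features listed above, the zero-crossing rate and the short-time energy, can be computed as in this minimal sketch (frame length and hop size are illustrative choices, not values from the patent):

```python
import numpy as np

def frame_features(signal, frame_len=256, hop=128):
    """Per-frame zero-crossing rate and short-time energy."""
    zcrs, energies = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Zero-crossing rate: fraction of adjacent samples whose sign changes.
        zcrs.append(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
        # Short-time energy: mean squared amplitude of the frame.
        energies.append(np.mean(frame ** 2))
    return np.array(zcrs), np.array(energies)

# A 440 Hz tone sampled at 8 kHz: steady energy, steady crossing rate.
t = np.linspace(0, 1, 8000, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)
zcr, energy = frame_features(tone)
```

For a pure 440 Hz tone at 8 kHz the per-sample crossing rate is about 880/8000 ≈ 0.11 and the mean energy of a unit sine is about 0.5, which is what the sketch reports.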
Further, before the discrimination is calculated, the to-be-processed audio data undergoes the corresponding data processing, so as to improve the accuracy of the discrimination calculated for each feature combination. Specifically:
Different processing is applied to initial audio data of different data types. The data types include numeric, byte and text, and the corresponding processing methods are judgement processing, assignment processing and statement processing. For numeric initial audio data, judgement processing is executed: a preset value range is obtained and compared with the values of the numeric initial audio data to judge whether they fall within the range; the values that meet the preset range are extracted, the noise data in the numeric initial audio data is deleted, and available numeric data is generated.
For byte initial audio data, assignment processing is executed: whether the value of the byte initial audio data meets the default value is judged, and when it does not, the default value is assigned to the corresponding byte initial audio data; the noise data in the assigned byte initial audio data is then deleted, and available byte data is generated.
For text initial audio data, statement processing is executed: the constituents of the text initial audio data are obtained and compared with the preset constituents; when they are inconsistent, the text initial audio data is declared as the preset constituents, the noise data in the text initial audio data is deleted, and available text data is generated.
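The type-to-handler correspondence described above can be sketched as a simple dispatch table. The patent leaves the three handlers abstract, so the ranges, defaults and function names below are purely illustrative assumptions:

```python
def judge_numeric(values, lo=0.0, hi=1.0):
    # Judgement processing: keep only values inside the preset range.
    return [v for v in values if lo <= v <= hi]

def assign_bytes(values, default=0, allowed=range(256)):
    # Assignment processing: replace out-of-range values with a default.
    return [v if v in allowed else default for v in values]

def declare_text(values, template="unknown"):
    # Statement processing: re-declare entries that do not match the
    # preset constituents (here: any empty or non-string entry).
    return [v if isinstance(v, str) and v else template for v in values]

HANDLERS = {"numeric": judge_numeric, "byte": assign_bytes, "text": declare_text}

def preprocess(data_type, values):
    """Route a feature combination's data to its type-specific handler."""
    return HANDLERS[data_type](values)
```

Each feature combination's data would be passed through `preprocess` with its looked-up data type before the discrimination is computed.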
S206: screen the feature combinations against the preset discrimination threshold, and obtain the initial feature combinations.
Specifically, the server compares the discrimination of each feature combination with the discrimination threshold and takes the feature combinations whose discrimination exceeds the threshold as the initial feature combinations. The discrimination threshold is used to traverse and screen the feature combinations corresponding to the to-be-processed audio data, and thereby to obtain the to-be-processed audio data corresponding to the feature combinations that meet the threshold. That is, the server obtains the preset discrimination threshold, traverses the discrimination of each feature combination against it, obtains the feature combinations whose discrimination exceeds the threshold, and generates the initial feature combinations.
Further, the noise data in the initial audio data that falls through the threshold can also be deleted. Such noise data is data whose discrimination is lower than the initial discrimination threshold, i.e., invalid audio data, whereas the data corresponding to the initial feature combinations is data whose discrimination exceeds the threshold, i.e., the initial audio data. In the present solution, the discrimination threshold range may be set to 0.8 to 1: invalid audio data below the threshold of 0.8 is noise data on which no noise-reduction operation can be performed and does not belong to the effective audio data, so delete processing is executed on it, while the initial audio data beyond the threshold requires noise-reduction processing in order to generate effective audio data.
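Step S206 amounts to partitioning the feature combinations by their discrimination scores. A minimal sketch, with illustrative combination names and scores (the 0.8 threshold is the value given in this embodiment):

```python
def screen_by_discrimination(combos, threshold=0.8):
    """Split feature combinations by a preset discrimination threshold:
    combinations at or above it become initial feature combinations;
    the rest are treated as invalid (noise) data and dropped."""
    initial = {name: d for name, d in combos.items() if d >= threshold}
    dropped = {name: d for name, d in combos.items() if d < threshold}
    return initial, dropped

combos = {"zcr+energy": 0.93, "rate+channels": 0.42, "fps+zcr": 0.85}
initial, dropped = screen_by_discrimination(combos)
```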
S208: screen the initial feature combinations with the preset evaluation indexes, and obtain the available feature combinations that satisfy the preset evaluation indexes.
Specifically, the preset evaluation indexes obtained by the server include the AUC value, the precision rate and the recall rate. The server screens the initial feature combinations according to the obtained AUC value, precision rate and recall rate, obtains the initial feature combinations that meet the requirements, and generates the available feature combinations.
The AUC value (Area Under Curve) is defined as the area under the ROC curve, with values ranging between 0.5 and 1. The ROC (receiver operating characteristic) curve plots the true-positive rate on the vertical axis against the false-positive rate on the horizontal axis; each point on the curve reflects the response to the same signal stimulus under a different judgment criterion, so the curve is drawn from the different results obtained by applying different criteria under the same stimulus condition.
The precision rate (Precision) indicates, for a given test data set, the ratio of the number of samples correctly classified by the classifier to the total number of samples, i.e., the accuracy on the test data set when the loss function is the 0-1 loss. Expressed as a formula: precision = relevant documents retrieved by the system / total documents retrieved by the system.
The recall rate (Recall) is a measure of coverage: it measures how many of the positive examples are classified as positive. Expressed as a formula: recall = relevant documents retrieved by the system / total relevant documents in the system.
Further, the server obtains the correspondence between AUC values and initial feature combinations, screens the initial feature combinations against a preset AUC value, and obtains the initial feature combinations that meet it. Based on the value range of the AUC, the server may set the AUC threshold to 0.8 and use it to screen the initial feature combinations. The server likewise obtains the correspondence between the precision rate and the initial feature combinations and screens them against a preset precision rate, and obtains the correspondence between the recall rate and the initial feature combinations and screens them against a preset recall rate. Finally, from the initial feature combinations that satisfy the preset evaluation indexes of AUC value, precision rate and recall rate, the server generates the available feature combinations.
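The metric definitions and the joint screening by AUC, precision and recall can be sketched as follows. The 0.8/0.7 cut-offs other than the AUC value are illustrative, as the patent only specifies the AUC threshold:

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def screen_combos(metrics, min_auc=0.8, min_p=0.7, min_r=0.7):
    """Keep initial feature combinations whose AUC, precision and
    recall all clear their preset minimums."""
    return [name for name, (auc, p, r) in metrics.items()
            if auc >= min_auc and p >= min_p and r >= min_r]

p, r = precision_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
metrics = {"combo_a": (0.91, 0.80, 0.75), "combo_b": (0.79, 0.90, 0.90)}
kept = screen_combos(metrics)
```

Note that `combo_b` is rejected despite high precision and recall, because all three indexes must be satisfied at once.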
S210: obtain the to-be-processed audio data corresponding to the available feature combinations, and generate the first initial audio data based on discrimination.
S212: perform noise reduction on the first initial audio data based on the deep-learning noise-reduction model, and generate the denoised voice data.
Specifically, the server slices the first initial audio data according to the predetermined length, generates the to-be-processed voiceprint map of the first initial audio data from the sliced data, and extracts the to-be-processed vocal-print parameters of the first initial audio data from the to-be-processed voiceprint map. The to-be-processed vocal-print parameters are then input into the deep-learning noise-reduction model to obtain the denoised voice data.
Further, the server obtains the to-be-processed vocal-print parameters of the first initial audio data awaiting noise reduction and inputs them into the deep-learning noise-reduction model. It matches the second vocal-print parameters against the to-be-processed vocal-print parameters, obtains the first initial audio data corresponding to the to-be-processed vocal-print parameters that match the second vocal-print parameters, and uses the first vocal-print parameters to perform noise reduction on that first initial audio data, obtaining the denoised voice data.
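The "voiceprint map" and "vocal-print parameter" stages of this inference path can be sketched with a short-time Fourier magnitude spectrogram, assuming (the patent does not say) that the map is a spectrogram and taking one illustrative parameter per frame, the dominant frequency bin:

```python
import numpy as np

def voiceprint_map(sig, frame_len=256, hop=128):
    """Magnitude spectrogram of the sliced signal -- a stand-in for
    the voiceprint map from which parameters are extracted."""
    frames = [sig[i:i + frame_len] * np.hanning(frame_len)
              for i in range(0, len(sig) - frame_len + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def vocal_print_params(spec):
    # One illustrative parameter per frame: the dominant frequency bin.
    return spec.argmax(axis=1)

# 440 Hz tone at 8 kHz with a little additive noise.
t = np.linspace(0, 1, 8000, endpoint=False)
sig = np.sin(2 * np.pi * 440 * t) \
    + 0.05 * np.random.default_rng(1).normal(size=8000)
spec = voiceprint_map(sig)
params = vocal_print_params(spec)
```

With 256-point frames at 8 kHz the bin spacing is 31.25 Hz, so the 440 Hz tone lands in bin 14 for every frame; the extracted parameters would then be fed to the trained model.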
In the above voice data noise-reduction method, the feature combinations corresponding to the to-be-processed audio data are traversed against a preset discrimination threshold to obtain the feature combinations that meet the threshold, which are then screened with preset evaluation indexes to obtain the available feature combinations, strengthening the reliability of distinguishing voice data from noise data. A deep-learning noise-reduction model performs noise reduction on the first initial audio data generated based on discrimination, yielding the denoised voice data. On the basis of the improved discrimination between voice data and noise data, the trained deep-learning noise-reduction model realizes noise reduction of voice data quickly and efficiently, further improving the noise-reduction effect.
In one embodiment, as shown in Fig. 3, a step of obtaining the deep learning noise reduction model is provided, comprising:
S302: obtaining, from the training samples, the effective audio data corresponding to the feature combinations within the discrimination threshold and its corresponding second initial audio data.
Specifically, the training samples include the effective audio data corresponding to the feature combinations within the discrimination threshold, the second initial audio data corresponding to the effective audio data, and the invalid audio data falling outside the threshold. According to the requirements for training the deep learning noise reduction model, the server obtains the effective audio data and its corresponding second initial audio data from the training samples.
S304: slicing the effective audio data and the second initial audio data respectively according to the preset length.
Specifically, before slicing the effective audio data and the second initial audio data, the server also needs to preprocess them to obtain effective audio data and second initial audio data in a preset format. The server further obtains the preset slice length and slices the pair of preset-format effective audio data and second initial audio data accordingly.
S306: generating the first voiceprint map of the effective audio data from the sliced effective audio data, and extracting the first vocal print parameters of the effective audio data from the first voiceprint map.
S308: generating the second voiceprint map of the second initial audio data from the sliced second initial audio data, and extracting the second vocal print parameters of the second initial audio data from the second voiceprint map.
Specifically, from the sliced effective audio data and second initial audio data, the server respectively generates the first voiceprint map of the effective audio data and the corresponding second voiceprint map of the second initial audio data, then extracts the first vocal print parameters of the effective audio data from the first voiceprint map and the second vocal print parameters of the second initial audio data from the second voiceprint map.
Presently known voiceprint maps mainly include: broadband vocal prints, narrowband vocal prints, amplitude vocal prints, contour vocal prints, time-spectrum vocal prints, and sectional vocal prints (the last also divided into broadband and narrowband kinds). The first two display how the frequency and intensity of speech vary over time; the middle three display how speech intensity or sound pressure varies over time; a sectional vocal print only displays the sound-wave intensity and frequency characteristics at a single point in time.
Vocal print parameters are features unique to a piece of audio data: a content-based digital signature of the important acoustic features that represent a segment of audio data. Their main purpose is to establish an effective mechanism for comparing the perceptual acoustic quality of two pieces of audio data. They may include, but are not limited to, acoustic features related to the anatomical structure of the human vocal mechanism, such as the frequency spectrum, cepstrum, formants, fundamental tone, and reflection coefficients.
S310: using the second vocal print parameters of the second initial audio data as the input of the deep learning model and the first vocal print parameters of the effective audio data at the corresponding moment as the output of the deep learning model, training the deep learning model to obtain the deep learning noise reduction model.
Specifically, the second vocal print parameters of the second initial audio data serve as the input of the deep learning model, i.e., the audio data requiring noise reduction processing, while the first vocal print parameters of the effective audio data corresponding to the second initial audio data serve as the output of the deep learning model, i.e., the corresponding audio data after noise reduction processing. By repeatedly obtaining effective audio data and second initial audio data from the samples and extracting the corresponding second and first vocal print parameters, the deep learning model is trained and the deep learning noise reduction model is obtained.
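The input/output pairing described in S310 can be sketched with a toy model; here a single least-squares linear map stands in for the deep learning model, and the random vectors standing in for the first (clean) and second (noisy) vocal print parameters are pure assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: one 129-bin parameter vector per slice.
clean = rng.random((200, 129))                          # first vocal print parameters (target output)
noisy = clean + 0.1 * rng.standard_normal((200, 129))   # second vocal print parameters (input)

# A single linear layer fitted by least squares stands in for the
# deep learning model: it maps noisy parameters toward clean ones.
W, *_ = np.linalg.lstsq(noisy, clean, rcond=None)

denoised = noisy @ W
err_before = np.mean((noisy - clean) ** 2)
err_after = np.mean((denoised - clean) ** 2)
print(err_after < err_before)  # the fitted map reduces the error
```

In the patented method this linear map would be replaced by the trained deep learning noise reduction model, but the supervision signal (noisy parameters in, clean parameters out) is the same.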
In the above steps, the server obtains, from the training samples, the effective audio data corresponding to the feature combinations within the discrimination threshold and its corresponding second initial audio data; slices both according to the preset length; generates the first voiceprint map from the sliced effective audio data and extracts the first vocal print parameters from it; and generates the second voiceprint map from the sliced second initial audio data and extracts the second vocal print parameters from it. The second vocal print parameters of the second initial audio data can then serve as the input of the deep learning model and the first vocal print parameters of the effective audio data at the corresponding moment as its output, realizing the training of the deep learning model and obtaining a deep learning noise reduction model usable for voice data, which improves the noise reduction effect on voice data.
In one embodiment, a step of obtaining the feature combinations corresponding to the audio data to be processed and calculating the discrimination of each feature combination is provided, comprising:
The server obtains the features corresponding to the audio data to be processed and the association relationships between the features; generates the feature combinations corresponding to the audio data to be processed according to the features and their association relationships; and calculates the discrimination of each feature combination separately according to the features corresponding to each feature combination and the association relationships between them.
Specifically, the audio data to be processed, which corresponds to the noise reduction processing request, corresponds to multiple features; by obtaining the association relationships between these features, the corresponding feature combinations can be generated from the features and their association relationships. That is, the server obtains the features corresponding to the audio data to be processed and the association relationships between them, generates the feature combinations corresponding to the audio data to be processed accordingly, and then calculates the discrimination of each feature combination separately from the features in each combination and their association relationships.
In the present solution, the features of the audio data include the sampling frequency, bit rate, channel count, frame rate, zero-crossing rate, and short-time energy. The sampling frequency indicates the number of sample points collected from the analog signal per unit time; assigning a number to each sampled point of the analog signal converts it into a digital signal. The bit rate indicates the number of grades into which the amplitude of the analog signal (which affects the loudness of the sound) is divided. The channel count indicates the number of audio channels, and the frame rate indicates the number of sound frames per unit time, where one frame may contain multiple sound samples. The zero-crossing rate indicates the number of times the signal crosses zero in each frame, reflecting the frequency characteristics of the audio. The short-time energy reflects the strength of the audio signal at different moments. Because different audio data take different values for these features, the feature combinations generated from the features of different audio data also differ.
Further, the feature values of each piece of audio data differ, i.e., the values of the sampling frequency, bit rate, channel count, frame rate, and short-time energy differ, and the association relationships between the corresponding features are likewise inconsistent. The server can therefore calculate the discrimination between the feature combinations of different audio data separately from the different feature values and association relationships.
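Two of the features listed above, the zero-crossing rate and the short-time energy, are directly computable per frame. A minimal sketch (frame length, sampling rate, and the two test signals are assumptions, not values from the patent):

```python
import numpy as np

def zero_crossing_rate(frame: np.ndarray) -> int:
    """Number of sign changes in the frame -- reflects frequency content."""
    return int(np.sum(np.abs(np.diff(np.signbit(frame).astype(int)))))

def short_time_energy(frame: np.ndarray) -> float:
    """Sum of squared samples -- signal strength of the frame."""
    return float(np.sum(frame ** 2))

t = np.arange(256) / 8000.0
tone = np.sin(2 * np.pi * 440 * t)   # a voiced-like frame
hiss = 0.01 * np.ones(256)           # a near-silent frame

print(zero_crossing_rate(tone) > zero_crossing_rate(hiss))  # True
print(short_time_energy(tone) > short_time_energy(hiss))    # True
```

Such per-frame values are examples of the feature values whose differences across audio data drive the discrimination calculation.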
In the above steps, the server generates the feature combinations corresponding to the audio data to be processed according to the features and the association relationships between them, and calculates the discrimination of each feature combination separately from the features in each combination and their association relationships. This realizes the separate calculation of a corresponding discrimination for each distinct feature combination, quickly distinguishing the effective audio data in the voice data from the noise data awaiting noise reduction processing, and improving work efficiency.
In one embodiment, a step of screening the feature combinations according to the preset discrimination threshold to obtain the initial feature combinations is provided, comprising:
The server compares the discrimination of each feature combination with the discrimination threshold respectively, obtains the feature combinations whose discrimination exceeds the discrimination threshold, and generates the initial feature combinations.
Specifically, the server compares the discrimination of each feature combination with the discrimination threshold respectively and takes the feature combinations whose discrimination exceeds the threshold as the initial feature combinations. That is, the server obtains the preset discrimination threshold, traverses the discrimination corresponding to each feature combination according to the threshold, obtains the feature combinations whose discrimination exceeds the threshold, and generates the initial feature combinations.
Further, the noise data in the initial audio data that falls outside the threshold may also be deleted. The noise data falling outside the threshold is the data whose discrimination is below the initial discrimination threshold, i.e., the invalid audio data, while the data corresponding to the initial feature combinations is the data whose discrimination exceeds the discrimination threshold, i.e., the initial audio data. In the present solution, the discrimination threshold range may be set to 0.8 to 1: invalid audio data whose discrimination is below the threshold of 0.8 is noise data on which no noise reduction operation can be performed and which does not belong to the effective audio data, so delete processing is executed on it. Initial audio data whose discrimination falls within the threshold range needs noise reduction processing to generate the effective audio data.
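The threshold screening above amounts to a simple filter over discrimination scores. A sketch using the 0.8 lower bound stated in the text; the feature-combination names and score values are hypothetical:

```python
# feature combination -> discrimination score (hypothetical values)
discrimination = {
    ("zero_crossing_rate", "short_time_energy"): 0.92,
    ("bit_rate", "channel_count"): 0.55,
    ("sampling_frequency", "frame_rate", "short_time_energy"): 0.86,
}

THRESHOLD = 0.8  # lower bound of the 0.8-1 range given in the text

# Combinations at or above the threshold become initial feature combinations;
# the rest correspond to invalid audio data and are deleted.
initial = {c: d for c, d in discrimination.items() if d >= THRESHOLD}
deleted = {c: d for c, d in discrimination.items() if d < THRESHOLD}

print(sorted(initial.values()))  # [0.86, 0.92]
print(sorted(deleted.values()))  # [0.55]
```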
In the above steps, the server compares the discrimination of each feature combination with the discrimination threshold respectively, obtains the feature combinations whose discrimination exceeds the threshold, and generates the initial feature combinations. Because the comparison between each feature combination's discrimination and the discrimination threshold is considered, the invalid data in the audio data to be processed can be deleted, reducing the screening workload and improving work efficiency when obtaining the initial audio data that needs noise reduction processing.
In one embodiment, a step of screening the initial feature combinations using the preset evaluation indexes to obtain the available feature combinations is provided, comprising:
The server obtains the preset evaluation indexes, which include the AUC value, the precision rate, and the recall rate; screens the initial feature combinations according to the AUC value, precision rate, and recall rate; and obtains the initial feature combinations that meet the requirements, generating the available feature combinations.
Specifically, the AUC value measures the size of the area under a curve; its full name is Area Under Curve, defined as the area under the ROC curve, with a value range between 0.5 and 1. The precision rate (Precision) indicates, for a given test data set, the ratio of the number of samples correctly classified by the classifier to the total number of samples, that is, the accuracy on the test data set when the loss function is the 0-1 loss. It can be expressed by the formula: precision = relevant documents retrieved by the system / total documents retrieved by the system. The recall rate (Recall) is a measure of coverage, measuring how many positive examples are classified as positive. It can be expressed by the formula: recall = relevant documents retrieved by the system / total relevant documents in the system.
Further, the server obtains the correspondence between the AUC value and the initial feature combinations and screens the initial feature combinations according to the preset AUC value, obtaining the initial feature combinations that meet it. For example, according to the value range of the AUC value, the server may set the AUC value to 0.8 and use it to screen the initial feature combinations, obtaining those that meet the AUC value. The server likewise obtains the correspondence between the precision rate and the initial feature combinations, screens them according to the preset precision rate, and obtains the initial feature combinations meeting it; and obtains the correspondence between the recall rate and the initial feature combinations, screens them according to the preset recall rate, and obtains the initial feature combinations meeting it. Finally, from the initial feature combinations meeting the preset evaluation indexes of AUC value, precision rate, and recall rate, the server generates the available feature combinations.
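Precision and recall as defined above can be computed directly from labeled outcomes. A minimal sketch; the label vectors are hypothetical results of one feature combination separating effective audio (1) from noise (0):

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = effective audio)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fp), tp / (tp + fn)

y_true = [1, 1, 1, 0, 0, 1]   # ground truth
y_pred = [1, 1, 0, 0, 1, 1]   # classification induced by one feature combination

p, r = precision_recall(y_true, y_pred)
print(p, r)  # 0.75 0.75
```

Feature combinations whose precision, recall, and AUC all clear the preset values would survive this second screening as available feature combinations.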
In the above steps, the server screens the initial feature combinations according to the preset evaluation indexes of AUC value, precision rate, and recall rate, and generates the available feature combinations. Screening the feature combinations a second time further improves the efficiency of obtaining the initial audio data.
In one embodiment, a step of performing noise reduction processing on the first initial audio data based on the deep learning noise reduction model to generate the denoised voice data is provided, comprising:
The server slices the first initial audio data according to the preset length; generates the to-be-processed voiceprint map of the first initial audio data from the sliced first initial audio data and extracts the to-be-processed vocal print parameters of the first initial audio data from it; and inputs the to-be-processed vocal print parameters into the deep learning noise reduction model to obtain the denoised voice data.
Specifically, the server obtains the to-be-processed vocal print parameters of the first initial audio data awaiting noise reduction processing and inputs them into the deep learning noise reduction model; the second vocal print parameters are matched against the to-be-processed vocal print parameters to obtain the first initial audio data whose to-be-processed vocal print parameters conform to the second vocal print parameters, and noise reduction processing is performed on that first initial audio data using the first vocal print parameters, obtaining the denoised voice data.
In the above steps, the server slices the first initial audio data according to the preset length, generates the to-be-processed voiceprint map of the first initial audio data from the sliced data, extracts the to-be-processed vocal print parameters from the voiceprint map, and inputs them into the deep learning noise reduction model to obtain the denoised voice data. By using the deep learning noise reduction model to denoise the first initial audio data selected on the basis of discrimination, the voice data noise reduction effect is improved.
In one embodiment, the voice data noise-reduction method further includes:
The server obtains the data type corresponding to each feature combination according to the correspondence between the feature combinations and the data types; the data types include the numeric type, the byte type, and the text type. According to the correspondence between the data types and the data processing methods, the server obtains the data processing method corresponding to each data type; the data processing methods include judgement processing, assignment processing, and declaration processing. According to each data processing method, the server performs data processing on the audio data to be processed corresponding to each feature combination respectively.
Specifically, for numeric-type initial data, judgement processing is executed: the preset value range is obtained and compared with the value of the numeric-type initial data to judge whether the value falls within the preset range; the numeric-type initial data meeting the preset value range is extracted and the noise data within the numeric-type initial data is deleted, generating the available numeric-type data.
For byte-type initial data, assignment processing is executed: it is judged whether the value of the byte-type initial data conforms to the default value; when it does not, the default value is assigned to the corresponding byte-type initial data, and the noise data in the assigned byte-type initial data is deleted, generating the available byte-type data.
For text-type initial data, declaration processing is executed: the constituents of the text-type initial data are obtained and compared with the default constituents; when the constituents of the text-type initial data are inconsistent with the default constituents, the text-type initial data is declared as the default constituents, and the noise data in the text-type initial data is deleted, generating the available text-type data.
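The three per-type treatments above are essentially a dispatch on data type. A sketch under stated assumptions: the value range, default byte value, and default text constituent are all hypothetical placeholders:

```python
NUMERIC_RANGE = (0.0, 1.0)   # hypothetical preset value range
DEFAULT_BYTE = b"\x00"       # hypothetical default value
DEFAULT_TEXT = "unknown"     # hypothetical default constituent

def preprocess(value):
    """Dispatch to judgement / assignment / declaration processing by type."""
    if isinstance(value, (int, float)):      # numeric type: judgement processing
        lo, hi = NUMERIC_RANGE
        return value if lo <= value <= hi else None   # None = deleted as noise
    if isinstance(value, bytes):             # byte type: assignment processing
        return value if value == DEFAULT_BYTE else DEFAULT_BYTE
    if isinstance(value, str):               # text type: declaration processing
        return value if value == DEFAULT_TEXT else DEFAULT_TEXT
    return None

print(preprocess(0.5), preprocess(2.0), preprocess(b"\x07"), preprocess("noise"))
# 0.5 None b'\x00' unknown
```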
In the above steps, before calculating the discrimination of each feature combination, the server performs the corresponding data preprocessing for each different type of audio data to be processed, which improves the accuracy of the subsequent discrimination calculation for each feature combination.
It should be understood that although the steps in the flow charts of Figs. 2-3 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless expressly stated otherwise herein, there is no strict restriction on the execution order of these steps, and they may be executed in other orders. Moreover, at least some of the steps in Figs. 2-3 may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times; the execution order of these sub-steps or stages is likewise not necessarily sequential, and they may be executed in turn or in alternation with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 4, a voice data denoising device is provided, comprising: a receiving module 402, a discrimination computing module 404, an initial feature combination obtaining module 406, an available feature combination obtaining module 408, an initial audio data generation module 410, and a noise reduction module 412, wherein:
the receiving module 402 is configured to receive the noise reduction request for the audio data to be processed sent by the terminal, and to obtain the audio data to be processed;
the discrimination computing module 404 is configured to obtain the feature combinations corresponding to the audio data to be processed, obtain the association relationships between the features within each feature combination, and calculate the discrimination of each feature combination according to the features corresponding to the feature combination and the association relationships between them;
the initial feature combination obtaining module 406 is configured to screen the feature combinations according to the preset discrimination threshold to obtain the initial feature combinations;
the available feature combination obtaining module 408 is configured to screen the initial feature combinations using the preset evaluation indexes to obtain the available feature combinations meeting the preset evaluation indexes;
the initial audio data generation module 410 is configured to obtain the audio data to be processed corresponding to the available feature combinations and to generate the first initial audio data based on discrimination;
the noise reduction module 412 is configured to perform noise reduction processing on the first initial audio data based on the deep learning noise reduction model, generating the denoised voice data.
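The module pipeline 402→412 can be sketched as a skeleton class; every stage below is a stub with hypothetical placeholder logic, intended only to show how the modules chain together:

```python
class VoiceDenoisingDevice:
    """Skeleton mirroring modules 402-412; each stage is a stub here."""

    def receive(self, request):                       # receiving module 402
        return request["audio"]

    def compute_discrimination(self, audio):          # discrimination module 404
        return {("zcr", "energy"): 0.9, ("bit_rate",): 0.4}  # placeholder scores

    def screen_initial(self, scores, thr=0.8):        # module 406
        return [c for c, d in scores.items() if d >= thr]

    def screen_available(self, combos):               # module 408 (metrics stubbed)
        return combos

    def generate_initial_audio(self, audio, combos):  # module 410
        return audio if combos else None

    def denoise(self, audio):                         # noise reduction module 412
        return audio                                  # model inference stubbed

    def run(self, request):
        audio = self.receive(request)
        scores = self.compute_discrimination(audio)
        combos = self.screen_available(self.screen_initial(scores))
        initial = self.generate_initial_audio(audio, combos)
        return self.denoise(initial)

out = VoiceDenoisingDevice().run({"audio": [0.1, 0.2]})
print(out)  # [0.1, 0.2]
```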
The above voice data denoising device traverses the feature combinations corresponding to the audio data to be processed using the preset discrimination threshold to obtain the feature combinations meeting the threshold, and screens them with the preset evaluation indexes to obtain the available feature combinations, strengthening the reliability of distinguishing voice data from noise data. The deep learning noise reduction model is used to perform noise reduction processing on the first initial audio data selected on the basis of discrimination, obtaining the denoised voice data. On the basis of the improved discrimination between voice data and noise data, the trained deep learning noise reduction model realizes noise reduction quickly and efficiently, further improving the voice data noise reduction effect.
In one embodiment, a deep learning noise reduction model training module is provided, further configured to:
obtain, from the training samples, the effective audio data corresponding to the feature combinations within the discrimination threshold and its corresponding second initial audio data; slice the effective audio data and the second initial audio data respectively according to the preset length; generate the first voiceprint map of the effective audio data from the sliced effective audio data and extract the first vocal print parameters of the effective audio data from it; generate the second voiceprint map of the second initial audio data from the sliced second initial audio data and extract the second vocal print parameters of the second initial audio data from it; and, using the second vocal print parameters of the second initial audio data as the input of the deep learning model and the first vocal print parameters of the effective audio data at the corresponding moment as its output, train the deep learning model to obtain the deep learning noise reduction model.
The above deep learning noise reduction model training module, by using the second vocal print parameters of the second initial audio data as the input of the deep learning model and the first vocal print parameters of the effective audio data at the corresponding moment as its output, realizes the training of the deep learning model and obtains a deep learning noise reduction model usable for voice data, improving the noise reduction effect on voice data.
In one embodiment, a discrimination computing module is provided, further configured to:
obtain the features corresponding to the audio data to be processed and the association relationships between the features; generate the feature combinations corresponding to the audio data to be processed according to the features and their association relationships; and calculate the discrimination of each feature combination separately according to the features in each combination and the association relationships between them.
The above discrimination computing module realizes the separate calculation of a corresponding discrimination for each distinct feature combination, quickly distinguishing the effective audio data in the voice data from the noise data awaiting noise reduction processing and improving work efficiency.
In one embodiment, an initial feature combination obtaining module is provided, further configured to:
compare the discrimination of each feature combination with the discrimination threshold respectively; and obtain the feature combinations whose discrimination exceeds the discrimination threshold, generating the initial feature combinations.
Because the comparison between each feature combination's discrimination and the discrimination threshold is considered, the above initial feature combination obtaining module can delete the invalid data in the audio data to be processed, reducing the screening workload and improving work efficiency when obtaining the initial audio data that needs noise reduction processing.
In one embodiment, an available feature combination obtaining module is provided, further configured to:
obtain the preset evaluation indexes, which include the AUC value, the precision rate, and the recall rate; screen the initial feature combinations according to the AUC value, precision rate, and recall rate; and obtain the initial feature combinations meeting the requirements, generating the available feature combinations.
The above available feature combination obtaining module screens the initial feature combinations according to the preset evaluation indexes of AUC value, precision rate, and recall rate, and generates the available feature combinations; screening the feature combinations a second time further improves the efficiency of obtaining the initial audio data.
In one embodiment, a noise reduction module is provided, further configured to:
slice the first initial audio data according to the preset length; generate the to-be-processed voiceprint map of the first initial audio data from the sliced first initial audio data and extract the to-be-processed vocal print parameters of the first initial audio data from it; and input the to-be-processed vocal print parameters into the deep learning noise reduction model to obtain the denoised voice data.
The above noise reduction module, by using the deep learning noise reduction model to denoise the first initial audio data selected on the basis of discrimination, improves the voice data noise reduction effect.
In one embodiment, a data processing module is provided, further configured to:
obtain the data type corresponding to each feature combination according to the correspondence between the feature combinations and the data types, the data types including the numeric type, the byte type, and the text type; obtain the data processing method corresponding to each data type according to the correspondence between the data types and the data processing methods, the data processing methods including judgement processing, assignment processing, and declaration processing; and perform data processing on the audio data to be processed corresponding to each feature combination respectively according to each data processing method.
The above data processing module performs, before the discrimination of each feature combination is calculated, the corresponding data preprocessing for each different type of audio data to be processed, improving the accuracy of the subsequent discrimination calculation for each feature combination.
For the specific limitations of the voice data denoising device, reference may be made to the limitations of the voice data noise-reduction method above, which are not repeated here. Each module in the above voice data denoising device may be implemented wholly or partially through software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or be independent of, the processor in the computer equipment, or may be stored in software form in the memory of the computer equipment, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, computer equipment is provided, which may be a server whose internal structure may be as shown in Fig. 5. The computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer equipment provides computing and control capability. The memory of the computer equipment includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer equipment is used to store voice data noise reduction data. The network interface of the computer equipment communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements a voice data noise-reduction method.
Those skilled in the art will understand that the structure shown in Fig. 5 is only a block diagram of the part of the structure relevant to the solution of the present application and does not constitute a limitation on the computer equipment to which the solution of the present application is applied; a specific computer equipment may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.
In one embodiment, computer equipment is provided, including a memory and a processor, the memory storing a computer program; the processor, when executing the computer program, implements the steps in each of the above method embodiments.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; the computer program, when executed by a processor, implements the steps in each of the above method embodiments.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be implemented by instructing relevant hardware through a computer program. The computer program may be stored in a non-volatile computer-readable storage medium, and the computer program, when executed, may include the processes of the embodiments of each of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered to be within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.
Claims (10)
1. A voice data noise reduction method, the method comprising:
receiving a noise reduction request, sent by a terminal, for to-be-processed audio data, and obtaining the to-be-processed audio data;
obtaining feature combinations corresponding to the to-be-processed audio data, obtaining association relationships between the features in each feature combination, and calculating a discrimination degree of each feature combination according to the features corresponding to the feature combination and the association relationships between the features;
screening each feature combination according to a preset discrimination degree threshold to obtain initial feature combinations;
screening the initial feature combinations using a preset evaluation index to obtain usable feature combinations that satisfy the preset evaluation index;
obtaining the to-be-processed audio data corresponding to the usable feature combinations, and generating discrimination-based first initial audio data; and
performing noise reduction processing on the first initial audio data based on a deep learning noise reduction model, to generate noise-reduced voice data.
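The screening pipeline of claim 1 can be sketched as follows. The patent does not define the discrimination degree formula, so this sketch uses a hypothetical stand-in (one minus the mean absolute pairwise correlation of the features in a combination); `screen_combinations` and the threshold value are likewise illustrative.

```python
import numpy as np

def discrimination(features: np.ndarray) -> float:
    """Score a feature combination (rows = features, columns = samples);
    higher means the features in the combination are less redundant."""
    if features.shape[0] < 2:
        return 1.0
    corr = np.corrcoef(features)                       # pairwise association
    upper = corr[np.triu_indices_from(corr, k=1)]      # unique feature pairs
    return 1.0 - float(np.mean(np.abs(upper)))

def screen_combinations(feature_bank: dict, threshold: float) -> list:
    """Keep only the combinations whose discrimination exceeds the preset
    threshold, mirroring the claim-1 screening step."""
    return [name for name, feats in feature_bank.items()
            if discrimination(feats) > threshold]
```

Combinations of near-duplicate features score close to 0 and are dropped; combinations of uncorrelated features score close to 1 and survive as initial feature combinations.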
2. The method according to claim 1, wherein the deep learning noise reduction model is obtained by:
obtaining, from training samples, effective audio data corresponding to feature combinations that do not exceed the discrimination degree threshold, and the corresponding second initial audio data;
slicing the effective audio data and the second initial audio data respectively according to a predetermined length;
generating a first voiceprint map of the effective audio data according to the sliced effective audio data, and extracting first voiceprint parameters of the effective audio data from the first voiceprint map;
generating a second voiceprint map of the second initial audio data according to the sliced second initial audio data, and extracting second voiceprint parameters of the second initial audio data from the second voiceprint map; and
training a deep learning model by taking the second voiceprint parameters of the second initial audio data as the input of the deep learning model and the first voiceprint parameters of the effective audio data at the corresponding moment as the output of the deep learning model, to obtain the deep learning noise reduction model.
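The slicing and voiceprint-map steps of claim 2 might look like the sketch below. The frame and hop sizes, and the use of a plain magnitude spectrogram as the "voiceprint map", are assumptions; the claim leaves those details unspecified.

```python
import numpy as np

def slice_audio(samples: np.ndarray, slice_len: int) -> np.ndarray:
    """Cut audio into fixed-length slices (claim 2's predetermined length),
    dropping the ragged tail."""
    n = len(samples) // slice_len
    return samples[: n * slice_len].reshape(n, slice_len)

def voiceprint_map(audio_slice: np.ndarray, frame: int = 64, hop: int = 32) -> np.ndarray:
    """Magnitude spectrogram of one slice, used here as a stand-in for the
    claim's 'voiceprint map': windowed frames -> real FFT -> magnitudes."""
    starts = range(0, len(audio_slice) - frame + 1, hop)
    frames = np.stack([audio_slice[s : s + frame] for s in starts])
    return np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1))
```

Training pairs would then be built by matching the noisy (second) and clean (effective) maps slice for slice, so each input spectrum has the clean spectrum at the corresponding moment as its target.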
3. The method according to claim 1, wherein obtaining the feature combinations corresponding to the to-be-processed audio data, obtaining the association relationships between the features in each feature combination, and calculating the discrimination degree of each feature combination according to the features corresponding to the feature combination and the association relationships between the features comprises:
obtaining the features corresponding to the to-be-processed audio data and the association relationships between the features;
generating the feature combinations corresponding to the to-be-processed audio data according to the features and the association relationships between the features; and
calculating the discrimination degree of each feature combination separately according to the features corresponding to each feature combination and the association relationships between the features.
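One plausible reading of claim 3's "generate feature combinations from association relationships" is a greedy grouping of features whose pairwise association reaches a minimum strength. The grouping rule and the `min_assoc` value below are illustrative assumptions, not the patent's specified procedure.

```python
def build_combinations(names: list, assoc: dict, min_assoc: float = 0.3) -> list:
    """Greedily group feature names whose pairwise association score
    (assoc[(a, b)]) reaches min_assoc; each feature joins one combination."""
    combos, used = [], set()
    for i, a in enumerate(names):
        if a in used:
            continue
        group = [a]
        for b in names[i + 1:]:
            score = assoc.get((a, b), assoc.get((b, a), 0.0))
            if b not in used and score >= min_assoc:
                group.append(b)
                used.add(b)
        used.add(a)
        combos.append(group)
    return combos
```

Each resulting group would then be scored with the per-combination discrimination degree of claim 1.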
4. The method according to any one of claims 1 to 3, wherein screening each feature combination according to the preset discrimination degree threshold to obtain the initial feature combinations comprises:
comparing the discrimination degree of each feature combination with the discrimination degree threshold; and
obtaining the feature combinations whose discrimination degrees exceed the discrimination degree threshold, to generate the initial feature combinations.
5. The method according to any one of claims 1 to 3, wherein screening the initial feature combinations using the preset evaluation index to obtain the usable feature combinations that satisfy the preset evaluation index comprises:
obtaining the preset evaluation index, the preset evaluation index including an AUC value, a precision rate and a recall rate;
screening the initial feature combinations according to the AUC value, the precision rate and the recall rate; and
obtaining the initial feature combinations that meet the requirements, to generate the usable feature combinations.
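The claim-5 gate can be sketched with standard definitions of the three indexes; the 0.7 thresholds and the use of hard predictions for precision/recall are assumptions, since the claim states only which indexes are used.

```python
import numpy as np

def auc(scores: np.ndarray, labels: np.ndarray) -> float:
    """Rank-based AUC: probability that a positive example outscores a
    negative one (ties count half)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return float(wins) / (len(pos) * len(neg))

def passes_evaluation(scores, labels, preds,
                      min_auc=0.7, min_precision=0.7, min_recall=0.7) -> bool:
    """Keep an initial feature combination only if a classifier built on it
    clears all three preset evaluation indexes (threshold values hypothetical)."""
    tp = int(np.sum((preds == 1) & (labels == 1)))
    precision = tp / max(int(np.sum(preds == 1)), 1)
    recall = tp / max(int(np.sum(labels == 1)), 1)
    return (auc(scores, labels) >= min_auc
            and precision >= min_precision
            and recall >= min_recall)
```

Combinations returning `True` would become the usable feature combinations of claim 1.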
6. The method according to claim 2, wherein performing noise reduction processing on the first initial audio data based on the deep learning noise reduction model to generate the noise-reduced voice data comprises:
slicing the first initial audio data according to the predetermined length;
generating a to-be-processed voiceprint map of the first initial audio data according to the sliced first initial audio data, and extracting to-be-processed voiceprint parameters of the first initial audio data from the to-be-processed voiceprint map; and
inputting the to-be-processed voiceprint parameters into the deep learning noise reduction model to obtain the noise-reduced voice data.
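The inference loop of claim 6 reduces to: slice, extract parameters, run the model per slice, and concatenate. In this sketch, `extract_params` and `model` are placeholder callables standing in for the voiceprint extraction and the trained deep learning noise reduction model, whose internals the claim leaves open.

```python
import numpy as np

def denoise(samples: np.ndarray, slice_len: int, extract_params, model) -> np.ndarray:
    """Slice the first initial audio data to the predetermined length,
    extract per-slice voiceprint parameters, and pass each slice's
    parameters through the trained noise-reduction model."""
    outputs = []
    for i in range(len(samples) // slice_len):
        audio_slice = samples[i * slice_len : (i + 1) * slice_len]
        outputs.append(model(extract_params(audio_slice)))
    return np.concatenate(outputs) if outputs else samples[:0]
```

In a real system the model would map noisy voiceprint parameters to clean ones (per claim 2's input/output pairing) and a synthesis step would convert parameters back to waveform samples.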
7. The method according to claim 1, further comprising, before the step of calculating the discrimination degree of each feature combination:
obtaining the data type corresponding to each feature combination according to the correspondence between feature combinations and data types, the data types including a numeric type, a byte type and a text type;
obtaining the data processing method corresponding to the data type according to the correspondence between data types and data processing methods, the data processing methods including judgement processing, assignment processing and statement processing; and
performing data processing on the to-be-processed audio data corresponding to each feature combination according to the respective data processing method.
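The type-to-method dispatch of claim 7 can be sketched as a simple lookup. The particular pairing of types to processing modes below is an illustrative assumption; the claim states only that such a correspondence exists.

```python
def processing_method(value) -> str:
    """Return a claim-7 style processing mode for a field based on its data
    type (numeric / byte / text); the type->mode pairing is hypothetical."""
    if isinstance(value, (int, float)):
        return "judgement"          # numeric type -> judgement processing
    if isinstance(value, (bytes, bytearray)):
        return "assignment"         # byte type -> assignment processing
    if isinstance(value, str):
        return "statement"          # text type -> statement processing
    raise TypeError(f"unsupported data type: {type(value).__name__}")
```

Each feature combination's audio metadata would be routed through the handler matching its type before the discrimination degrees are computed.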
8. A voice data noise reduction apparatus, the apparatus comprising:
a receiving module, configured to receive a noise reduction request, sent by a terminal, for to-be-processed audio data, and obtain the to-be-processed audio data;
a discrimination degree calculation module, configured to obtain feature combinations corresponding to the to-be-processed audio data, obtain association relationships between the features in each feature combination, and calculate a discrimination degree of each feature combination according to the features corresponding to the feature combination and the association relationships between the features;
an initial feature combination obtaining module, configured to screen each feature combination according to a preset discrimination degree threshold to obtain initial feature combinations;
a usable feature combination obtaining module, configured to screen the initial feature combinations using a preset evaluation index to obtain usable feature combinations that satisfy the preset evaluation index;
an initial audio data generation module, configured to obtain the to-be-processed audio data corresponding to the usable feature combinations and generate discrimination-based first initial audio data; and
a noise reduction module, configured to perform noise reduction processing on the first initial audio data based on a deep learning noise reduction model, to generate noise-reduced voice data.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910650447.6A CN110335616A (en) | 2019-07-18 | 2019-07-18 | Voice data noise-reduction method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910650447.6A CN110335616A (en) | 2019-07-18 | 2019-07-18 | Voice data noise-reduction method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110335616A (en) | 2019-10-15 |
Family
ID=68146065
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910650447.6A Pending CN110335616A (en) | 2019-07-18 | 2019-07-18 | Voice data noise-reduction method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110335616A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107068161A (en) * | 2017-04-14 | 2017-08-18 | 百度在线网络技术(北京)有限公司 | Voice de-noising method, device and computer equipment based on artificial intelligence |
CN109471853A (en) * | 2018-09-18 | 2019-03-15 | 平安科技(深圳)有限公司 | Data noise reduction, device, computer equipment and storage medium |
WO2019112468A1 (en) * | 2017-12-08 | 2019-06-13 | Huawei Technologies Co., Ltd. | Multi-microphone noise reduction method, apparatus and terminal device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020177380A1 (en) | Voiceprint detection method, apparatus and device based on short text, and storage medium | |
US11004461B2 (en) | Real-time vocal features extraction for automated emotional or mental state assessment | |
CN112818892B (en) | Multi-modal depression detection method and system based on time convolution neural network | |
CN107492382B (en) | Voiceprint information extraction method and device based on neural network | |
CN110120224B (en) | Method and device for constructing bird sound recognition model, computer equipment and storage medium | |
CN110782872A (en) | Language identification method and device based on deep convolutional recurrent neural network | |
WO2021179717A1 (en) | Speech recognition front-end processing method and apparatus, and terminal device | |
WO2018223727A1 (en) | Voiceprint recognition method, apparatus and device, and medium | |
CN108022587A (en) | Audio recognition method, device, computer equipment and storage medium | |
CN111433847A (en) | Speech conversion method and training method, intelligent device and storage medium | |
Faundez-Zanuy et al. | Nonlinear speech processing: overview and applications | |
CN108922561A (en) | Speech differentiation method, apparatus, computer equipment and storage medium | |
CN108922543A (en) | Model library method for building up, audio recognition method, device, equipment and medium | |
CN113646833A (en) | Voice confrontation sample detection method, device, equipment and computer readable storage medium | |
EP3729419A1 (en) | Method and apparatus for emotion recognition from speech | |
CN113470688B (en) | Voice data separation method, device, equipment and storage medium | |
KR102204975B1 (en) | Method and apparatus for speech recognition using deep neural network | |
WO2021134591A1 (en) | Speech synthesis method, speech synthesis apparatus, smart terminal and storage medium | |
CN117542373A (en) | Non-air conduction voice recovery system and method | |
CN110619886B (en) | End-to-end voice enhancement method for low-resource Tujia language | |
CN110335616A (en) | Voice data noise-reduction method, device, computer equipment and storage medium | |
CN113869212A (en) | Multi-modal in-vivo detection method and device, computer equipment and storage medium | |
CN113889073A (en) | Voice processing method, device, electronic equipment and storage medium | |
CN113012680A (en) | Speech technology synthesis method and device for speech robot | |
CN116959421B (en) | Method and device for processing audio data, audio data processing equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||