CN107680614A

CN107680614A - Audio signal processing method, device and storage medium

Info

Publication number: CN107680614A
Application number: CN201710919028.9A
Authority: CN
Inventors: 肖纯智
Original assignee: Guangzhou Kugou Computer Technology Co Ltd
Current assignee: Guangzhou Kugou Computer Technology Co Ltd
Priority date: 2017-09-30
Filing date: 2017-09-30
Publication date: 2018-02-09
Anticipated expiration: 2037-09-30
Also published as: CN107680614B

Abstract

The invention discloses an audio signal processing method, device and storage medium, and belongs to the technical field of video processing. The method includes: when a terminal detects a recognition instruction, it first determines a first spectrum sequence of an audio signal of a specified user, the recognition instruction being used to instruct detection of whether the specified user is singing a specified song; the terminal determines a mutation time set of the audio signal according to the first spectrum sequence, the mutation time set including multiple time points, each time point corresponding to one mutation spectrum value; the terminal compares the mutation time set of the audio signal with a reference mutation time set corresponding to the specified song to obtain a similarity comparison result, and outputs prompt information corresponding to the comparison result, the prompt information indicating whether the specified user is singing. Because recognition is performed by comparing the mutation time set with the reference mutation time set of the specified song, the practicality of the method for recognizing whether the specified user is singing is improved.

Description

Audio signal processing method, device and storage medium

Technical field

The present invention relates to the technical field of video processing, and in particular to an audio file processing method, device and storage medium.

Background

Live streaming is a network service provided by live-streaming applications and is currently a very popular form of entertainment. A live-streaming application offers different live rooms, for example singing rooms, storytelling rooms or teaching rooms, and streamers can host different types of live video in different rooms. However, some streamers do not actually provide the kind of live video that their room is intended for. For example, a streamer in a singing room may be doing something else instead of singing. Therefore, when a streamer is in a singing room, it is necessary to detect whether the streamer is singing.

In the related art, when a streamer sings a specified song in a singing room, whether the streamer is singing can be recognized as follows: the terminal collects the streamer's audio signal, extracts the pitch sequence of the audio signal, and obtains the standard pitch sequence corresponding to the specified song, which staff have produced in advance by manual annotation. The terminal calculates the similarity between the pitch sequence of the audio signal and the standard pitch sequence. If the similarity is not less than a certain value, the terminal determines that the streamer is singing; otherwise, the terminal determines that the streamer is not singing.

In the process of realizing the present invention, the inventor found that the related art has at least the following problem:

The above method requires the standard pitch sequence to be identified manually in advance by annotation. However, because there is an enormous number of original songs on the market, manual annotation is extremely inefficient, and at present most original songs have no corresponding standard pitch sequence. When a streamer performs an original song that has no standard pitch sequence, recognition cannot be performed, so the above method has poor practicality.

Summary of the invention

The present invention provides an audio signal processing method, device and storage medium, which can solve the problem of poor practicality in the prior art. The technical solution is as follows:

In a first aspect, an audio signal processing method is provided, the method including:

when a recognition instruction is detected, determining a first spectrum sequence of an audio signal of a specified user, the recognition instruction being used to instruct detection of whether the specified user is singing a specified song;

determining a mutation time set of the audio signal according to the first spectrum sequence, the mutation time set including multiple time points, each time point corresponding to one mutation spectrum value;

comparing the mutation time set of the audio signal with a reference mutation time set corresponding to the specified song to obtain a similarity comparison result, and outputting prompt information corresponding to the comparison result, the prompt information being used to indicate whether the specified user is singing.

In one possible design, determining the mutation time set of the audio signal according to the first spectrum sequence includes:

determining, according to the spectrum values in the first spectrum sequence, the difference degree between every two adjacent spectrum values;

when the difference degree between two adjacent spectrum values is greater than a preset difference degree, taking one of the two time points corresponding to those two adjacent spectrum values as a time point in the mutation time set.
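The two steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the first spectrum sequence is a 2-D array with one spectrum per frame, measures the difference degree as the Euclidean distance between adjacent frame spectra, and keeps the later frame of each adjacent pair. The names `mutation_times`, `hop_size` and `diff_threshold` are hypothetical.

```python
import numpy as np


def mutation_times(spectra, hop_size, sample_rate, diff_threshold):
    """Return (time_seconds, frame_index) pairs where the spectrum changes abruptly.

    spectra: 2-D array with one row per frame (the first spectrum sequence).
    Assumptions (not from the patent): the difference degree is the Euclidean
    distance between adjacent frame spectra, and the later frame of each
    adjacent pair supplies the mutation time point.
    """
    spectra = np.asarray(spectra, dtype=float)
    # Difference degree between each pair of adjacent frames.
    diffs = np.linalg.norm(np.diff(spectra, axis=0), axis=1)
    result = []
    for i in np.flatnonzero(diffs > diff_threshold):
        frame = int(i) + 1  # later frame of the adjacent pair (i, i + 1)
        # Frame index -> start time of that frame in seconds.
        result.append((frame * hop_size / sample_rate, frame))
    return result
```

Each returned frame index points back to the row of `spectra` that serves as the mutation spectrum value for that time point.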

In one possible design, comparing the similarity between the mutation time set of the audio signal and the reference mutation time set corresponding to the specified song to obtain a comparison result, and outputting prompt information corresponding to the comparison result, includes:

determining the similarity between the mutation time set of the audio signal and the reference mutation time set;

when the comparison result is that the similarity is greater than a preset similarity, determining that the prompt information corresponding to the comparison result indicates that the specified user is singing, and outputting prompt information indicating that the specified user is singing;

when the comparison result is that the similarity is not greater than the preset similarity, determining that the prompt information corresponding to the comparison result indicates that the specified user is not singing, and outputting prompt information indicating that the specified user is not singing.

In one possible design, determining the similarity between the mutation time set of the audio signal and the reference mutation time set includes:

determining the number of matching points in the reference mutation time set, a matching point being a time point in the reference mutation time set that matches a time point in the mutation time set of the audio signal;

determining the similarity according to that number and the total number of time points in the reference mutation time set.
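A minimal sketch of the match-point similarity and of the threshold decision in the preceding design, assuming a reference point counts as matched when some user mutation time lies within a tolerance window around it. The tolerance, the default preset similarity of 0.5 and the function names are illustrative assumptions; the text does not state how matching is performed.

```python
import bisect


def match_similarity(ref_times, user_times, tolerance=0.1):
    """Number of matched reference points divided by the total reference points.

    Assumption: a reference time point matches if some user mutation time
    lies within +/- tolerance seconds of it.
    """
    if not ref_times:
        return 0.0
    user = sorted(user_times)
    matched = 0
    for t in ref_times:
        # First user time point >= t - tolerance.
        i = bisect.bisect_left(user, t - tolerance)
        if i < len(user) and user[i] <= t + tolerance:
            matched += 1
    return matched / len(ref_times)


def is_singing(similarity, preset_similarity=0.5):
    """Singing only if the similarity is strictly greater than the preset value."""
    return similarity > preset_similarity
```

Because it only checks for the existence of a nearby user point, one user point may match several closely spaced reference points; a stricter variant could enforce one-to-one matching.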

In one possible design, the reference mutation time set includes multiple reference mutation time subsets, each reference mutation time subset corresponding to one reference audio sub-signal of the specified song;

determining the similarity between the mutation time set of the audio signal and the reference mutation time set includes:

dividing the mutation time set of the audio signal into multiple mutation time subsets;

determining multiple groups of subsets, each group including the reference mutation time subset and the mutation time subset that correspond to the same reference audio sub-signal;

determining the similarity of each group of subsets separately;

determining, according to the similarity of each group of subsets, the similarity between the mutation time set of the audio signal and the reference mutation time set.
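The subset-based design can be sketched as below, under two assumptions the text leaves open: each reference audio sub-signal (for example, one lyric line) is described by a time window that is also used to divide the user's mutation time set, and the overall similarity is the plain mean of the per-group similarities. All names and the tolerance are hypothetical.

```python
import bisect


def _match_fraction(ref_times, user_sorted, tolerance):
    """Share of reference points that have a user point within +/- tolerance seconds."""
    if not ref_times:
        return 0.0
    matched = 0
    for t in ref_times:
        i = bisect.bisect_left(user_sorted, t - tolerance)
        if i < len(user_sorted) and user_sorted[i] <= t + tolerance:
            matched += 1
    return matched / len(ref_times)


def segmented_similarity(ref_segments, user_times, tolerance=0.1):
    """ref_segments: list of (start, end, ref_times), one per reference audio sub-signal.

    Assumptions: the user's mutation time set is divided using the same
    [start, end) windows, and the overall similarity is the plain mean of
    the per-group similarities.
    """
    user = sorted(user_times)
    scores = []
    for start, end, ref_times in ref_segments:
        # Mutation time subset that falls inside this sub-signal's window.
        sub = user[bisect.bisect_left(user, start):bisect.bisect_left(user, end)]
        scores.append(_match_fraction(ref_times, sub, tolerance))
    return sum(scores) / len(scores) if scores else 0.0
```

A weighted mean, for example by the number of reference points in each group, would be an equally plausible way to combine the per-group similarities.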

In one possible design, the first spectrum sequence is any one of a short-time spectrum sequence, a short-time log spectrum sequence or a short-time cepstrum sequence; determining the first spectrum sequence of the audio signal of the specified user includes:

when the first spectrum sequence is a short-time spectrum sequence, collecting the audio signal of the specified user, performing framing, windowing and a short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, and taking the short-time spectrum sequence as the first spectrum sequence;

when the first spectrum sequence is a short-time log spectrum sequence, collecting the audio signal of the specified user, performing framing, windowing and a short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, performing a logarithmic transform on the short-time spectrum sequence to obtain the short-time log spectrum sequence, and taking the short-time log spectrum sequence as the first spectrum sequence of the audio signal;

when the first spectrum sequence is a short-time cepstrum sequence, collecting the audio signal of the specified user, performing framing, windowing and a short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, performing a logarithmic transform on the short-time spectrum sequence to obtain the short-time log spectrum sequence, performing an inverse Fourier transform on the short-time log spectrum sequence to obtain the short-time cepstrum sequence, and taking the short-time cepstrum sequence as the first spectrum sequence of the audio signal.
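The three spectrum variants can be sketched with NumPy as follows. The frame size, hop size, Hann window, use of the magnitude spectrum and the small `eps` added before the logarithm are assumptions chosen for illustration; the cepstrum here is the real cepstrum obtained by an inverse real FFT of the log magnitude spectrum.

```python
import numpy as np


def short_time_spectra(signal, frame_size=1024, hop_size=512):
    """Framing + Hann window + FFT -> magnitude short-time spectrum (frames x bins)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_size:
        # Zero-pad very short inputs so at least one frame exists.
        signal = np.pad(signal, (0, frame_size - len(signal)))
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    window = np.hanning(frame_size)
    frames = np.stack([
        signal[i * hop_size:i * hop_size + frame_size] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))


def log_spectra(spec, eps=1e-10):
    """Short-time log spectrum; eps avoids log(0) on silent bins."""
    return np.log(spec + eps)


def cepstra(log_spec, frame_size=1024):
    """Short-time (real) cepstrum: inverse FFT of each frame's log spectrum."""
    return np.fft.irfft(log_spec, n=frame_size, axis=1)
```

For a one-second, 16 kHz input these defaults give 30 frames of 513 frequency bins each.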

In one possible design, before comparing the similarity between the mutation time set of the audio signal and the reference mutation time set corresponding to the specified song to obtain a comparison result and outputting prompt information corresponding to the comparison result, the method includes:

obtaining the lyrics or the music score of the specified song;

obtaining the timestamps of the specified song, and determining the time point corresponding to each character in the lyrics or each note in the music score;

forming the reference mutation time set corresponding to the specified song from the time points corresponding to the characters in the lyrics or the notes in the music score.
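A sketch of building the reference mutation time set, assuming the song's timestamp data has already been parsed into lines of `(character, onset_seconds)` pairs, for example from a word-level timed lyrics file (the parsing itself is not shown). Returning one subset per lyric line also yields reference mutation time subsets of the kind used in the subset-based design; the function name and input layout are hypothetical.

```python
def reference_mutation_times(timed_lyrics):
    """Build the reference mutation time set from per-character onset times.

    timed_lyrics: list of lyric lines, each a list of (character, onset_seconds).
    Returns (full_set, per_line_subsets), both as sorted lists of seconds.
    """
    # One subset per lyric line (a possible reference audio sub-signal).
    subsets = [sorted({t for _, t in line}) for line in timed_lyrics]
    # The full reference set is the union of all per-line subsets.
    full = sorted({t for sub in subsets for t in sub})
    return full, subsets
```

The same function works for a music score if each note is supplied as a `(note_name, onset_seconds)` pair.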

In a second aspect, an audio signal processing device is provided, the device including:

a first determining module, configured to determine, when a recognition instruction is detected, a first spectrum sequence of an audio signal of a specified user, the recognition instruction being used to instruct detection of whether the specified user is singing a specified song;

a second determining module, configured to determine a mutation time set of the audio signal according to the first spectrum sequence, the mutation time set including multiple time points, each time point corresponding to one mutation spectrum value;

an output module, configured to compare the mutation time set of the audio signal with a reference mutation time set corresponding to the specified song to obtain a similarity comparison result, and to output prompt information corresponding to the comparison result, the prompt information being used to indicate whether the specified user is singing.

In one possible design, the second determining module includes:

a first determining unit, configured to determine, according to the spectrum values in the first spectrum sequence, the difference degree between every two adjacent spectrum values;

a forming unit, configured to, when the difference degree between two adjacent spectrum values is greater than a preset difference degree, take one of the two time points corresponding to those two adjacent spectrum values as a time point in the mutation time set.

In one possible design, the output module is further configured to: determine the similarity between the mutation time set of the audio signal and the reference mutation time set; when the comparison result is that the similarity is greater than a preset similarity, determine that the prompt information corresponding to the comparison result indicates that the specified user is singing, and output prompt information indicating that the specified user is singing; and when the comparison result is that the similarity is not greater than the preset similarity, determine that the prompt information corresponding to the comparison result indicates that the specified user is not singing, and output prompt information indicating that the specified user is not singing.

In one possible design, the output module is further configured to: determine the number of matching points in the reference mutation time set, a matching point being a time point in the reference mutation time set that matches a time point in the mutation time set of the audio signal; and determine the similarity according to that number and the total number of time points in the reference mutation time set.

In one possible design, the output module includes:

a dividing unit, configured to divide the mutation time set of the audio signal into multiple mutation time subsets;

a second determining unit, configured to determine multiple groups of subsets, each group including the reference mutation time subset and the mutation time subset that correspond to the same reference audio sub-signal;

the second determining unit being further configured to determine the similarity of each group of subsets separately, and to determine, according to the similarity of each group of subsets, the similarity between the mutation time set of the audio signal and the reference mutation time set.

In one possible design, the first spectrum sequence is any one of a short-time spectrum sequence, a short-time log spectrum sequence or a short-time cepstrum sequence;

the first determining module is further configured to, when the first spectrum sequence is a short-time spectrum sequence, collect the audio signal of the specified user, perform framing, windowing and a short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, and take the short-time spectrum sequence as the first spectrum sequence;

the first determining module is further configured to, when the first spectrum sequence is a short-time log spectrum sequence, collect the audio signal of the specified user, perform framing, windowing and a short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, perform a logarithmic transform on the short-time spectrum sequence to obtain the short-time log spectrum sequence, and take the short-time log spectrum sequence as the first spectrum sequence of the audio signal;

the first determining module is further configured to, when the first spectrum sequence is a short-time cepstrum sequence, collect the audio signal of the specified user, perform framing, windowing and a short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, perform a logarithmic transform on the short-time spectrum sequence to obtain the short-time log spectrum sequence, perform an inverse Fourier transform on the short-time log spectrum sequence to obtain the short-time cepstrum sequence, and take the short-time cepstrum sequence as the first spectrum sequence of the audio signal.

In one possible design, the device further includes:

an acquisition module, configured to obtain the lyrics or the music score of the specified song;

a third determining module, configured to obtain the timestamps of the specified song and determine the time point corresponding to each character in the lyrics or each note in the music score;

a forming module, configured to form the reference mutation time set corresponding to the specified song from the time points corresponding to the characters in the lyrics or the notes in the music score.

In a third aspect, an audio signal processing device is provided, including a processor and a memory. The memory is configured to store a computer program; the processor is configured to execute the computer program stored in the memory to implement the method steps described in the first aspect.

In a fourth aspect, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the method steps described in the first aspect.

In the embodiments of the present invention, when the terminal detects a recognition instruction, it first determines the first spectrum sequence of the audio signal of the specified user, the recognition instruction being used to instruct detection of whether the specified user is singing the specified song. The terminal determines the mutation time set of the audio signal according to the first spectrum sequence, the mutation time set including multiple time points, each time point corresponding to one mutation spectrum value. It then compares the mutation time set of the audio signal with the reference mutation time set corresponding to the specified song to obtain a similarity comparison result, and outputs the corresponding prompt information, which indicates whether the specified user is singing. Because the mutation time set is determined from the first spectrum sequence of the audio signal and recognition is performed by comparing it with the reference mutation time set of the specified song, and existing songs all have a reference mutation time set, the recognition method provided by the embodiments of the present invention is widely applicable, which improves the practicality of recognizing whether the specified user is singing.

Brief description of the drawings

Fig. 1 is a schematic diagram of an implementation environment of an audio signal processing method according to an embodiment of the present invention;

Fig. 2 is a flowchart of an audio signal processing method according to an embodiment of the present invention;

Fig. 3 is a flowchart of an audio signal processing method according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an audio signal processing device according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an audio signal processing device according to an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of the implementation environment of the audio signal processing method. The implementation environment includes a terminal 101 of the specified user and a server 102. The terminal 101 and the server 102 are connected through a wired or wireless network. An application program associated with the server 102 runs on the terminal 101; the terminal 101 can log in to the application program with a user identifier, thereby logging in to the server 102 and interacting with it.

The application program is any application that can collect audio signals, for example a live-streaming application or a karaoke application. The specified user is the user who is singing. For example, when the application is a live-streaming application, the specified user may be the streamer; when it is a karaoke application, the specified user may be the user currently singing karaoke.

However, a specified user in a live room of a live-streaming application sometimes does not actually sing, which gives the viewers of that room a poor user experience. Therefore, when the specified user has marked himself or herself as singing a specified song, it is detected whether the specified user is really singing that song. In the embodiments of the present invention, whether the specified user is singing may be detected by the terminal 101 or by the server 102; the embodiments are described using detection by the terminal 101 as an example. The specified song may be any piece such as a song, a prose recitation or a crosstalk act.

The terminal 101 may be any device capable of collecting audio signals, such as a mobile phone, a PAD (Portable Android Device, tablet computer) or a computer. The server 102 provides background services for the terminal 101; it may be a single server, a server cluster composed of several servers, or a cloud computing service center, which is not limited in the embodiments of the present invention. In one possible implementation, the server 102 may be the background server of the live-streaming application installed on the terminal 101.

Fig. 2 is a flowchart of an audio signal processing method according to an embodiment of the present invention. The method may be applied in a terminal and, as shown in Fig. 2, includes the following steps.

Step 201: When a recognition instruction is detected, determine the first spectrum sequence of the audio signal of the specified user, the recognition instruction being used to instruct detection of whether the specified user is singing the specified song;

Step 202: Determine the mutation time set of the audio signal according to the first spectrum sequence, the mutation time set including multiple time points, each time point corresponding to one mutation spectrum value;

Step 203: Compare the mutation time set of the audio signal with the reference mutation time set corresponding to the specified song to obtain a similarity comparison result, and output prompt information corresponding to the comparison result, the prompt information being used to indicate whether the specified user is singing.

In one possible design, determining the mutation time set of the audio signal according to the first spectrum sequence includes:

In one possible design, comparing the similarity between the mutation time set of the audio signal and the reference mutation time set corresponding to the specified song to obtain a comparison result, and outputting prompt information corresponding to the comparison result, includes:

when the comparison result is that the similarity is greater than a preset similarity, determining that the prompt information corresponding to the comparison result indicates that the specified user is singing, and outputting prompt information indicating that the specified user is singing;

when the comparison result is that the similarity is not greater than the preset similarity, determining that the prompt information corresponding to the comparison result indicates that the specified user is not singing, and outputting prompt information indicating that the specified user is not singing.

In one possible design, determining the similarity between the mutation time set of the audio signal and the reference mutation time set includes:

determining the number of matching points in the reference mutation time set, a matching point being a time point in the reference mutation time set that matches a time point in the mutation time set of the audio signal;

In one possible design, the reference mutation time set includes multiple reference mutation time subsets, each reference mutation time subset corresponding to one reference audio sub-signal of the specified song;

determining the similarity between the mutation time set of the audio signal and the reference mutation time set includes:

determining the similarity of each group of subsets separately;

determining, according to the similarity of each group of subsets, the similarity between the mutation time set of the audio signal and the reference mutation time set.

In one possible design, the method further includes:

when it is determined that the specified user is not singing, sending an indication message to the server, the indication message indicating that the specified user is not singing, so that the server performs designated processing on the specified user. The designated processing includes: reminding the viewers of the specified user that the specified user is not singing, and/or penalizing the specified user.
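A minimal sketch of the indication message, assuming a JSON payload. The text does not specify the message format, so every field name below, including the requested actions that mirror the two kinds of designated processing, is hypothetical.

```python
import json
import time


def build_not_singing_message(user_id, room_id, similarity,
                              actions=("notify_viewers", "penalize")):
    """Serialize a hypothetical 'specified user is not singing' indication message."""
    payload = {
        "type": "not_singing",             # what the terminal has determined
        "user_id": user_id,                # the specified user
        "room_id": room_id,                # the live room being checked
        "similarity": similarity,          # score that fell at or below the preset value
        "requested_actions": list(actions),  # designated processing (and/or)
        "timestamp": int(time.time()),     # seconds since the epoch
    }
    return json.dumps(payload)
```

The server would decide which of the requested actions to apply; the terminal only reports its result.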

In one possible design, the first spectrum sequence is any one of a short-time spectrum sequence, a short-time log spectrum sequence or a short-time cepstrum sequence; determining the first spectrum sequence of the audio signal of the specified user includes:

when the first spectrum sequence is a short-time spectrum sequence, collecting the audio signal of the specified user, performing framing, windowing and a short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, and taking the short-time spectrum sequence as the first spectrum sequence;

when the first spectrum sequence is a short-time log spectrum sequence, collecting the audio signal of the specified user, performing framing, windowing and a short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, performing a logarithmic transform on the short-time spectrum sequence to obtain the short-time log spectrum sequence, and taking the short-time log spectrum sequence as the first spectrum sequence of the audio signal;

when the first spectrum sequence is a short-time cepstrum sequence, collecting the audio signal of the specified user, performing framing, windowing and a short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, performing a logarithmic transform on the short-time spectrum sequence to obtain the short-time log spectrum sequence, performing an inverse Fourier transform on the short-time log spectrum sequence to obtain the short-time cepstrum sequence, and taking the short-time cepstrum sequence as the first spectrum sequence of the audio signal.

In one possible design, before comparing the similarity between the mutation time set of the audio signal and the reference mutation time set corresponding to the specified song to obtain a comparison result and outputting prompt information corresponding to the comparison result, the method includes:

obtaining the lyrics or the music score of the specified song;

obtaining the timestamps of the specified song, and determining the time point corresponding to each character in the lyrics or each note in the music score;

forming the reference mutation time set corresponding to the specified song from the time points corresponding to the characters in the lyrics or the notes in the music score.

In the embodiments of the present invention, when the terminal detects a recognition instruction, it first determines the first spectrum sequence of the audio signal of the specified user, the first spectrum sequence being a short-time spectrum sequence, a short-time log spectrum sequence or a short-time cepstrum sequence, and the recognition instruction being used to instruct detection of whether the specified user is singing the specified song. The terminal determines the mutation time set of the audio signal according to the first spectrum sequence, the mutation time set including multiple time points, each time point corresponding to one mutation spectrum value. It then compares the mutation time set of the audio signal with the reference mutation time set corresponding to the specified song to obtain a similarity comparison result, and outputs the corresponding prompt information, which indicates whether the specified user is singing. Because the mutation time set is determined from the first spectrum sequence of the audio signal and recognition is performed by comparing it with the reference mutation time set of the specified song, and existing songs all have a reference mutation time set, the recognition method provided by the embodiments of the present invention is widely applicable, which improves the practicality of recognizing whether the specified user is singing.

Fig. 3 is a flowchart of an audio signal processing method according to an embodiment of the present invention. The method may be applied in a terminal or in a server, which is not specifically limited in the embodiments of the present invention; the terminal is used only as an example here. As shown in Fig. 3, the method includes the following steps.

Step 301: When a recognition instruction is detected, the terminal determines the first spectrum sequence of the audio signal of the specified user, the recognition instruction being used to instruct detection of whether the specified user is singing the specified song.

In the embodiments of the present invention, the terminal may display a live button in the interface of the live-streaming application, and the specified user may trigger the live button to open a singing live room and stream live to viewers. However, the specified user in the singing room may not actually be singing, which wastes the time of viewers who entered the room waiting to watch. Alternatively, the specified user may actually be singing, but because of a fault in the specified user's terminal, a poor network signal or similar reasons, the live room shows the specified user as not singing, so that viewers in the room cannot watch. Either case gives both the specified user and the viewers a poor user experience. Therefore, to improve the user experience of the specified user and the viewers, the terminal may recognize whether the specified user in the live room is singing. When the terminal detects that a preset recognition condition is met, it obtains a recognition instruction, which instructs the terminal to detect whether the specified user is singing the specified song.

Wherein, the default identification condition can include but is not limited to：When the live button is triggered, or the direct broadcasting room When reaching certain time after being opened, or, this presets identification when either the recognition button in the terminal of spectators user is triggered Condition can also be that terminal detect when not having sound in the direct broadcasting room etc..Accordingly, terminal obtains the realization side of identification instruction Formula includes but is not limited to any mode in following (1)-(4)：

(1)：When terminal detects that the live button is triggered, terminal generation identification instruction.

(2)：When terminal detects current time and the difference of the opening time of the direct broadcasting room is more than preset time difference, terminal Generation identification instruction.

Wherein, the preset time difference can be needed to set and change according to user, and the embodiment of the present invention is not done specifically to this Limit.For example, the preset time difference can be 2 seconds, 6 seconds etc..

(3)：The identification instruction that terminal the reception server is sent, identification instruction detect current live for vlewer terminals Recognition button in interface is sent to server when being triggered.

A viewer user may trigger the recognition button in the live interface of the live room, causing the viewer terminal to generate an identification instruction and send it to the server. The server forwards the identification instruction to the specified user's terminal, which receives it.

Note that in this embodiment "the terminal" refers to the terminal used by the specified user, and "viewer terminal" refers to a terminal used by a viewer user who is watching the specified user.

(4) When the terminal detects that the audio signal in the live room has not changed within a preset duration, the terminal generates the identification instruction.

When the live room is opened, the terminal starts monitoring changes in the audio signal of the live room in real time. The preset duration may be set and changed as needed and is not specifically limited in this embodiment; for example, it may be 10 seconds, 6 seconds, etc.
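The four trigger conditions above can be combined into a single check, as in the minimal Python sketch below. The `LiveRoomState` fields and the function name `should_identify` are hypothetical. The default thresholds reuse the example values given above (6 s and 10 s), and the order in which the conditions are evaluated is an assumption; the source does not prescribe one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LiveRoomState:
    # Hypothetical snapshot of what the streamer's terminal knows; names are illustrative.
    live_button_triggered: bool    # condition (1)
    now_s: float                   # current time, in seconds
    room_opened_at_s: float        # time at which the live room was opened, in seconds
    server_request_received: bool  # condition (3): instruction forwarded from a viewer terminal
    last_audio_change_s: float     # last time the room's audio signal changed, in seconds


def should_identify(state: LiveRoomState,
                    preset_time_diff_s: float = 6.0,
                    preset_silence_s: float = 10.0) -> Optional[str]:
    """Return the name of the condition that generates an identification instruction, or None."""
    if state.live_button_triggered:                                   # (1)
        return "live_button"
    if state.now_s - state.room_opened_at_s > preset_time_diff_s:     # (2)
        return "elapsed_time"
    if state.server_request_received:                                 # (3)
        return "viewer_request"
    if state.now_s - state.last_audio_change_s >= preset_silence_s:   # (4)
        return "audio_unchanged"
    return None
```

In a real deployment each condition would be evaluated by the component that observes it (the server relays (3), for instance). Folding all four into one function is only meant to make the conditions easy to compare.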

The terminal may perform detection based on the audio signal that the specified user plays in the singing live room. Therefore, when the terminal detects the identification instruction, it determines the first spectrum sequence of the specified user's audio signal. The first spectrum sequence is any one of a short-time spectrum sequence, a short-time log spectrum sequence or a short-time cepstrum sequence. Correspondingly, the terminal may determine the first spectrum sequence of the specified user's audio signal in one of the following three ways.

First implementation: when the first spectrum sequence is a short-time spectrum sequence, the terminal collects the specified user's audio signal and performs framing, windowing and the short-time Fourier transform on it to obtain its short-time spectrum sequence. This short-time spectrum sequence is used as the first spectrum sequence.

In this embodiment, when the terminal detects the identification instruction, it collects the specified user's audio signal in the live room and divides it into multiple frames of audio sub-signals according to a preset frame length, so that each frame has the preset frame length. To prevent spectral leakage, the terminal then applies a preset window function to each frame of audio sub-signal, obtaining the windowed frames. Finally, the terminal performs a short-time Fourier transform on each windowed frame to obtain multiple frames of short-time spectrum signals, and the spectra of these frames form the first spectrum sequence.

The preset frame length and the preset window function may be set and changed as needed and are not specifically limited in this embodiment. For example, the preset frame length may be 25 ms, 30 ms, etc., and the preset window function may be a Hanning window, a Hamming window, etc.
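A minimal NumPy sketch of the first implementation is shown below. The source does not specify a hop size, so the sketch assumes non-overlapping frames. It also keeps the magnitude of each frame's FFT as that frame's spectrum and drops any trailing partial frame. The function name `short_time_spectrum` is hypothetical.

```python
import numpy as np


def short_time_spectrum(signal, sample_rate, frame_ms=25.0, window="hanning"):
    """Frame, window and Fourier-transform `signal`; return one magnitude spectrum per row.

    Assumes non-overlapping frames of `frame_ms` milliseconds (hop = frame length).
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    n_frames = len(signal) // frame_len                      # drop the incomplete tail frame
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    # Hanning or Hamming window, the two examples named in the text
    win = np.hanning(frame_len) if window == "hanning" else np.hamming(frame_len)
    # rfft of each windowed frame: one short-time spectrum per frame
    return np.abs(np.fft.rfft(frames * win, axis=1))
```

With an 8 kHz signal and 25 ms frames, each frame has 200 samples and each spectrum has 101 bins spaced 40 Hz apart. A 1 kHz tone therefore peaks in bin 25 of every frame.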

Second implementation: when the first spectrum sequence is a short-time log spectrum sequence, the terminal collects the specified user's audio signal and performs framing, windowing and the short-time Fourier transform on it to obtain its short-time spectrum sequence. The terminal then applies a logarithmic transform to the short-time spectrum sequence to obtain the short-time log spectrum sequence, which is used as the first spectrum sequence of the audio signal.

After obtaining the short-time spectrum sequence, the terminal applies a logarithmic transform to each frame of short-time spectrum signal in it, producing one frame of short-time log spectrum signal per frame. The spectrum values of these frames form the first spectrum sequence. The short-time spectrum sequence is obtained in the same way as in the first implementation, so the details are not repeated here.

Note that the short-time log spectrum is smoother and better reflects fine fluctuations of the sound in the audio signal. The first spectrum sequence therefore corresponds more closely to the actual sound of the audio signal, which improves the accuracy of identifying whether the specified user is singing.
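Once the short-time magnitude spectrum is available (one row per frame, as in the first implementation), the second implementation only needs an element-wise logarithm. The small constant `eps` is an assumption added here to avoid taking the log of zero in silent bins, and the function name is hypothetical.

```python
import numpy as np


def short_time_log_spectrum(magnitudes, eps=1e-10):
    """Natural log of each value of a short-time magnitude spectrum (rows = frames).

    `eps` is an assumed small offset so that silent (zero) bins stay finite.
    """
    return np.log(np.asarray(magnitudes, dtype=float) + eps)
```

For example, magnitudes of 1 and 1000 in the same frame become 0 and about 6.91. This compression of dynamic range is what makes the log spectrum smoother.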

Third implementation: when the first spectrum sequence is a short-time cepstrum sequence, the terminal collects the specified user's audio signal and performs framing, windowing and the short-time Fourier transform on it to obtain its short-time spectrum sequence. It then applies a logarithmic transform to obtain the short-time log spectrum sequence, and applies an inverse Fourier transform to that log spectrum sequence to obtain the short-time cepstrum sequence. The short-time cepstrum sequence is used as the first spectrum sequence of the audio signal.

The terminal obtains the short-time log spectrum sequence as in the second implementation. It then applies an inverse Fourier transform to each frame of short-time log spectrum signal, producing one frame of short-time cepstrum signal per frame, and the spectrum values of these cepstrum frames form the first spectrum sequence.

Note that the short-time cepstrum is relatively robust to interference. Forming the first spectrum sequence from the cepstrum frames therefore effectively reduces interference caused by environmental factors and improves the accuracy of the first spectrum sequence.
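The third implementation can be sketched as computing the real cepstrum of each windowed frame: FFT, then log magnitude, then inverse FFT. The sketch assumes the input rows have already been framed and windowed as in the first implementation. Using the magnitude (which yields a real cepstrum) and the `eps` constant are assumptions, not details stated above, and the function name is hypothetical.

```python
import numpy as np


def short_time_cepstrum(frames, eps=1e-10):
    """Real cepstrum of each row of `frames` (already framed and windowed)."""
    frames = np.asarray(frames, dtype=float)
    log_mag = np.log(np.abs(np.fft.rfft(frames, axis=1)) + eps)  # short-time log spectrum
    # irfft treats log_mag as the non-negative half of a symmetric real spectrum,
    # so the result is the (real) inverse Fourier transform of the log spectrum
    return np.fft.irfft(log_mag, n=frames.shape[1], axis=1)
```

As a check, consider a 64-sample frame made of an impulse plus an echo of amplitude 0.5 delayed by 4 samples. Its cepstrum has a peak of about 0.25 at quefrency 4, which is the kind of structure the cepstrum separates from the rest of the spectrum.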

In this embodiment, the human vocal apparatus changes when it produces different characters, so the spectrum values of the voice in the frequency domain reflect the rhythm of the audio signal more robustly. Compared with features of the audio signal such as pitch or the density of sounds produced, the first spectrum sequence more accurately reflects how the voice changes when different characters are sung. This greatly improves the accuracy of identifying whether the specified user is singing.

Step 302: the terminal determines the mutation time set of the audio signal according to the first spectrum sequence. The mutation time set includes multiple time points, and each time point corresponds to one mutation spectrum value.

In this embodiment, when singing the specified song, the specified user sings according to the song's lyrics or musical score. The lyrics include multiple characters, which may be Chinese characters, English words or words in any other language. The musical score includes multiple notes, which may be numbered-notation notes, staff notation notes, etc. Different characters or notes have different spectrum values, so when consecutive characters or notes are sung, the differences between their spectrum values are relatively large. The time points in the mutation time set are the time points, within the total duration of the song, at which the different characters of the lyrics occur. Therefore, after obtaining the first spectrum sequence, the terminal first identifies the mutation spectrum values, i.e. those that differ markedly from their neighbors, and determines the mutation time set of the audio signal from them.

This step may be implemented by the following steps 3021 and 3022.

Step 3021: the terminal determines the difference degree between every two adjacent spectrum values in the first spectrum sequence.

In this step, for every pair of adjacent spectrum values in the first spectrum sequence, the terminal uses a preset algorithm to compute the difference degree between them. It stores the correspondence between each pair and its difference degree, so that later steps can look up the spectrum values that correspond to a given difference degree. The preset algorithm may be set and changed as needed and is not specifically limited in this embodiment. For example, it may compute the variance, the difference, the Euclidean distance or the cosine distance, and the difference degree is then correspondingly the variance, difference, Euclidean distance or cosine distance between a spectrum value and the adjacent spectrum value.

Step 3022: for each pair of adjacent spectrum values whose difference degree is greater than a preset difference degree, the terminal takes the time point corresponding to that pair. These time points form the mutation time set.

In this step, the terminal selects, from the difference degrees obtained in step 3021, those greater than the preset difference degree, and looks up the adjacent spectrum-value pairs corresponding to the selected difference degrees in the stored correspondence. For each pair found, the terminal takes the time point of the earlier spectrum value, or alternatively of the later spectrum value. The time points obtained in this way form the mutation time set.

Note that the terminal divides the collected audio signal into multiple audio sub-signals according to the preset frame length, so each audio sub-signal corresponds to one time point within the total duration of the audio signal. This time point may be the start, end or midpoint of the playback period that the sub-signal occupies. For example, if the audio signal is divided into consecutive, non-overlapping 20 ms frames, the 30th audio sub-signal occupies the period from 580 ms to 600 ms. Its time point may then be the start (580 ms), the end (600 ms) or the midpoint (590 ms). Each spectrum value is obtained from one of the divided audio sub-signals, so the time point of a spectrum value is the time point of its sub-signal. Forming the mutation time set from the time points of the selected spectrum values therefore works as follows: the terminal looks up the audio sub-signal corresponding to each selected spectrum value and forms the mutation time set from the time points of those sub-signals.
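Steps 3021 and 3022, together with the frame-to-time mapping just described, can be sketched as follows. The sketch treats each frame's spectrum as a vector and uses either the Euclidean distance or the cosine distance as the difference degree, which are two of the options listed in step 3021. It takes a frame's start time as its time point, assuming contiguous frames. The function name and the threshold value are hypothetical, since the source leaves the preset difference degree open. Instead of storing a separate correspondence table, the sketch uses the index of each difference, which already identifies its pair of frames.

```python
import numpy as np


def mutation_time_set(spectra, frame_ms, threshold, metric="euclidean", use="later"):
    """Time points (seconds) at which adjacent frame spectra differ by more than `threshold`.

    spectra: 2-D array, one spectrum per frame, rows in time order.
    use: "later" or "earlier" frame of each qualifying adjacent pair.
    """
    spectra = np.asarray(spectra, dtype=float)
    prev, curr = spectra[:-1], spectra[1:]
    if metric == "euclidean":
        diff = np.linalg.norm(curr - prev, axis=1)
    elif metric == "cosine":
        num = np.sum(prev * curr, axis=1)
        den = np.linalg.norm(prev, axis=1) * np.linalg.norm(curr, axis=1) + 1e-12
        diff = 1.0 - num / den
    else:
        raise ValueError(f"unknown metric: {metric}")
    pair_idx = np.nonzero(diff > threshold)[0]        # index of the earlier frame of each pair
    frames = pair_idx + 1 if use == "later" else pair_idx
    # contiguous frames: frame i starts at i * frame_ms milliseconds
    return [float(round(int(i) * frame_ms / 1000.0, 6)) for i in frames]
```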

In this embodiment, when the specified user sings the specified song, the terminal may compare the mutation time set of the specified user's audio signal with the reference mutation time set of the specified song, and identify whether the specified user is singing through the following step 303.

Therefore, before identification, the terminal also needs to obtain the reference mutation time set of the specified song. This may be done as follows. The terminal obtains the lyrics or musical score of the specified song. It then obtains the timestamps of the specified song and determines the time point of each character in the lyrics or each note in the score. Finally, it forms the reference mutation time set of the specified song from these time points.

The specified song usually has corresponding lyrics or a musical score. Its timestamps give, for a standard performance of the song, the time point of each character in its lyrics and of each note in its score. Generally, the original recording of the specified song can serve as the reference, so the time point of each character or note may be its time point in the original recording. The lyrics and scores of most songs available online already include the time point of each character and note. Obtaining the reference mutation time set from these time points, and identifying the specified user against it, therefore makes this embodiment widely applicable and greatly improves its practicality.

Note that when the terminal detects that the live button is triggered, it may display an input box in its current interface prompting the specified user to enter the song identifier of the song to be sung. Following this prompt, the specified user enters the song identifier of the specified song in the input box and, when finished, triggers a confirm button to confirm the input.

The terminal starts identifying the specified user only when it detects the identification instruction. As the ways of obtaining the identification instruction described above show, the terminal may perform identification as soon as the specified user starts streaming. In that case, it obtains the lyrics or musical score of the specified song when it detects that the live button is triggered (the first way below). Alternatively, the terminal may perform identification some time after the live room has been opened. In that case, it must take the current time into account when determining the lyrics or musical score of the specified song (the second way below).

First way: when the terminal obtains the lyrics of the specified song, it proceeds as follows. When it detects that the confirm button is triggered, it obtains the song identifier from the input box. The terminal may store in advance the correspondence between the song identifiers and lyrics of a number of songs. Using the song identifier, the terminal checks whether the locally stored correspondence contains that identifier. If it does, the terminal obtains the corresponding lyrics from the correspondence. If it does not, the terminal sends an acquisition request to the server, the server sends the lyrics corresponding to the song identifier to the terminal, and the terminal receives them. The musical score of the specified song is obtained in the same way as the lyrics, so the details are not repeated here.

Second way: when the terminal obtains the lyrics of the specified song, it proceeds as follows. When it detects the identification instruction, it obtains the song identifier of the specified song and, from it, the corresponding lyrics. The terminal then obtains how long the live room has been open at the moment the identification instruction is detected, determines the corresponding time point within the total duration of the specified song, and obtains the lyrics of the specified song after that time point.

When the terminal obtains the musical score of the specified song, it does so in the same way as for the lyrics, so the details are not repeated here.
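The sketch below builds the reference mutation time set from per-character (or per-note) timestamps, including the second way's restriction to the part of the song after the room's open duration. Two details are assumptions. The first is the input format, a list of `(character, time)` pairs such as a word-timed lyric file might provide. The second is that the open duration maps directly onto song time, i.e. that the song started when the room was opened. The function name is hypothetical.

```python
from typing import List, Tuple


def reference_mutation_times(timed_lyrics: List[Tuple[str, float]],
                             open_duration_s: float = 0.0) -> List[float]:
    """Sorted character (or note) time points, in seconds into the song, at or after
    `open_duration_s`. The default of 0 keeps the whole song, as in the first way."""
    return sorted(t for _, t in timed_lyrics if t >= open_duration_s)
```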

Step 303: the terminal compares the similarity between the mutation time set of the audio signal and the reference mutation time set of the specified song, obtains a comparison result, and outputs the prompt message corresponding to the comparison result.

In this embodiment, the prompt message indicates whether the specified user is singing, and the terminal may perform identification based on the similarity between the mutation time set and the reference mutation time set. Accordingly, this step may be implemented by the following steps 3031 and 3032.

Step 3031: the terminal determines the similarity between the mutation time set of the audio signal and the reference mutation time set.

In this step, the terminal may determine the similarity directly from the number of matching points in the reference mutation time set (the first way below). Alternatively, the terminal may use the multiple reference mutation time subsets contained in the reference mutation time set and determine the corresponding mutation time subsets of the mutation time set (the second way below).

For the first way, this step may be implemented by the following step a.

Step a: the terminal determines the number of matching points in the reference mutation time set, and determines the similarity from this number and the total number of time points in the reference mutation time set.

A matching point is a time point in the reference mutation time set that matches a time point in the mutation time set of the audio signal. For each time point in the reference mutation time set, the terminal selects from the mutation time set a time point whose difference from that reference time point falls within a preset range, and regards the selected time point as matching it. The terminal obtains the total number of time points in the reference mutation time set and counts the matching points in it. It then takes the number of matching points divided by the total number of time points as the similarity.

The preset range may be set and changed as needed and is not specifically limited in this embodiment. For example, the preset range may be (-0.1 s, 0.1 s). Suppose a time point in the reference mutation time set is 3.10 s and the mutation time set contains the time point 3.09 s. The difference between the two lies within the preset range, so the mutation time set contains a time point, 3.09 s, that matches 3.10 s.
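Step a can be sketched as below, using the example preset range (-0.1 s, 0.1 s) as the default tolerance. Sorting the detected time points and bisecting is an implementation choice made here to avoid a quadratic scan. It does not change the definition of a matching point given above. The function name is hypothetical.

```python
from bisect import bisect_left
from typing import List


def match_ratio(reference: List[float], detected: List[float], tol: float = 0.1) -> float:
    """Step a: matching points in `reference` divided by its total number of time points.

    A reference point matches if some detected point lies strictly within (-tol, tol) of it.
    """
    if not reference:
        return 0.0
    det = sorted(detected)
    matched = 0
    for r in reference:
        i = bisect_left(det, r)
        # the detected point closest to r is det[i - 1] or det[i]
        neighbours = det[max(i - 1, 0): i + 1]
        if any(abs(d - r) < tol for d in neighbours):
            matched += 1
    return matched / len(reference)
```

Applied to the example above, `match_ratio([3.10], [3.09])` is 1.0, since the only reference point has a match.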

For the second way, this step may be implemented by the following steps b and c.

Step b: the terminal divides the mutation time set of the audio signal into multiple mutation time subsets and determines multiple groups of subsets. Each group includes a reference mutation time subset and a mutation time subset that correspond to the same reference audio sub-signal.

In this embodiment, a standard performance of the specified song usually consists of multiple lyric lines, each containing multiple consecutive characters. The consecutive characters of one lyric line can form a character string, or the consecutive notes corresponding to one lyric line can form a note string, so each lyric line corresponds to one character string or one note string. To improve the accuracy of the similarity between the mutation time set and the reference mutation time set, the reference mutation time set may be divided into multiple reference mutation time subsets based on the character strings or note strings of the specified song, and the mutation time set may likewise be divided into multiple mutation time subsets. The similarity between the mutation time set and the reference mutation time set is then determined from the similarity between each reference mutation time subset and its corresponding mutation time subset.

The reference mutation time set includes multiple reference mutation time subsets. Each reference mutation time subset corresponds to one reference audio sub-signal of the specified song, namely the portion of the reference audio signal that corresponds to one character string or one note string of the song. The terminal can therefore divide the mutation time set into multiple mutation time subsets according to the start time points of the reference mutation time subsets, so that the start time point of each mutation time subset equals that of its corresponding reference mutation time subset. The terminal then takes each mutation time subset together with its corresponding reference mutation time subset as one group, obtaining multiple groups of subsets.

In one possible design, the terminal may instead divide the mutation time set directly, based on the time differences between adjacent time points in it. Specifically, for each time point in the mutation time set, the terminal computes the time difference between that time point and its adjacent time point, obtaining one time difference per time point. It selects the time differences greater than a preset threshold, uses the corresponding time points as split points of the mutation time set, and divides the set into multiple mutation time subsets at these split points. From the split points, the terminal determines the period covered by each mutation time subset. For the reference mutation time set, the terminal determines the period of each reference mutation time subset from its start time. For each mutation time subset, the terminal then finds the reference mutation time subset whose period overlaps the mutation time subset's period the most, and takes the two as one group, thereby obtaining multiple groups of subsets.
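The gap-based division and overlap pairing of this design can be sketched as follows. Two details are assumptions. First, a subset's period is taken to run from its first time point to the start of the next subset, or to its own last point for the final subset. Second, ties in overlap go to the earliest reference subset. All function names are hypothetical.

```python
from typing import List, Tuple

Group = List[float]


def split_at_gaps(times: List[float], gap_threshold: float) -> List[Group]:
    """Start a new subset wherever the gap to the previous time point exceeds gap_threshold."""
    groups: List[Group] = []
    for t in sorted(times):
        if groups and t - groups[-1][-1] <= gap_threshold:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


def periods(groups: List[Group]) -> List[Tuple[float, float]]:
    """Period of each subset: its start up to the next subset's start (last: up to its own end)."""
    starts = [g[0] for g in groups]
    ends = starts[1:] + [groups[-1][-1]] if groups else []
    return list(zip(starts, ends))


def overlap(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def pair_by_overlap(detected: List[Group], reference: List[Group]) -> List[Tuple[Group, Group]]:
    """Pair each detected subset with the reference subset whose period overlaps it most."""
    det_p, ref_p = periods(detected), periods(reference)
    pairs = []
    for g, p in zip(detected, det_p):
        # max() returns the first maximum, so ties go to the earliest reference subset
        j = max(range(len(reference)), key=lambda k: overlap(p, ref_p[k]))
        pairs.append((reference[j], g))
    return pairs
```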

Step c: the terminal determines the similarity of each group of subsets and, from these, determines the similarity between the mutation time set of the audio signal and the reference mutation time set.

In this step, for each group of subsets, the reference mutation time subset contains the time points of the characters of one character string, or of the notes of one note string. For each time point in the reference mutation time subset, the terminal searches the group's mutation time subset for a time point whose difference from it is within the preset range, and regards that time point as matching the reference time point. The terminal obtains the number of time points in the reference mutation time subset and counts the matching points in it. It then takes the number of matching points divided by the number of time points in the reference mutation time subset as the similarity of the group.

The terminal computes the similarity of each group in turn. For each group, it multiplies the group's weight by the group's similarity, giving one product per group. It then sums the products and takes the resulting weighted similarity as the similarity between the mutation time set of the audio signal and the reference mutation time set.
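Step c can be sketched as below: each group's similarity is computed as just described, and the group similarities are combined as a weighted sum. The source does not say how the weights are chosen. Weights that sum to 1 (for example equal weights, or weights proportional to each group's number of reference time points) are an assumption that keeps the result in [0, 1] and comparable with the preset similarity. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def group_similarity(reference: Sequence[float], detected: Sequence[float],
                     tol: float = 0.1) -> float:
    """Matching points in the reference subset divided by its number of time points."""
    if not reference:
        return 0.0
    matched = sum(1 for r in reference if any(abs(d - r) < tol for d in detected))
    return matched / len(reference)


def weighted_similarity(pairs: List[Tuple[Sequence[float], Sequence[float]]],
                        weights: Sequence[float], tol: float = 0.1) -> float:
    """Sum over groups of weight * group similarity; `pairs` holds (reference, detected)."""
    if len(pairs) != len(weights):
        raise ValueError("expected one weight per group")
    return sum(w * group_similarity(ref, det, tol) for (ref, det), w in zip(pairs, weights))
```

With two equally weighted groups, where half of the first group's reference points are matched and all of the second group's are, the weighted similarity is 0.5 × 0.5 + 0.5 × 1.0 = 0.75.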

Step 3032: when the comparison result is that the similarity is greater than a preset similarity, the terminal determines that the corresponding prompt message indicates that the specified user is singing, and outputs a prompt message indicating that the specified user is singing. When the comparison result is that the similarity is not greater than the preset similarity, the terminal determines that the corresponding prompt message indicates that the specified user is not singing, and outputs a prompt message indicating that the specified user is not singing.

In this embodiment, a similarity greater than the preset similarity indicates that the specified user's mutation time set agrees closely with the reference mutation time set of the specified song, so the terminal determines that the specified user is singing. A similarity not greater than the preset similarity indicates that the specified user's mutation time set differs from that of the specified song, i.e. the specified user is not singing.

Further, when the specified user is not singing, the terminal may also send the server a notification message indicating that the specified user is not singing, through the following step 304.

Step 304: when the prompt message indicates that the specified user is not singing, the terminal sends a notification message to the server. The notification message informs the server that the specified user is not singing, so that the server performs designated processing on the specified user. The designated processing includes reminding the specified user's viewer users that the specified user is not singing and/or penalizing the specified user.

In this embodiment, when the terminal detects that the specified user is not singing, it generates a notification message and sends it to the server. The server receives the notification message and forwards it to the viewer terminals watching the specified user. The viewer terminals receive the notification message and display it in the live room, reminding viewer users that the specified user in the live room is not singing.
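Steps 3032 and 304 reduce to a threshold test followed by an optional notification, as sketched below. The notification payload fields (`type`, `user_id`, `room_id`) and the function name are hypothetical; the source only states that the message tells the server the specified user is not singing. A similarity equal to the preset similarity counts as "not singing", matching the "not greater than" wording above.

```python
from typing import Optional, Tuple


def decide_and_notify(similarity: float, preset_similarity: float,
                      user_id: str, room_id: str) -> Tuple[str, Optional[dict]]:
    """Return the prompt status and, when the user is not singing, a notification for the server."""
    if similarity > preset_similarity:
        return "singing", None
    # hypothetical payload; the real message format is not specified in the source
    notification = {"type": "not_singing", "user_id": user_id, "room_id": room_id}
    return "not singing", notification
```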

The server may also penalize the specified user for entering a live room of the live streaming application without singing, for example by deducting a corresponding resource value from the specified user's account or by sending the specified user a warning message. The resource value may be the gold coins, game currency, number of likes, etc. that the specified user has earned in the account.

Note that in this embodiment the above method may also be performed by the server, i.e. the server identifies whether the specified user is singing. The process may be as follows. While the specified user streams in the live streaming application, the specified user's terminal sends the recorded audio signal to the server in real time, and the server receives it in real time. When the server detects the identification instruction, it determines the first spectrum sequence of the specified user's audio signal and, from that, the mutation time set of the audio signal. The server then determines whether the specified user is singing based on the mutation time set of the audio signal and the reference mutation time set of the specified song. When it determines that the specified user is not singing, the server sends a notification message to the terminals of the viewer users in the live room to remind them that the specified user is not singing. It also sends a notification message to the specified user's terminal in order to penalize the specified user. Identification by the server is implemented similarly to identification by the terminal, so the details are not repeated here.

In this embodiment, when the terminal detects the identification instruction, which instructs it to detect whether the specified user is singing the specified song, it first determines the first spectrum sequence of the specified user's audio signal. From the first spectrum sequence, the terminal determines the mutation time set of the audio signal, which includes multiple time points, each corresponding to one mutation spectrum value. The terminal then compares the similarity between the mutation time set of the audio signal and the reference mutation time set of the specified song, obtains a comparison result, and outputs the corresponding prompt message, which indicates whether the specified user is singing. The mutation time set is determined from the first spectrum sequence of the audio signal, and identification is performed using this set together with the reference mutation time set of the specified song. Because reference mutation time sets are available for existing songs, the identification method provided by this embodiment is widely applicable, which improves the practicality of identifying whether the specified user is singing.

Fig. 4 is a schematic structural diagram of an audio signal processing apparatus according to an embodiment of the present invention. The apparatus may be applied in a terminal and, as shown in Fig. 4, includes:

a first determining module 401, configured to determine, when an identification instruction is detected, a first spectrum sequence of an audio signal of a specified user, the identification instruction instructing detection of whether the specified user is singing a specified song;

a second determining module 402, configured to determine the mutation time set of the audio signal according to the first spectrum sequence, the mutation time set including multiple time points, each time point corresponding to one mutation spectrum value;

an output module 403, configured to compare the mutation time set of the audio signal with the benchmark mutation time set corresponding to the specified song to obtain the similarity between them as a comparison result, and to output a prompt message corresponding to the comparison result, the prompt message indicating whether the specified user is singing.

In a possible design, the second determining module 402 includes:

a first determining unit, configured to determine, for the spectrum values in the first spectrum sequence, the degree of difference between each pair of adjacent spectrum values;
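As a minimal sketch of the first determining unit, assume each element of the first spectrum sequence is a per-frame spectrum vector and that the degree of difference is the Euclidean distance between adjacent frames (the text does not fix a metric). The mutation time points can then be collected as the later frame of every adjacent pair whose difference exceeds a preset threshold. The function name `mutation_times` and its parameters are illustrative, not taken from the patent.

```python
import numpy as np


def mutation_times(spectra: np.ndarray, frame_times: np.ndarray, threshold: float) -> list[float]:
    """Return the time points at which the spectrum changes abruptly.

    spectra: shape (num_frames, num_bins), one spectrum vector per frame.
    frame_times: shape (num_frames,), time of each frame in seconds.
    threshold: preset degree of difference (assumed tuning parameter).
    """
    # Degree of difference between each pair of adjacent spectra (Euclidean distance, an assumption).
    diffs = np.linalg.norm(np.diff(spectra, axis=0), axis=1)
    # Index of the later frame in every pair whose difference exceeds the threshold.
    idx = np.nonzero(diffs > threshold)[0] + 1
    return [float(t) for t in frame_times[idx]]


# Frames 1->2 and 3->4 jump by a distance of 5; the other adjacent pairs are identical.
spectra = np.array([[0, 0], [0, 0], [3, 4], [3, 4], [0, 0]], dtype=float)
frame_times = np.arange(5) * 0.01
print(mutation_times(spectra, frame_times, threshold=1.0))
```

Keeping the later frame of each pair is one arbitrary but consistent choice; the design only requires one time point per qualifying pair.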

In a possible design, the output module 403 is further configured to determine the similarity between the mutation time set of the audio signal and the benchmark mutation time set; when the comparison result is that the similarity is greater than a preset similarity, determine that the prompt message corresponding to the comparison result indicates that the specified user is singing, and output a prompt message indicating that the specified user is singing; and when the comparison result is that the similarity is not greater than the preset similarity, determine that the prompt message corresponding to the comparison result indicates that the specified user is not singing, and output a prompt message indicating that the specified user is not singing.

In a possible design, the output module 403 is further configured to determine the number of match points in the benchmark mutation time set, a match point being a time point in the benchmark mutation time set that matches a time point in the mutation time set of the audio signal, and to determine the similarity according to this number and the total number of time points in the benchmark mutation time set.
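A minimal sketch of the match-based similarity follows, under two assumptions the text leaves open: a benchmark time point counts as a match when some detected mutation time point lies within `tolerance` seconds of it (0.1 s here, a hypothetical value), and the similarity is the ratio of matched points to the total number of benchmark time points. The threshold check follows the design above in which a similarity equal to the preset similarity still yields the "not singing" prompt. All names are illustrative.

```python
def match_similarity(detected: list[float], benchmark: list[float], tolerance: float = 0.1) -> float:
    """Fraction of benchmark time points matched by a detected mutation time point."""
    if not benchmark:
        return 0.0
    matches = sum(1 for r in benchmark if any(abs(r - t) <= tolerance for t in detected))
    return matches / len(benchmark)


def prompt_for(similarity: float, preset_similarity: float = 0.5) -> str:
    # Strictly greater than the preset similarity means "singing"; otherwise "not singing".
    if similarity > preset_similarity:
        return "specified user is singing"
    return "specified user is not singing"


detected = [0.98, 2.05, 5.0]
benchmark = [1.0, 2.0, 3.0, 4.0]
sim = match_similarity(detected, benchmark)  # 2 of 4 benchmark points matched
print(sim, prompt_for(sim))
```

Normalizing by the size of the benchmark set, rather than the detected set, means extra mutation points from background noise do not inflate the score; only coverage of the song's expected change points counts.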

The output module 403 includes:

a second determining unit, further configured to determine the similarity of each group of subsets separately, and to determine, according to the similarity of each group of subsets, the similarity between the mutation time set of the audio signal and the benchmark mutation time set.
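For the grouped variant, a minimal sketch might pair, for each benchmark audio sub-signal (for example one lyric line, an assumption), the benchmark mutation time subset with the detected mutation time subset over the same interval, score each pair with a tolerance-based match ratio, and average the group scores. The unweighted mean is an assumed aggregation rule; the text only says the overall similarity is determined from the per-group similarities, so a weighted mean would be equally consistent. The function names are hypothetical.

```python
from statistics import mean


def group_similarity(detected: list[float], benchmark: list[float], tolerance: float = 0.1) -> float:
    """Match ratio for one group: benchmark points with a detected point within tolerance."""
    if not benchmark:
        return 0.0
    hits = sum(1 for r in benchmark if any(abs(r - t) <= tolerance for t in detected))
    return hits / len(benchmark)


def overall_similarity(groups: list[tuple[list[float], list[float]]], tolerance: float = 0.1) -> float:
    """groups: (detected_subset, benchmark_subset) pairs, one per benchmark audio sub-signal."""
    if not groups:
        return 0.0
    # Assumed aggregation rule: unweighted mean of the per-group similarities.
    return mean(group_similarity(d, b, tolerance) for d, b in groups)


groups = [
    ([1.0, 1.52], [1.0, 1.5]),  # first sub-signal: both benchmark points matched
    ([3.05], [3.0, 3.5]),       # second sub-signal: one of two matched
]
print(overall_similarity(groups))
```

With an unweighted mean, each sub-signal counts equally regardless of how many benchmark points it contains, so a long, dense passage cannot dominate the overall score.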

In a possible design, the device further includes:

a sending module, configured to send a notification message to a server when the prompt message indicates that the specified user is not singing, the notification message being used to notify the server that the specified user is not singing, so that the server performs designated processing on the specified user, the designated processing including reminding the audience users of the specified user that the specified user is not singing and/or penalizing the specified user.

In a possible design, the first spectrum sequence is any one of a short-term spectrum sequence, a short-term log spectrum sequence, or a short-term cepstrum sequence;

the first determining module 401 is further configured to, when the first spectrum sequence is the short-term spectrum sequence, collect the audio signal of the specified user, perform framing, windowing, and a short-time Fourier transform on the audio signal to obtain its short-term spectrum sequence, and use the short-term spectrum sequence as the first spectrum sequence;

the first determining module 401 is further configured to, when the first spectrum sequence is the short-term log spectrum sequence, collect the audio signal of the specified user, perform framing, windowing, and a short-time Fourier transform on the audio signal to obtain its short-term spectrum sequence, apply a logarithmic transform to the short-term spectrum sequence to obtain the short-term log spectrum sequence, and use the short-term log spectrum sequence as the first spectrum sequence of the audio signal;

the first determining module 401 is further configured to, when the first spectrum sequence is the short-term cepstrum sequence, collect the audio signal of the specified user, perform framing, windowing, and a short-time Fourier transform on the audio signal to obtain its short-term spectrum sequence, apply a logarithmic transform to the short-term spectrum sequence to obtain the short-term log spectrum sequence, apply an inverse Fourier transform to the short-term log spectrum sequence to obtain the short-term cepstrum sequence, and use the short-term cepstrum sequence as the first spectrum sequence of the audio signal.
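The three variants of the first spectrum sequence can be sketched with NumPy as follows. The frame length (1024 samples), hop size (512 samples), Hann window, use of the magnitude spectrum, and the small offset added before the logarithm are all assumptions; the text only specifies framing, windowing, a short-time Fourier transform, a logarithmic transform, and an inverse Fourier transform. The function name `first_spectrum_sequence` is hypothetical.

```python
import numpy as np


def first_spectrum_sequence(signal: np.ndarray, kind: str = "spectrum",
                            frame_len: int = 1024, hop: int = 512) -> np.ndarray:
    """Return one row per frame: short-term spectrum, log spectrum, or cepstrum."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # Framing: overlapping frames of frame_len samples, hop samples apart.
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    # Windowing followed by a per-frame FFT gives the short-term (magnitude) spectrum.
    spectrum = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    if kind == "spectrum":
        return spectrum
    # Logarithmic transform; the offset avoids log(0) on silent bins.
    log_spectrum = np.log(spectrum + 1e-10)
    if kind == "log":
        return log_spectrum
    if kind == "cepstrum":
        # Inverse Fourier transform of the log spectrum gives the real cepstrum.
        return np.fft.irfft(log_spectrum, n=frame_len, axis=1)
    raise ValueError(f"unknown kind: {kind!r}")


# A 1 kHz tone sampled at 8 kHz for one second.
sr = 8000
tone = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)
spec = first_spectrum_sequence(tone, "spectrum")
print(spec.shape, spec.argmax(axis=1)[:3])  # expected peak bin: 1000 * 1024 / 8000 = 128
```

Each variant reuses the previous stage, mirroring the text: the log spectrum is built from the short-term spectrum, and the cepstrum from the log spectrum.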

In a possible design, the device further includes:

an acquisition module, configured to acquire the lyrics or music score of the specified song;

a third determining module, configured to obtain the timestamp of the specified song and determine the time point corresponding to each character in the lyrics or each note in the music score;

a composing module, configured to compose the time points corresponding to each character in the lyrics or each note in the music score into the benchmark mutation time set corresponding to the specified song.
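A minimal sketch of how the composing module might build the benchmark mutation time set, assuming the per-character lyric onsets and per-note score onsets are already available as offsets in seconds from the start of the song, and that the song's timestamp is the absolute start time used to align them with the audio signal. Onsets that coincide to the millisecond (an assumed resolution) are merged. The input layout and function name are hypothetical; the patent does not specify a lyrics or score file format.

```python
from typing import Iterable


def benchmark_mutation_times(song_start: float,
                             char_offsets: Iterable[float] = (),
                             note_offsets: Iterable[float] = ()) -> list[float]:
    """Combine lyric-character and score-note onsets into one sorted benchmark time set.

    song_start: timestamp of the start of the specified song, in seconds (assumed).
    char_offsets / note_offsets: onset of each character / note relative to song_start.
    """
    # Shift every onset onto the song's timeline; rounding to 1 ms merges duplicates.
    points = {round(song_start + off, 3) for off in (*char_offsets, *note_offsets)}
    return sorted(points)


print(benchmark_mutation_times(10.0, char_offsets=[0.5, 1.0], note_offsets=[1.0, 2.25]))
```

Using a set removes the duplicate created when a character and a note start at the same instant, so each expected change point is counted once when the similarity is computed.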

It should be noted that when the audio signal processing device provided in the above embodiment processes audio signals, the division into the functional modules described above is merely an example. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the audio signal processing device provided in the above embodiment and the embodiments of the audio signal processing method belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.

Fig. 5 is a schematic structural diagram of an audio signal processing device according to an embodiment of the present invention. The device can be used to implement the functions performed by the terminal in the audio signal processing method shown in the above embodiments. Specifically:

The terminal 500 may include components such as an RF (Radio Frequency) circuit 510, a memory 520 including one or more computer-readable storage media, an input unit 530, a display unit 540, a sensor 550, an audio circuit 560, a transmission module 570, a processor 580 including one or more processing cores, and a power supply 590. Those skilled in the art will understand that the terminal structure shown in Fig. 5 does not limit the terminal, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. The components are as follows.

The RF circuit 510 may be used to receive and send signals while sending and receiving information or during a call. In particular, after receiving downlink information from a base station, it passes the information to one or more processors 580 for processing; it also sends uplink data to the base station. Typically, the RF circuit 510 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuit 510 may communicate with networks and other terminals through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), email, and SMS (Short Messaging Service).

The memory 520 may be used to store software programs and modules, such as the software programs and modules corresponding to the terminal shown in the above exemplary embodiments. The processor 580 performs various functional applications and data processing, such as video-based interaction, by running the software programs and modules stored in the memory 520. The memory 520 may mainly include a program storage area and a data storage area: the program storage area may store the operating system, application programs required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created through use of the terminal 500 (such as audio data or a phone book). In addition, the memory 520 may include a high-speed random access memory and may further include a non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices. Correspondingly, the memory 520 may further include a memory controller to provide the processor 580 and the input unit 530 with access to the memory 520.

The input unit 530 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. Specifically, the input unit 530 may include a touch-sensitive surface 531 and other input devices 532. The touch-sensitive surface 531, also called a touch display screen or touchpad, collects touch operations by the user on or near it (for example, operations performed on or near the touch-sensitive surface 531 with a finger, a stylus, or any other suitable object or accessory) and drives the corresponding connected apparatus according to a preset program. Optionally, the touch-sensitive surface 531 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the position of the user's touch and the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends them to the processor 580, and can receive and execute commands sent by the processor 580. The touch-sensitive surface 531 may be implemented using various technologies, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch-sensitive surface 531, the input unit 530 may further include other input devices 532, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and power keys), a trackball, a mouse, and a joystick.

The display unit 540 may be used to display information input by the user or provided to the user, as well as the various graphical user interfaces of the terminal 500; these graphical user interfaces may consist of graphics, text, icons, video, or any combination thereof. The display unit 540 may include a display panel 541, which may optionally be configured as an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) display, or the like. Further, the touch-sensitive surface 531 may cover the display panel 541; after detecting a touch operation on or near it, the touch-sensitive surface 531 passes the operation to the processor 580 to determine the type of touch event, and the processor 580 then provides the corresponding visual output on the display panel 541 according to the type of touch event. Although in Fig. 5 the touch-sensitive surface 531 and the display panel 541 implement the input and output functions as two separate components, in some embodiments the touch-sensitive surface 531 and the display panel 541 may be integrated to implement both functions.

The terminal 500 may further include at least one sensor 550, such as a light sensor, a motion sensor, or other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor may adjust the brightness of the display panel 541 according to the ambient light, and the proximity sensor may turn off the display panel 541 and/or the backlight when the terminal 500 is moved to the ear. As one type of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally three axes) and, when stationary, the magnitude and direction of gravity; it can be used in applications that recognize the phone's posture (such as landscape/portrait switching, related games, and magnetometer posture calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors that may be configured in the terminal 500, such as a gyroscope, barometer, hygrometer, thermometer, or infrared sensor, are not described further here.

The audio circuit 560, a speaker 561, and a microphone 562 may provide an audio interface between the user and the terminal 500. The audio circuit 560 may convert received audio data into an electrical signal and transmit it to the speaker 561, which converts it into a sound signal for output. Conversely, the microphone 562 converts a collected sound signal into an electrical signal, which the audio circuit 560 receives and converts into audio data; the audio data is then output to the processor 580 for processing and afterwards sent through the RF circuit 510 to, for example, another terminal, or output to the memory 520 for further processing. The audio circuit 560 may also include an earphone jack to allow a peripheral earphone to communicate with the terminal 500.

Through the transmission module 570, the terminal 500 can help the user send and receive email, browse web pages, access streaming video, and so on, providing the user with wireless or wired broadband Internet access. Although Fig. 5 shows the transmission module 570, it is understood that it is not an essential component of the terminal 500 and may be omitted as needed without changing the essence of the invention.

The processor 580 is the control center of the terminal 500. It connects all parts of the entire phone through various interfaces and lines, and performs the various functions of the terminal 500 and processes data by running or executing the software programs and/or modules stored in the memory 520 and calling the data stored in the memory 520, thereby monitoring the phone as a whole. Optionally, the processor 580 may include one or more processing cores. Preferably, the processor 580 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 580.

The terminal 500 further includes a power supply 590 (such as a battery) that supplies power to all components. Preferably, the power supply may be logically connected to the processor 580 through a power management system, so that charging, discharging, power consumption management, and similar functions are handled by the power management system. The power supply 590 may further include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.

Although not shown, the terminal 500 may further include a camera, a Bluetooth module, and the like, which are not described here. Specifically, in this embodiment, the display unit of the terminal 500 is a touch screen display, and the terminal 500 further includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and contain instructions for performing the operations performed by the terminal in the above embodiments.

In an exemplary embodiment, a computer-readable storage medium storing a computer program is further provided, for example a memory storing a computer program; when executed by a processor, the computer program implements the audio signal processing method in the above embodiments. For example, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.

Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the related hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.

The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. An audio signal processing method, characterized in that the method comprises:

when an identification instruction is detected, determining a first spectrum sequence of an audio signal of a specified user, the identification instruction being used to instruct detection of whether the specified user is singing a specified song;

determining a mutation time set of the audio signal according to the first spectrum sequence, the mutation time set comprising multiple time points, each time point corresponding to one mutation spectrum value;

comparing the mutation time set of the audio signal with a benchmark mutation time set corresponding to the specified song to obtain the similarity between them as a comparison result, and outputting a prompt message corresponding to the comparison result, the prompt message being used to indicate whether the specified user is singing.

2. The method according to claim 1, characterized in that determining the mutation time set of the audio signal according to the first spectrum sequence comprises:

when the degree of difference between two adjacent spectrum values is greater than a preset degree of difference, taking one of the time points corresponding to those two adjacent spectrum values, and composing the mutation time set from the time points so taken.
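Claim 2 can be written compactly as below. The Euclidean norm as the degree of difference and the choice of the later time point $t_{k+1}$ from each qualifying pair are assumptions; the claim only requires one time point per pair. Here $\mathbf{X}_k$ is the $k$-th element of the first spectrum sequence, $t_k$ its time point, $K$ the number of elements, $\delta$ the preset degree of difference, and $T$ the resulting mutation time set.

```latex
\begin{aligned}
d_k &= \left\lVert \mathbf{X}_{k+1} - \mathbf{X}_k \right\rVert_2, \qquad k = 1, \dots, K-1,\\
T   &= \{\, t_{k+1} : d_k > \delta \,\}.
\end{aligned}
```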

3. The method according to claim 1, characterized in that comparing the mutation time set of the audio signal with the benchmark mutation time set corresponding to the specified song to obtain the comparison result, and outputting the prompt message corresponding to the comparison result, comprises:

when the comparison result is that the similarity is greater than a preset similarity, determining that the prompt message corresponding to the comparison result indicates that the specified user is singing, and outputting a prompt message indicating that the specified user is singing;

when the comparison result is that the similarity is not greater than the preset similarity, determining that the prompt message corresponding to the comparison result indicates that the specified user is not singing, and outputting a prompt message indicating that the specified user is not singing.

4. The method according to claim 3, characterized in that determining the similarity between the mutation time set of the audio signal and the benchmark mutation time set comprises:

5. The method according to claim 3, characterized in that the benchmark mutation time set comprises multiple benchmark mutation time subsets, each benchmark mutation time subset corresponding to one benchmark audio sub-signal of the specified song;

determining the similarity between the mutation time set of the audio signal and the benchmark mutation time set comprises:

determining multiple groups of subsets, each group comprising the benchmark mutation time subset and the mutation time subset corresponding to the same benchmark audio sub-signal;

determining the similarity of each group of subsets separately;

determining, according to the similarity of each group of subsets, the similarity between the mutation time set of the audio signal and the benchmark mutation time set.

6. The method according to claim 1, characterized in that the first spectrum sequence is any one of a short-term spectrum sequence, a short-term log spectrum sequence, or a short-term cepstrum sequence, and determining the first spectrum sequence of the audio signal of the specified user comprises:

when the first spectrum sequence is the short-term spectrum sequence, collecting the audio signal of the specified user, performing framing, windowing, and a short-time Fourier transform on the audio signal to obtain the short-term spectrum sequence of the audio signal, and using the short-term spectrum sequence as the first spectrum sequence;

when the first spectrum sequence is the short-term log spectrum sequence, collecting the audio signal of the specified user, performing framing, windowing, and a short-time Fourier transform on the audio signal to obtain the short-term spectrum sequence of the audio signal, applying a logarithmic transform to the short-term spectrum sequence to obtain the short-term log spectrum sequence, and using the short-term log spectrum sequence as the first spectrum sequence of the audio signal;

when the first spectrum sequence is the short-term cepstrum sequence, collecting the audio signal of the specified user, performing framing, windowing, and a short-time Fourier transform on the audio signal to obtain the short-term spectrum sequence of the audio signal, applying a logarithmic transform to the short-term spectrum sequence to obtain the short-term log spectrum sequence, applying an inverse Fourier transform to the short-term log spectrum sequence to obtain the short-term cepstrum sequence, and using the short-term cepstrum sequence as the first spectrum sequence of the audio signal.
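For reference, the three sequences of claim 6 correspond to the following standard definitions for frame $k$, written under assumed parameters: frame length $N$, hop size $H$, analysis window $w[n]$, and the logarithm applied to the magnitude spectrum; $x[n]$ is the collected audio signal.

```latex
\begin{aligned}
X_k[m] &= \sum_{n=0}^{N-1} x[kH + n]\, w[n]\, e^{-j 2\pi m n / N} && \text{(short-term spectrum)}\\
L_k[m] &= \log \lvert X_k[m] \rvert && \text{(short-term log spectrum)}\\
c_k[n] &= \frac{1}{N} \sum_{m=0}^{N-1} L_k[m]\, e^{j 2\pi m n / N} && \text{(short-term cepstrum)}
\end{aligned}
```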

7. The method according to claim 1, characterized in that, before comparing the mutation time set of the audio signal with the benchmark mutation time set corresponding to the specified song to obtain the comparison result and outputting the prompt message corresponding to the comparison result, the method comprises:

acquiring the lyrics or music score of the specified song;

composing the time points corresponding to each character in the lyrics or each note in the music score into the benchmark mutation time set corresponding to the specified song.

8. An audio signal processing device, characterized in that the device comprises:

a first determining module, configured to determine, when an identification instruction is detected, a first spectrum sequence of an audio signal of a specified user, the identification instruction being used to instruct detection of whether the specified user is singing a specified song;

a second determining module, configured to determine a mutation time set of the audio signal according to the first spectrum sequence, the mutation time set comprising multiple time points, each time point corresponding to one mutation spectrum value;

an output module, configured to compare the mutation time set of the audio signal with a benchmark mutation time set corresponding to the specified song to obtain the similarity between them as a comparison result, and to output a prompt message corresponding to the comparison result, the prompt message being used to indicate whether the specified user is singing.

9. The device according to claim 8, characterized in that the second determining module comprises:

a first determining unit, configured to determine, for the spectrum values in the first spectrum sequence, the degree of difference between each pair of adjacent spectrum values;

a composing unit, configured to, when the degree of difference between two adjacent spectrum values is greater than a preset degree of difference, take one of the time points corresponding to those two adjacent spectrum values, and compose the mutation time set from the time points so taken.

10. The device according to claim 8, characterized in that

the output module is further configured to determine the similarity between the mutation time set of the audio signal and the benchmark mutation time set; when the comparison result is that the similarity is greater than a preset similarity, determine that the prompt message corresponding to the comparison result indicates that the specified user is singing, and output a prompt message indicating that the specified user is singing; and when the comparison result is that the similarity is not greater than the preset similarity, determine that the prompt message corresponding to the comparison result indicates that the specified user is not singing, and output a prompt message indicating that the specified user is not singing.

11. The device according to claim 10, characterized in that

the output module is further configured to determine the number of match points in the benchmark mutation time set, a match point being a time point in the benchmark mutation time set that matches a time point in the mutation time set of the audio signal, and to determine the similarity according to this number and the total number of time points in the benchmark mutation time set.
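One reading of claim 11, assuming the similarity is the ratio of the number of match points $m$ to the total number of benchmark time points $\lvert R \rvert$, and that a benchmark time point $r \in R$ matches when some detected mutation time point $t \in T$ lies within a tolerance $\varepsilon$ (the claim defines neither the ratio nor the tolerance):

```latex
\begin{aligned}
m &= \bigl\lvert \{\, r \in R : \exists\, t \in T,\ \lvert r - t \rvert \le \varepsilon \,\} \bigr\rvert,\\
\operatorname{sim}(T, R) &= \frac{m}{\lvert R \rvert}.
\end{aligned}
```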

12. The device according to claim 10, characterized in that the benchmark mutation time set comprises multiple benchmark mutation time subsets, each benchmark mutation time subset corresponding to one benchmark audio sub-signal of the specified song;

the output module comprises:

a second determining unit, configured to determine multiple groups of subsets, each group comprising the benchmark mutation time subset and the mutation time subset corresponding to the same benchmark audio sub-signal;

13. The device according to claim 8, characterized in that the first spectrum sequence is any one of a short-term spectrum sequence, a short-term log spectrum sequence, or a short-term cepstrum sequence;

the first determining module is further configured to, when the first spectrum sequence is the short-term spectrum sequence, collect the audio signal of the specified user, perform framing, windowing, and a short-time Fourier transform on the audio signal to obtain the short-term spectrum sequence of the audio signal, and use the short-term spectrum sequence as the first spectrum sequence;

the first determining module is further configured to, when the first spectrum sequence is the short-term log spectrum sequence, collect the audio signal of the specified user, perform framing, windowing, and a short-time Fourier transform on the audio signal to obtain the short-term spectrum sequence of the audio signal, apply a logarithmic transform to the short-term spectrum sequence to obtain the short-term log spectrum sequence, and use the short-term log spectrum sequence as the first spectrum sequence of the audio signal;

the first determining module is further configured to, when the first spectrum sequence is the short-term cepstrum sequence, collect the audio signal of the specified user, perform framing, windowing, and a short-time Fourier transform on the audio signal to obtain the short-term spectrum sequence of the audio signal, apply a logarithmic transform to the short-term spectrum sequence to obtain the short-term log spectrum sequence, apply an inverse Fourier transform to the short-term log spectrum sequence to obtain the short-term cepstrum sequence, and use the short-term cepstrum sequence as the first spectrum sequence of the audio signal.

14. The device according to claim 8, characterized in that the device comprises:

a composing module, configured to compose the time points corresponding to each character in the lyrics or each note in the music score into the benchmark mutation time set corresponding to the specified song.

15. An audio signal processing device, characterized by comprising a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory to implement the method steps according to any one of claims 1 to 7.

16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method steps according to any one of claims 1 to 7.