CN107680614A - Acoustic signal processing method, device and storage medium - Google Patents
Acoustic signal processing method, device and storage medium
- Publication number
- CN107680614A CN201710919028.9A
- Authority
- CN
- China
- Prior art keywords
- audio signal
- short
- mutation
- sequence
- benchmark
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/72—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for transmitting results of analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/40—Support for services or applications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
- H04L65/762—Media network packet handling at the source
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Computer Networks & Wireless Communication (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Auxiliary Devices For Music (AREA)
Abstract
The invention discloses an acoustic signal processing method, device, and storage medium, belonging to the technical field of video processing. The method includes: when a terminal detects an identification instruction, the terminal first determines a first spectrum sequence of the audio signal of a specified user, where the identification instruction instructs the terminal to detect whether the specified user is singing a specified song; the terminal determines, according to the first spectrum sequence, a mutation time set of the audio signal, where the mutation time set includes multiple time points and each time point corresponds to one mutation spectrum value; the terminal compares the mutation time set of the audio signal with the benchmark mutation time set corresponding to the specified song to obtain a comparison result based on the similarity between them, and outputs prompt information corresponding to the comparison result, where the prompt information indicates whether the specified user is singing. Because identification relies on the mutation time set and the benchmark mutation time set of the specified song, the method for identifying whether the specified user is singing becomes more practical.
Description
Technical field
The present invention relates to the technical field of video processing, and in particular to an acoustic signal processing method, device, and storage medium.
Background technology
Live streaming is a network service provided by live-streaming applications and is currently a very popular form of entertainment. A live-streaming application offers different live rooms, for example, singing rooms, storytelling rooms, or teaching rooms. A streamer (the host user of a live room) can broadcast different types of live video in different live rooms. However, some streamers do not broadcast the type of content that their live room is intended for. For example, a streamer in a singing room may not be singing live but doing something else. Therefore, when a streamer is in a singing room, it is necessary to detect whether the streamer is actually singing.
In the related art, when a streamer sings a specified song in a singing room, whether the streamer is singing can be identified as follows: the terminal collects the streamer's audio signal, extracts the pitch sequence of the audio signal, and obtains the standard pitch sequence corresponding to the specified song, where the standard pitch sequence has been obtained in advance by staff through manual annotation. The terminal calculates the similarity between the pitch sequence of the audio signal and the standard pitch sequence. If the similarity is not less than a certain value, the terminal determines that the streamer is singing; otherwise, the terminal determines that the streamer is not singing.
In the process of implementing the present invention, the inventor found that the related art has at least the following problem:
The above method requires the standard pitch sequence to be identified in advance through manual annotation. However, because the number of original songs on the market is huge, manual identification is extremely inefficient, and at present most original songs have no corresponding standard pitch sequence. When a streamer sings an original song that has no corresponding standard pitch sequence, identification is impossible, which makes the above method impractical.
Summary of the invention
The present invention provides an acoustic signal processing method, device, and storage medium that can solve the poor practicality of the prior art. The technical solution is as follows:
In a first aspect, an acoustic signal processing method is provided, the method including:
when an identification instruction is detected, determining a first spectrum sequence of the audio signal of a specified user, where the identification instruction instructs detecting whether the specified user is singing a specified song;
determining, according to the first spectrum sequence, a mutation time set of the audio signal, where the mutation time set includes multiple time points and each time point corresponds to one mutation spectrum value;
comparing the mutation time set of the audio signal with the benchmark mutation time set corresponding to the specified song to obtain a comparison result based on the similarity between them, and outputting prompt information corresponding to the comparison result, where the prompt information indicates whether the specified user is singing.
In one possible design, determining the mutation time set of the audio signal according to the first spectrum sequence includes:
determining, according to each spectrum value in the first spectrum sequence, the degree of difference between every two adjacent spectrum values;
when the degree of difference between two adjacent spectrum values is greater than a preset degree of difference, adding one of the time points corresponding to those two adjacent spectrum values to the mutation time set.
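The mutation detection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the text does not define how the degree of difference is computed, so the sketch assumes a Euclidean distance between adjacent frames, and it assumes the later of the two time points is kept. The names `spectral_difference` and `mutation_times` are hypothetical.

```python
from typing import List, Sequence


def spectral_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed measure: Euclidean distance between two adjacent spectrum values."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def mutation_times(
    spectra: Sequence[Sequence[float]],
    frame_times: Sequence[float],
    preset_difference: float,
) -> List[float]:
    """Collect one time point for each adjacent pair whose difference exceeds the preset."""
    times: List[float] = []
    for i in range(1, len(spectra)):
        if spectral_difference(spectra[i - 1], spectra[i]) > preset_difference:
            # Assumption: keep the later time point of the adjacent pair.
            times.append(frame_times[i])
    return times


if __name__ == "__main__":
    spectra = [[1, 1], [1, 1], [5, 1], [5, 1], [0, 0]]
    frame_times = [0.0, 0.1, 0.2, 0.3, 0.4]
    print(mutation_times(spectra, frame_times, preset_difference=2.0))
```

Any other difference measure, or keeping the earlier time point, would fit the description equally well.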
In one possible design, comparing the mutation time set of the audio signal with the benchmark mutation time set corresponding to the specified song to obtain a comparison result, and outputting prompt information corresponding to the comparison result, includes:
determining the similarity between the mutation time set of the audio signal and the benchmark mutation time set;
when the comparison result is that the similarity is greater than a preset similarity, determining that the prompt information corresponding to the comparison result indicates that the specified user is singing, and outputting prompt information indicating that the specified user is singing;
when the comparison result is that the similarity is not greater than the preset similarity, determining that the prompt information corresponding to the comparison result indicates that the specified user is not singing, and outputting prompt information indicating that the specified user is not singing.
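The threshold decision above can be sketched in a few lines. The default preset similarity of 0.6 and the message strings are illustrative assumptions; the text does not give concrete values. The function name `prompt_information` is hypothetical.

```python
def prompt_information(similarity: float, preset_similarity: float = 0.6) -> str:
    """Return the prompt for a similarity score; 0.6 is an assumed example threshold."""
    if similarity > preset_similarity:
        return "The specified user is singing."
    # A similarity equal to the preset counts as "not greater than", so not singing.
    return "The specified user is not singing."


if __name__ == "__main__":
    print(prompt_information(0.8))
    print(prompt_information(0.6))
```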
In one possible design, determining the similarity between the mutation time set of the audio signal and the benchmark mutation time set includes:
determining the number of matching points in the benchmark mutation time set, where a matching point is a time point in the benchmark mutation time set that matches a time point in the mutation time set of the audio signal;
determining the similarity according to the number of matching points and the total number of time points in the benchmark mutation time set.
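A minimal sketch of the matching-point similarity follows. Two details are assumptions because the text leaves them open: two time points "match" when they lie within a tolerance of each other (0.05 s here), and the similarity is the ratio of matching points to the total number of benchmark time points. The names `count_matching_points` and `similarity` are hypothetical.

```python
from typing import Sequence


def count_matching_points(
    benchmark_times: Sequence[float],
    detected_times: Sequence[float],
    tolerance: float = 0.05,
) -> int:
    """Count benchmark time points that have a detected time point within the tolerance."""
    return sum(
        1
        for b in benchmark_times
        if any(abs(b - d) <= tolerance for d in detected_times)
    )


def similarity(
    benchmark_times: Sequence[float],
    detected_times: Sequence[float],
    tolerance: float = 0.05,
) -> float:
    """Assumed formula: matching points divided by the total number of benchmark points."""
    if not benchmark_times:
        return 0.0
    matches = count_matching_points(benchmark_times, detected_times, tolerance)
    return matches / len(benchmark_times)


if __name__ == "__main__":
    print(similarity([1.0, 2.0, 3.0, 4.0], [1.02, 2.5, 3.96]))
```

Normalizing by the benchmark set, rather than by the detected set, means that extra detected mutations (for example, background noise) do not lower the score in this sketch.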
In one possible design, the benchmark mutation time set includes multiple benchmark mutation time subsets, and each benchmark mutation time subset corresponds to one benchmark audio sub-signal of the specified song;
determining the similarity between the mutation time set of the audio signal and the benchmark mutation time set includes:
dividing the mutation time set of the audio signal into multiple mutation time subsets;
determining multiple groups of subsets, where each group includes the benchmark mutation time subset and the mutation time subset that correspond to the same benchmark audio sub-signal;
determining the similarity of each group of subsets separately;
determining, according to the similarity of each group of subsets, the similarity between the mutation time set of the audio signal and the benchmark mutation time set.
In one possible design, the first spectrum sequence is any one of a short-time spectrum sequence, a short-time log spectrum sequence, or a short-time cepstrum sequence; determining the first spectrum sequence of the audio signal of the specified user includes:
when the first spectrum sequence is a short-time spectrum sequence, collecting the audio signal of the specified user, performing framing, windowing, and a short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, and using the short-time spectrum sequence as the first spectrum sequence;
when the first spectrum sequence is a short-time log spectrum sequence, collecting the audio signal of the specified user, performing framing, windowing, and a short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, applying a logarithmic transform to the short-time spectrum sequence to obtain the short-time log spectrum sequence, and using the short-time log spectrum sequence as the first spectrum sequence of the audio signal;
when the first spectrum sequence is a short-time cepstrum sequence, collecting the audio signal of the specified user, performing framing, windowing, and a short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, applying a logarithmic transform to the short-time spectrum sequence to obtain the short-time log spectrum sequence, applying an inverse Fourier transform to the short-time log spectrum sequence to obtain the short-time cepstrum sequence, and using the short-time cepstrum sequence as the first spectrum sequence of the audio signal.
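The three variants of the first spectrum sequence share one pipeline, which the following sketch makes explicit using only the Python standard library. The frame length, hop size, Hann window, and the small `eps` added before the logarithm are assumptions chosen for illustration; the text does not specify them. A naive DFT is used for clarity instead of an FFT, and all function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def split_frames(signal: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Framing: overlapping frames of frame_len samples, advanced by hop samples."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def hann_window(n: int) -> List[float]:
    """Periodic Hann window (an assumed choice of window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def short_time_spectrum(signal: Sequence[float], frame_len: int = 64, hop: int = 32) -> List[List[float]]:
    """Framing, windowing, and a DFT per frame; returns magnitude spectra."""
    window = hann_window(frame_len)
    spectra = []
    for frame in split_frames(signal, frame_len, hop):
        x = [s * w for s, w in zip(frame, window)]
        spectra.append([
            abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / frame_len) for t in range(frame_len)))
            for k in range(frame_len)
        ])
    return spectra


def log_spectrum(spectra: Sequence[Sequence[float]], eps: float = 1e-10) -> List[List[float]]:
    """Logarithmic transform of each magnitude; eps avoids log(0)."""
    return [[math.log(m + eps) for m in frame] for frame in spectra]


def short_time_cepstrum(log_spectra: Sequence[Sequence[float]]) -> List[List[float]]:
    """Inverse DFT of each log spectrum; the real part is kept."""
    result = []
    for frame in log_spectra:
        n = len(frame)
        result.append([
            (sum(frame[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)
        ])
    return result


def first_spectrum_sequence(signal: Sequence[float], kind: str = "spectrum") -> List[List[float]]:
    """Select one of the three sequence types: 'spectrum', 'log', or 'cepstrum'."""
    spectra = short_time_spectrum(signal)
    if kind == "spectrum":
        return spectra
    logs = log_spectrum(spectra)
    if kind == "log":
        return logs
    if kind == "cepstrum":
        return short_time_cepstrum(logs)
    raise ValueError(f"unknown kind: {kind}")
```

In practice an FFT routine from a numerical library would replace the naive DFT loops; the structure of the pipeline stays the same.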
In one possible design, before comparing the mutation time set of the audio signal with the benchmark mutation time set corresponding to the specified song to obtain the comparison result and outputting the prompt information corresponding to the comparison result, the method further includes:
obtaining the lyrics or the musical score of the specified song;
obtaining the timestamps of the specified song, and determining the time point corresponding to each character in the lyrics or each note in the score;
forming the benchmark mutation time set corresponding to the specified song from the time points corresponding to each character in the lyrics or each note in the score.
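Building the benchmark mutation time set from timestamped lyrics can be sketched as follows. The text does not say how the timestamps are stored, so the sketch assumes a hypothetical format in which each character or syllable is preceded by a `<mm:ss.xx>` tag, similar in spirit to word-level tags in some lyric files. The name `benchmark_mutation_times` is hypothetical.

```python
import re
from typing import List

# Assumed tag format: <minutes:seconds>, for example <01:02.50>.
TIME_TAG = re.compile(r"<(\d+):(\d+(?:\.\d+)?)>")


def benchmark_mutation_times(timed_lyrics: str) -> List[float]:
    """Collect the time point of every tagged character, in seconds, sorted and de-duplicated."""
    times = {int(minutes) * 60 + float(seconds) for minutes, seconds in TIME_TAG.findall(timed_lyrics)}
    return sorted(times)


if __name__ == "__main__":
    lyrics = "<00:01.50>twin<00:02.00>kle <00:02.00>x<01:00.25>star"
    print(benchmark_mutation_times(lyrics))
```

The same function shape would apply to a musical score, with one time tag per note instead of per character.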
In a second aspect, an audio signal processing device is provided, the device including:
a first determining module, configured to determine, when an identification instruction is detected, a first spectrum sequence of the audio signal of a specified user, where the identification instruction instructs detecting whether the specified user is singing a specified song;
a second determining module, configured to determine, according to the first spectrum sequence, a mutation time set of the audio signal, where the mutation time set includes multiple time points and each time point corresponds to one mutation spectrum value;
an output module, configured to compare the mutation time set of the audio signal with the benchmark mutation time set corresponding to the specified song to obtain a comparison result based on the similarity between them, and to output prompt information corresponding to the comparison result, where the prompt information indicates whether the specified user is singing.
In one possible design, the second determining module includes:
a first determining unit, configured to determine, according to each spectrum value in the first spectrum sequence, the degree of difference between every two adjacent spectrum values;
a composing unit, configured to, when the degree of difference between two adjacent spectrum values is greater than a preset degree of difference, add one of the time points corresponding to those two adjacent spectrum values to the mutation time set.
In one possible design, the output module is further configured to: determine the similarity between the mutation time set of the audio signal and the benchmark mutation time set; when the comparison result is that the similarity is greater than a preset similarity, determine that the prompt information corresponding to the comparison result indicates that the specified user is singing, and output prompt information indicating that the specified user is singing; when the comparison result is that the similarity is not greater than the preset similarity, determine that the prompt information corresponding to the comparison result indicates that the specified user is not singing, and output prompt information indicating that the specified user is not singing.
In one possible design, the output module is further configured to: determine the number of matching points in the benchmark mutation time set, where a matching point is a time point in the benchmark mutation time set that matches a time point in the mutation time set of the audio signal; and determine the similarity according to the number of matching points and the total number of time points in the benchmark mutation time set.
In one possible design, the benchmark mutation time set includes multiple benchmark mutation time subsets, and each benchmark mutation time subset corresponds to one benchmark audio sub-signal of the specified song;
the output module includes:
a dividing unit, configured to divide the mutation time set of the audio signal into multiple mutation time subsets;
a second determining unit, configured to determine multiple groups of subsets, where each group includes the benchmark mutation time subset and the mutation time subset that correspond to the same benchmark audio sub-signal;
the second determining unit is further configured to determine the similarity of each group of subsets separately, and to determine, according to the similarity of each group of subsets, the similarity between the mutation time set of the audio signal and the benchmark mutation time set.
In one possible design, the first spectrum sequence is any one of a short-time spectrum sequence, a short-time log spectrum sequence, or a short-time cepstrum sequence;
the first determining module is further configured to, when the first spectrum sequence is a short-time spectrum sequence, collect the audio signal of the specified user, perform framing, windowing, and a short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, and use the short-time spectrum sequence as the first spectrum sequence;
the first determining module is further configured to, when the first spectrum sequence is a short-time log spectrum sequence, collect the audio signal of the specified user, perform framing, windowing, and a short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, apply a logarithmic transform to the short-time spectrum sequence to obtain the short-time log spectrum sequence, and use the short-time log spectrum sequence as the first spectrum sequence of the audio signal;
the first determining module is further configured to, when the first spectrum sequence is a short-time cepstrum sequence, collect the audio signal of the specified user, perform framing, windowing, and a short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, apply a logarithmic transform to the short-time spectrum sequence to obtain the short-time log spectrum sequence, apply an inverse Fourier transform to the short-time log spectrum sequence to obtain the short-time cepstrum sequence, and use the short-time cepstrum sequence as the first spectrum sequence of the audio signal.
In one possible design, the device further includes:
an obtaining module, configured to obtain the lyrics or the musical score of the specified song;
a third determining module, configured to obtain the timestamps of the specified song and determine the time point corresponding to each character in the lyrics or each note in the score;
a composing module, configured to form the benchmark mutation time set corresponding to the specified song from the time points corresponding to each character in the lyrics or each note in the score.
In a third aspect, an audio signal processing device is provided, including a processor and a memory; the memory is configured to store a computer program; the processor is configured to execute the computer program stored in the memory to implement the method steps of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided; the computer-readable storage medium stores a computer program, and the computer program implements the method steps of the first aspect when executed by a processor.
In the embodiments of the present invention, when the terminal detects an identification instruction, it first determines the first spectrum sequence of the audio signal of the specified user, where the identification instruction instructs detecting whether the specified user is singing a specified song. The terminal determines, according to the first spectrum sequence, the mutation time set of the audio signal, where the mutation time set includes multiple time points and each time point corresponds to one mutation spectrum value. The terminal compares the mutation time set of the audio signal with the benchmark mutation time set corresponding to the specified song to obtain a comparison result based on the similarity between them, and outputs prompt information corresponding to the comparison result, where the prompt information indicates whether the specified user is singing. The mutation time set is determined from the first spectrum sequence of the audio signal, and identification is performed using this mutation time set and the benchmark mutation time set of the specified song. Because current songs have a benchmark mutation time set, the identification method provided by the embodiments of the present invention is widely applicable, which makes the method for identifying whether the specified user is singing more practical.
Brief description of the drawings
Fig. 1 is a schematic diagram of an implementation environment of an acoustic signal processing method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of an acoustic signal processing method provided by an embodiment of the present invention;
Fig. 3 is a flow chart of an acoustic signal processing method provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an audio signal processing device provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an audio signal processing device provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation environment of the acoustic signal processing method. The implementation environment includes a terminal 101 of a specified user and a server 102. The terminal 101 and the server 102 are connected through a wired or wireless network. An application program associated with the server 102 runs on the terminal 101. The terminal 101 can log in to the application program based on a user identifier, thereby logging in to the server 102 and interacting with the server 102.
The application program is any application program that can collect audio signals, for example, a live-streaming application or a karaoke application. The specified user is the user who sings. For example, when the application program is a live-streaming application, the specified user may be the streamer; when the application program is a karaoke application, the specified user may be the user who is currently singing karaoke.
However, the specified user is sometimes in a live room of the live-streaming application without singing, which gives the viewers of that live room a poor user experience. Therefore, when the specified user has indicated that he or she is singing a specified song, it is detected whether the specified user is really singing the specified song. In the embodiments of the present invention, the terminal 101 may detect whether the specified user is singing, or the server 102 may detect whether the user is singing. The embodiments of the present invention are described using the example in which the terminal 101 detects whether the specified user is singing. The specified song may be a song, a comedy sketch, a crosstalk piece, or another item of the same kind.
The terminal 101 may be any device that can collect audio signals, such as a mobile phone, a PAD (Portable Android Device, tablet computer), or a computer. The server 102 is the server that provides background services for the terminal 101. It may be a single server, a server cluster composed of several servers, or a cloud computing service center, which is not limited in the embodiments of the present invention. In one possible implementation, the server 102 may be the background server of the live-streaming application installed on the terminal 101.
Fig. 2 is a flow chart of an acoustic signal processing method provided by an embodiment of the present invention. The method can be applied in a terminal. As shown in Fig. 2, the method includes the following steps.
Step 201: When an identification instruction is detected, determine the first spectrum sequence of the audio signal of the specified user, where the identification instruction instructs detecting whether the specified user is singing a specified song;
Step 202: According to the first spectrum sequence, determine the mutation time set of the audio signal, where the mutation time set includes multiple time points and each time point corresponds to one mutation spectrum value;
Step 203: Compare the mutation time set of the audio signal with the benchmark mutation time set corresponding to the specified song to obtain a comparison result based on the similarity between them, and output prompt information corresponding to the comparison result, where the prompt information indicates whether the specified user is singing.
In one possible design, determining the mutation time set of the audio signal according to the first spectrum sequence includes:
determining, according to each spectrum value in the first spectrum sequence, the degree of difference between every two adjacent spectrum values;
when the degree of difference between two adjacent spectrum values is greater than a preset degree of difference, adding one of the time points corresponding to those two adjacent spectrum values to the mutation time set.
In one possible design, comparing the mutation time set of the audio signal with the benchmark mutation time set corresponding to the specified song to obtain a comparison result, and outputting prompt information corresponding to the comparison result, includes:
determining the similarity between the mutation time set of the audio signal and the benchmark mutation time set;
when the comparison result is that the similarity is greater than a preset similarity, determining that the prompt information corresponding to the comparison result indicates that the specified user is singing, and outputting prompt information indicating that the specified user is singing;
when the comparison result is that the similarity is not greater than the preset similarity, determining that the prompt information corresponding to the comparison result indicates that the specified user is not singing, and outputting prompt information indicating that the specified user is not singing.
In one possible design, determining the similarity between the mutation time set of the audio signal and the benchmark mutation time set includes:
determining the number of matching points in the benchmark mutation time set, where a matching point is a time point in the benchmark mutation time set that matches a time point in the mutation time set of the audio signal;
determining the similarity according to the number of matching points and the total number of time points in the benchmark mutation time set.
In one possible design, the benchmark mutation time set includes multiple benchmark mutation time subsets, and each benchmark mutation time subset corresponds to one benchmark audio sub-signal of the specified song;
determining the similarity between the mutation time set of the audio signal and the benchmark mutation time set includes:
dividing the mutation time set of the audio signal into multiple mutation time subsets;
determining multiple groups of subsets, where each group includes the benchmark mutation time subset and the mutation time subset that correspond to the same benchmark audio sub-signal;
determining the similarity of each group of subsets separately;
determining, according to the similarity of each group of subsets, the similarity between the mutation time set of the audio signal and the benchmark mutation time set.
In a possible design, the method further includes:
When it is determined that the specified user is not singing, send an instruction message to a server, the instruction message being used to indicate that the specified user is not singing, so that the server performs designated processing on the specified user, the designated processing including: reminding the audience users of the specified user that the specified user is not singing, and/or penalizing the specified user.
In a possible design, the first spectrum sequence is any one of a short-term spectrum sequence, a short-term log spectrum sequence, or a short-term cepstrum sequence; determining the first spectrum sequence of the audio signal of the specified user includes:
When the first spectrum sequence is the short-term spectrum sequence, collect the audio signal of the specified user, perform framing, windowing, and short-time Fourier transform on the audio signal to obtain the short-term spectrum sequence of the audio signal, and use the short-term spectrum sequence as the first spectrum sequence;
When the first spectrum sequence is the short-term log spectrum sequence, collect the audio signal of the specified user, perform framing, windowing, and short-time Fourier transform on the audio signal to obtain the short-term spectrum sequence of the audio signal, perform a logarithmic transformation on the short-term spectrum sequence to obtain the short-term log spectrum sequence, and use the short-term log spectrum sequence as the first spectrum sequence of the audio signal;
When the first spectrum sequence is the short-term cepstrum sequence, collect the audio signal of the specified user, perform framing, windowing, and short-time Fourier transform on the audio signal to obtain the short-term spectrum sequence of the audio signal, perform a logarithmic transformation on the short-term spectrum sequence to obtain the short-term log spectrum sequence, perform an inverse Fourier transform on the short-term log spectrum sequence to obtain the short-term cepstrum sequence, and use the short-term cepstrum sequence as the first spectrum sequence of the audio signal.
In a possible design, before comparing the similarity between the mutation time set of the audio signal and the benchmark mutation time set corresponding to the specified song to obtain a comparison result and outputting the prompt message corresponding to the comparison result, the method further includes:
Obtain the lyrics or the musical score of the specified song;
Obtain the timestamps of the specified song, and determine the time point corresponding to each character in the lyrics or each note in the musical score;
Form the benchmark mutation time set corresponding to the specified song from the time points corresponding to each character in the lyrics or each note in the musical score.
In the embodiment of the present invention, when the terminal detects an identification instruction, it first determines the first spectrum sequence of the audio signal of the specified user. The first spectrum sequence is a short-term spectrum sequence, a short-term log spectrum sequence, or a short-term cepstrum sequence, and the identification instruction is used to instruct detection of whether the specified user is singing the specified song. The terminal determines the mutation time set of the audio signal according to the first spectrum sequence; the mutation time set includes multiple time points, and each time point corresponds to one mutation spectrum value. The terminal compares the similarity between the mutation time set of the audio signal and the benchmark mutation time set corresponding to the specified song to obtain a comparison result, and outputs the prompt message corresponding to the comparison result, the prompt message being used to indicate whether the specified user is singing. Because the mutation time set is determined from the first spectrum sequence of the audio signal, and identification is performed by comparing the mutation time set with the benchmark mutation time set of the specified song, and because current songs have benchmark mutation time sets, the identification method provided by the embodiment of the present invention is widely applicable, which improves the practicality of the method of identifying whether the specified user is singing.
Fig. 3 is a flow chart of an acoustic signal processing method provided by an embodiment of the present invention. The method can be applied in a terminal or in a server, which is not specifically limited in the embodiment of the present invention; the embodiment is described using only a terminal as an example. As shown in Fig. 3, the method includes the following steps.
Step 301: When an identification instruction is detected, the terminal determines the first spectrum sequence of the audio signal of the specified user; the identification instruction is used to instruct detection of whether the specified user is singing the specified song.
In the embodiment of the present invention, the terminal can display a live button in the interface of the live-streaming application, and the specified user can open a live room for singing songs by triggering the live button, so as to stream live to audience users. However, the specified user may not actually be singing in the live room, which wastes the time of audience users who have entered the live room and are waiting to watch. Alternatively, the specified user is actually singing, but because of a fault in the specified user's terminal, a poor network signal, or a similar cause, the live room shows that the specified user is not singing, so the audience users in the live room cannot watch. Either case gives the specified user and the audience users a poor user experience. Therefore, to improve the user experience of the specified user and the audience users, the terminal can identify whether the specified user in the live room is singing. When the terminal detects that a preset identification condition is met, the terminal obtains an identification instruction, which instructs the terminal to detect whether the specified user is singing the specified song.
The preset identification condition can include, but is not limited to: the live button being triggered; a certain duration having elapsed since the live room was opened; or the identification button in the terminal of an audience user being triggered. The preset identification condition can also be that the terminal detects no sound in the live room, and so on. Accordingly, the ways in which the terminal obtains the identification instruction include, but are not limited to, any of the following ways (1)-(4):
(1): When the terminal detects that the live button is triggered, the terminal generates the identification instruction.
(2): When the terminal detects that the difference between the current time and the opening time of the live room is greater than a preset time difference, the terminal generates the identification instruction.
The preset time difference can be set and changed according to user needs, which is not specifically limited in the embodiment of the present invention. For example, the preset time difference can be 2 seconds, 6 seconds, etc.
(3): The terminal receives the identification instruction sent by the server; the identification instruction is sent to the server by an audience terminal when it detects that the identification button in the current live interface is triggered.
An audience user can trigger the identification button in the live interface of the live room to make the audience terminal generate the identification instruction. The audience terminal sends the identification instruction to the server, the server forwards the identification instruction to the terminal of the specified user, and the terminal of the specified user receives the identification instruction forwarded by the server.
It should be noted that the terminal in the embodiment of the present invention refers to the terminal used by the specified user, and an audience terminal refers to the terminal used by an audience user who watches the specified user.
(4): When the terminal detects that the audio signal in the live room has not changed within a preset duration, the terminal generates the identification instruction.
When the live room is opened, the terminal starts to detect changes of the audio signal in the live room in real time. The preset duration can be set and changed according to user needs, which is not specifically limited in the embodiment of the present invention. For example, the preset duration can be 10 seconds, 6 seconds, etc.
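The four ways (1)-(4) of obtaining the identification instruction can be sketched as a single check. This is only an illustrative sketch: the `RoomState` record, its field names, and the default thresholds are assumptions introduced here and do not come from the embodiment.

```python
from dataclasses import dataclass


@dataclass
class RoomState:
    """Hypothetical snapshot of what the terminal knows about the live room."""
    live_button_triggered: bool = False        # way (1)
    seconds_since_open: float = 0.0            # way (2)
    audience_requested: bool = False           # way (3), forwarded by the server
    seconds_without_audio_change: float = 0.0  # way (4)


def should_identify(state, preset_time_diff=6.0, preset_duration=10.0):
    """Return True when any of the preset identification conditions holds."""
    return (
        state.live_button_triggered
        or state.seconds_since_open > preset_time_diff
        or state.audience_requested
        or state.seconds_without_audio_change >= preset_duration
    )
```

For instance, `should_identify(RoomState(seconds_since_open=7.0))` is true under the assumed 6-second preset time difference.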
The terminal can perform detection based on the audio signal played in the live room in which the specified user sings the song. Therefore, when the terminal detects the identification instruction, the terminal determines the first spectrum sequence of the audio signal of the specified user. The first spectrum sequence is any one of a short-term spectrum sequence, a short-term log spectrum sequence, or a short-term cepstrum sequence; accordingly, the terminal can determine the first spectrum sequence of the audio signal of the specified user in the following three ways.
In the first implementation, when the first spectrum sequence is the short-term spectrum sequence, the terminal collects the audio signal of the specified user, performs framing, windowing, and short-time Fourier transform on the audio signal to obtain the short-term spectrum sequence of the audio signal, and uses the short-term spectrum sequence as the first spectrum sequence.
In the embodiment of the present invention, when the terminal detects the identification instruction, the terminal collects the audio signal of the specified user in the live room and divides the audio signal into multiple frames of audio sub-signals according to a preset frame length, the length of each frame of audio sub-signal being the preset frame length. In addition, to prevent spectrum leakage, the terminal applies a preset window function to each frame of audio sub-signal, obtaining the windowed frames of audio sub-signals. The terminal performs a short-time Fourier transform on each windowed frame of audio sub-signal to obtain multiple frames of short-term spectrum signals, and forms the first spectrum sequence from the spectrum values of these short-term spectrum signals.
The preset frame length and the preset window function can be set and changed according to user needs, which is not specifically limited in the embodiment of the present invention. For example, the preset frame length can be 25 milliseconds, 30 milliseconds, etc. The preset window function can be a Hanning window function, a Hamming window function, etc.
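The framing, windowing, and short-time Fourier transform described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the embodiment's implementation: frames are taken without overlap, a Hamming window is used, the transform is a naive discrete Fourier transform, and each frame's "spectrum value" is represented as the list of its bin magnitudes. All function names are introduced here for illustration.

```python
import cmath
import math


def frame_signal(samples, frame_len):
    """Split samples into non-overlapping frames of frame_len samples (tail dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def hamming(n):
    """Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of one frame (naive O(N^2) transform)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def short_term_spectrum_sequence(samples, frame_len):
    """Frame, window, and transform the signal; one magnitude list per frame."""
    window = hamming(frame_len)
    return [magnitude_spectrum([x * w for x, w in zip(frame, window)])
            for frame in frame_signal(samples, frame_len)]
```

With a 16 kHz sampling rate (an assumed value), a 25-millisecond preset frame length would correspond to `frame_len=400`.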
In the second implementation, when the first spectrum sequence is the short-term log spectrum sequence, the terminal collects the audio signal of the specified user, performs framing, windowing, and short-time Fourier transform on the audio signal to obtain the short-term spectrum sequence of the audio signal, performs a logarithmic transformation on the short-term spectrum sequence to obtain the short-term log spectrum sequence, and uses the short-term log spectrum sequence as the first spectrum sequence of the audio signal.
After obtaining the short-term spectrum sequence, the terminal performs a logarithmic transformation on each frame of short-term spectrum signal in the short-term spectrum sequence to obtain each frame of short-term log spectrum signal, and forms the first spectrum sequence from the spectrum values of the short-term log spectrum signals. The terminal obtains the short-term spectrum sequence in the same way as in the first implementation above, which is not repeated here.
It should be noted that the short-term log spectrum signal is smoother and better reflects the detailed fluctuations of the sound in the audio signal, so that the first spectrum sequence is closer to the actual sound corresponding to the audio signal, which improves the accuracy of identifying whether the specified user is singing.
In the third implementation, when the first spectrum sequence is the short-term cepstrum sequence, the terminal collects the audio signal of the specified user, performs framing, windowing, and short-time Fourier transform on the audio signal to obtain the short-term spectrum sequence of the audio signal, performs a logarithmic transformation on the short-term spectrum sequence to obtain the short-term log spectrum sequence, performs an inverse Fourier transform on the short-term log spectrum sequence to obtain the short-term cepstrum sequence, and uses the short-term cepstrum sequence as the first spectrum sequence of the audio signal.
The terminal obtains the short-term log spectrum sequence by the method of the second implementation above, performs an inverse Fourier transform on each frame of short-term log spectrum signal to obtain each frame of short-term cepstrum signal, and forms the first spectrum sequence from the spectrum values of the short-term cepstrum signals.
It should be noted that the short-term cepstrum signal is relatively robust to interference. Therefore, forming the first spectrum sequence from the spectrum values of the short-term cepstrum signals effectively reduces the interference introduced by environmental factors and improves the accuracy of the first spectrum sequence.
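The logarithmic transformation and inverse Fourier transform of the second and third implementations can be sketched for a single frame as follows. This is a hedged sketch: it assumes the frame is given as a full list of N spectrum magnitudes, adds a small constant `eps` (an assumption introduced here) to avoid taking the logarithm of zero, and uses a naive inverse transform.

```python
import cmath
import math


def log_spectrum(magnitudes, eps=1e-10):
    """Logarithmic transformation of one frame of spectrum magnitudes."""
    return [math.log(m + eps) for m in magnitudes]


def real_cepstrum(magnitudes, eps=1e-10):
    """Inverse DFT of the log spectrum of one frame (real part only)."""
    n = len(magnitudes)
    log_mag = log_spectrum(magnitudes, eps)
    return [sum(v * cmath.exp(2j * math.pi * k * q / n)
                for k, v in enumerate(log_mag)).real / n
            for q in range(n)]
```

Applying `log_spectrum` to every frame yields a short-term log spectrum sequence, and applying `real_cepstrum` to every frame yields a short-term cepstrum sequence.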
In the embodiment of the present invention, because the human vocal model changes when producing different characters, the spectrum values of the audio signal in the frequency domain reflect the rhythm characteristics of the audio signal more robustly. Compared with features of the audio signal such as pitch and generation density, the first spectrum sequence of the audio signal more accurately reflects the changes of the human vocal model when producing different characters, which greatly improves the accuracy of identifying whether the specified user is singing.
Step 302: The terminal determines the mutation time set of the audio signal according to the first spectrum sequence; the mutation time set includes multiple time points, and each time point corresponds to one mutation spectrum value.
In the embodiment of the present invention, when singing the specified song, the specified user sings according to the lyrics or the musical score of the specified song. The lyrics of the specified song include multiple characters, where a character can be a Chinese character, an English word, or a word in any other language; the musical score of the specified song includes multiple notes, where a note can be a numbered-notation note, a staff-notation note, or the like. Different characters or notes correspond to different spectrum values; that is, the spectrum values corresponding to consecutive characters or notes during singing differ considerably, and the time points included in the mutation time set are the time points, within the total duration of the song, that correspond to the different characters in the lyrics of the song. Therefore, after obtaining the first spectrum sequence, the terminal first obtains the mutation spectrum values in the first spectrum sequence, that is, the spectrum values that differ from their neighbors, and then determines the mutation time set of the audio signal according to these mutation spectrum values.
This step can be implemented through the following steps 3021-3022.
Step 3021: The terminal determines, according to each spectrum value in the first spectrum sequence, the diversity factor between every two adjacent spectrum values.
In this step, for every two adjacent spectrum values in the first spectrum sequence, the terminal calculates the diversity factor between the two adjacent spectrum values using a preset algorithm, and stores the correspondence between the two adjacent spectrum values and the diversity factor, so that the terminal can later look up, according to a diversity factor, the spectrum values corresponding to that diversity factor from the correspondence. The preset algorithm can be set and changed according to user needs, which is not specifically limited in the embodiment of the present invention. For example, the preset algorithm can be an algorithm that calculates a variance, a difference, a Euclidean distance, or a cosine distance; accordingly, the diversity factor can be the variance, difference, Euclidean distance, or cosine distance between a spectrum value and its adjacent spectrum value.
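Step 3021 can be sketched as follows for two of the measures mentioned above. The sketch assumes each spectrum value is a vector (one list of numbers per frame) and stores the correspondence as a dictionary keyed by the pair of adjacent frame indices; both choices are assumptions made for illustration.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two spectrum vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; defined as 0.0 here if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    return 1.0 - sum(x * y for x, y in zip(a, b)) / norm


def adjacent_diversity(spectrum_sequence, measure=euclidean_distance):
    """Map each adjacent frame pair (i, i + 1) to its diversity factor."""
    return {(i, i + 1): measure(spectrum_sequence[i], spectrum_sequence[i + 1])
            for i in range(len(spectrum_sequence) - 1)}
```

Keeping the frame indices as keys preserves the correspondence between a diversity factor and its two adjacent spectrum values, which step 3022 needs.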
Step 3022: When the diversity factor between two adjacent spectrum values is greater than the preset diversity factor, the terminal forms the mutation time set from one of the time points corresponding to the two adjacent spectrum values whose diversity factor is greater than the preset diversity factor.
In this step, the terminal obtains multiple diversity factors through step 3021 above, selects from them the diversity factors greater than the preset diversity factor, and looks up, from the correspondence between diversity factors and adjacent spectrum values, the two adjacent spectrum values corresponding to each selected diversity factor. The terminal obtains the time point corresponding to the earlier spectrum value of each pair found and forms the mutation time set from the obtained time points; or the terminal obtains the time point corresponding to the later spectrum value of each pair found and forms the mutation time set from the obtained time points.
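Step 3022 can be sketched as a simple threshold over the stored correspondence. The dictionary layout (adjacent frame index pairs mapped to diversity factors) and the `use_later` switch are assumptions introduced for this sketch.

```python
def mutation_time_set(diversity, frame_times, preset_diversity, use_later=True):
    """Collect one time point per adjacent pair whose diversity factor exceeds the preset.

    diversity:   dict mapping (i, j) frame index pairs to diversity factors
    frame_times: time point of each frame, indexed by frame index
    use_later:   take the later frame's time point if True, else the earlier one
    """
    times = []
    for (i, j), value in sorted(diversity.items()):
        if value > preset_diversity:
            times.append(frame_times[j] if use_later else frame_times[i])
    return times
```

Whether the earlier or the later time point is used only needs to be consistent across the whole audio signal.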
It should be noted that, because the terminal divides a collected audio signal into multiple audio sub-signals according to the preset frame length, each audio sub-signal corresponds to one time point within the total duration of the audio signal. This time point can be the start time point, the end time point, or the middle time point of the playing period of the audio sub-signal within the total duration of the audio signal. For example, an audio signal with a total duration of 100 seconds includes 300 audio sub-signals in total, and the frame length of each audio sub-signal is 20 milliseconds; the playing period of the 30th audio sub-signal within the total duration of the audio signal is from 9 seconds 40 milliseconds to the 10th second, and the time point corresponding to the 30th audio sub-signal can be the start time point, namely 9 seconds 40 milliseconds, the end time point, namely the 10th second, or the middle time point, namely 9 seconds 50 milliseconds. Each spectrum value is obtained from one of the audio sub-signals after division, so one spectrum value corresponds to one audio sub-signal, and the time point corresponding to each spectrum value is the time point corresponding to that audio sub-signal. Therefore, the step in which the terminal forms the mutation time set from the time points corresponding to the obtained spectrum values can be: the terminal looks up the audio sub-signal corresponding to each obtained spectrum value, and forms the mutation time set from the time points corresponding to the audio sub-signals found.
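The mapping from an audio sub-signal to its time point can be sketched as follows. The sketch assumes non-overlapping, contiguous frames that start at time zero and uses 1-based frame indices with times in milliseconds; these conventions and the function name are assumptions introduced here.

```python
def frame_time_point_ms(index, frame_len_ms, position="start"):
    """Time point (in ms) of the index-th frame, counted from 1.

    position selects the start, end, or middle of the frame's playing period.
    """
    start = (index - 1) * frame_len_ms
    if position == "start":
        return start
    if position == "end":
        return start + frame_len_ms
    if position == "middle":
        return start + frame_len_ms / 2
    raise ValueError("position must be 'start', 'end', or 'middle'")
```

Under these assumptions, the fifth frame of a signal divided into 20-millisecond frames plays from 80 ms to 100 ms, with its middle time point at 90 ms.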
In the embodiment of the present invention, when the specified user sings the specified song, the terminal can compare the mutation time set corresponding to the audio signal of the specified user with the benchmark mutation time set corresponding to the specified song and, through step 303 below, identify whether the specified user is singing.
Therefore, before performing identification, the terminal also needs to obtain the benchmark mutation time set corresponding to the specified song. This step can be: the terminal obtains the lyrics or the musical score of the specified song; obtains the timestamps of the specified song and determines the time point corresponding to each character in the lyrics or each note in the musical score; and forms the benchmark mutation time set corresponding to the specified song from the time points corresponding to each character in the lyrics or each note in the musical score.
The specified song generally has corresponding lyrics or a corresponding musical score. The timestamps of the specified song are the time points corresponding to each character in the lyrics of the specified song, and to each note in the musical score of the specified song, when the specified song is sung in the standard manner. In general, the original recording of the specified song can be used as a reference song, and the time point corresponding to each character or note can be the time point of that character or note in the original recording of the specified song. Generally, each character in the lyrics and each note in the musical score has a corresponding time point, and the lyrics of most songs and the time points of each note in their musical scores are already available on the network. Therefore, obtaining the benchmark mutation time set of the specified song from the time points of each character or note, and identifying the specified user based on that benchmark mutation time set, makes the embodiment of the present invention more widely applicable and greatly improves its practicality.
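Forming the benchmark mutation time set from timestamped lyrics or notes can be sketched as follows. The input layout, a list of `(time_point_seconds, unit)` pairs where `unit` is a character or a note, is a hypothetical format chosen for this sketch and is not specified by the embodiment.

```python
def benchmark_mutation_time_set(timed_units):
    """Return the sorted, de-duplicated time points of all characters or notes.

    timed_units: iterable of (time_point_seconds, character_or_note) pairs.
    """
    return sorted({time_point for time_point, _unit in timed_units})
```

For example, the timed lyrics `[(1.2, "a"), (0.5, "b")]` would give the benchmark mutation time set `[0.5, 1.2]`.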
It should be noted that, when the terminal detects that the live button is triggered, an input box can be displayed in the current interface of the terminal; the input box prompts the specified user to enter the song identifier of the specified song to be sung. Following the prompt of the input box, the specified user can enter the song identifier of the specified song to be sung in the input box and, when finished, trigger the confirm button to confirm that the input is complete.
The terminal starts identifying the specified user only when it detects the identification instruction. As can be seen from the at least four ways in which the terminal obtains the identification instruction in step 301, the terminal can perform identification when the specified user starts the live stream; in that case, the terminal obtains the lyrics or the musical score of the specified song when it detects that the live button is triggered, which is the first way below. In addition, the terminal can perform identification after the live room has been opened; in that case, the terminal needs to determine the lyrics or the musical score of the specified song in combination with the current time, which is the second way below.
In the first way, when the terminal obtains the lyrics of the specified song, the step of obtaining the lyrics can be: when the terminal detects that the confirm button is triggered, it obtains the song identifier in the input box. The terminal can store in advance the correspondence between the song identifiers and the lyrics of a specified number of songs, and, according to the song identifier, checks whether the song identifier of the specified song exists in the locally stored correspondence between song identifiers and lyrics. If it exists, the terminal obtains the lyrics corresponding to the song identifier from the correspondence. If it does not exist, the terminal sends an acquisition request to the server, so that the server sends the lyrics corresponding to the song identifier to the terminal, and the terminal receives the lyrics. When the terminal obtains the musical score of the specified song, the musical score is obtained in the same way as the lyrics above, which is not repeated here.
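The local lookup with a server fallback described in the first way can be sketched as follows. The dictionary `local_store` and the callable `fetch_from_server` are hypothetical stand-ins for the locally stored correspondence and the acquisition request; no real network API is implied.

```python
def get_lyrics(song_id, local_store, fetch_from_server):
    """Return the lyrics for song_id, preferring the locally stored correspondence.

    local_store:       dict mapping song identifiers to lyrics
    fetch_from_server: callable taking a song identifier and returning lyrics
    """
    if song_id in local_store:
        return local_store[song_id]
    # Not stored locally: request the lyrics from the server.
    return fetch_from_server(song_id)
```

The same shape applies to obtaining a musical score, with a score store in place of the lyrics store.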
In the second way, when the terminal obtains the lyrics of the specified song, the step of obtaining the lyrics can be: when the terminal detects the identification instruction, it obtains the song identifier of the specified song and, according to the song identifier, obtains the lyrics corresponding to the song identifier. When it detects the identification instruction, the terminal also obtains the duration for which the live room has been open at the current time, determines the time point within the total duration of the specified song that corresponds to this open duration, and obtains the lyrics of the specified song after that time point.
When the terminal obtains the musical score of the specified song, the musical score is obtained in the same way as the lyrics above, which is not repeated here.
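The second way can be sketched as a filter over timestamped lyrics. The sketch assumes that the song started when the live room was opened, so that the open duration maps directly to a time point in the song, and that lyrics are given as hypothetical `(time_point_seconds, character)` pairs; a character exactly at the time point is kept.

```python
def lyrics_after(timed_lyrics, open_duration_s):
    """Keep the characters whose time point is at or after the open duration."""
    return [(t, ch) for t, ch in timed_lyrics if t >= open_duration_s]
```

The benchmark mutation time set used for comparison would then be built only from the retained time points.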
Step 303: The terminal compares the similarity between the mutation time set of the audio signal and the benchmark mutation time set corresponding to the specified song to obtain a comparison result, and outputs the prompt message corresponding to the comparison result.
In the embodiment of the present invention, the prompt message is used to indicate whether the specified user is singing, and the terminal can perform identification based on the similarity between the mutation time set and the benchmark mutation time set. Accordingly, this step can be implemented through the following steps 3031-3032.
Step 3031: The terminal determines the similarity between the mutation time set of the audio signal and the benchmark mutation time set.
In this step, the terminal can determine the similarity directly from the number of match points in the benchmark mutation time set, which is the first way below. Alternatively, the terminal can determine the similarity from the multiple benchmark mutation time subsets included in the benchmark mutation time set and the multiple mutation time subsets included in the mutation time set, which is the second way below.
In the first way, this step can be implemented through the following step a.
Step a: The terminal determines the number of match points in the benchmark mutation time set, and determines the similarity according to that number and the total number of time points in the benchmark mutation time set.
A match point is a time point in the benchmark mutation time set that matches a time point in the mutation time set of the audio signal. For each time point in the benchmark mutation time set, the terminal selects from the mutation time set a time point whose difference from that benchmark time point is within a preset range, and determines the selected time point as the time point that matches that time point in the benchmark mutation time set. The terminal obtains the total number of time points in the benchmark mutation time set, counts the match points in the benchmark mutation time set, and takes the quotient of the number of match points divided by the total number of time points as the similarity.
The preset range can be set and changed according to user needs, which is not specifically limited in the embodiment of the present invention. For example, the preset range can be (-0.1 s, 0.1 s). If a time point in the benchmark mutation time set is 3.10 seconds and the mutation time set contains a time point of 3.09 seconds, the difference between the two time points is within the preset range, so the mutation time set contains a time point, namely 3.09 seconds, that matches the 3.10-second time point.
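Step a can be sketched as follows. The sketch treats the preset range as the open interval (-tolerance, tolerance) and, as a simplification introduced here, allows one time point of the mutation time set to match more than one benchmark time point; returning 0.0 for an empty benchmark set is also an assumption.

```python
def count_match_points(benchmark_times, mutation_times, tolerance=0.1):
    """Count benchmark time points with a mutation time point within the preset range."""
    return sum(
        any(abs(m - b) < tolerance for m in mutation_times)
        for b in benchmark_times
    )


def similarity(benchmark_times, mutation_times, tolerance=0.1):
    """Number of match points divided by the total number of benchmark time points."""
    if not benchmark_times:
        return 0.0
    matched = count_match_points(benchmark_times, mutation_times, tolerance)
    return matched / len(benchmark_times)
```

With the example above, a benchmark time point of 3.10 seconds and a detected time point of 3.09 seconds count as one match point.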
In the second way, this step can be implemented through the following steps b-c.
Step b: The terminal divides the mutation time set of the audio signal into multiple mutation time subsets, and determines multiple groups of subsets, one group of subsets including the benchmark mutation time subset and the mutation time subset that correspond to the same benchmark audio sub-signal.
In the embodiment of the present invention, when the specified song is sung in the standard manner, the specified song usually includes multiple lines of lyrics, and each line of lyrics includes multiple consecutive characters. The consecutive characters of one line of lyrics can form a character string, or the consecutive notes corresponding to one line of lyrics can form a note string; that is, one line of lyrics corresponds to one character string or one note string. To improve the accuracy of calculating the similarity between the mutation time set and the benchmark mutation time set, the benchmark mutation time set can be divided into multiple benchmark mutation time subsets based on the character strings or note strings of the specified song, the mutation time set can be divided into multiple mutation time subsets, and the similarity between the mutation time set and the benchmark mutation time set can be determined from the similarity between each benchmark mutation time subset and its corresponding mutation time subset.
The benchmark mutation time set includes multiple benchmark mutation time subsets, and each benchmark mutation time subset corresponds to one benchmark audio sub-signal of the specified song; a benchmark audio sub-signal can be the part of the benchmark audio signal that corresponds to one character string or one note string of the specified song. Therefore, the terminal can divide the mutation time set into multiple mutation time subsets according to the start time points of the benchmark mutation time subsets, so that the start time point of each mutation time subset is the same as the start time point of its corresponding benchmark mutation time subset. The terminal then determines each mutation time subset and its corresponding benchmark mutation time subset as one group of subsets, obtaining multiple groups of subsets.
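Dividing the mutation time set by the start time points of the benchmark mutation time subsets can be sketched as follows. The sketch assumes the benchmark subsets are non-empty, sorted, and ordered by start time, and it drops any detected time point that falls before the first start time; these are assumptions made for illustration.

```python
import bisect


def split_by_start_times(mutation_times, start_times):
    """Assign each mutation time point to the subset with the latest start time <= it."""
    subsets = [[] for _ in start_times]
    for t in sorted(mutation_times):
        idx = bisect.bisect_right(start_times, t) - 1
        if idx >= 0:  # time points before the first start time are dropped
            subsets[idx].append(t)
    return subsets


def group_subsets(benchmark_subsets, mutation_times):
    """Pair every benchmark subset with the mutation subset sharing its start time."""
    starts = [subset[0] for subset in benchmark_subsets]
    return list(zip(benchmark_subsets,
                    split_by_start_times(mutation_times, starts)))
```

Each resulting pair is one group of subsets whose similarity can then be determined separately.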
In one possible design, the terminal may instead divide the mutation time set directly according to the time differences between adjacent time points in the set. This step may be as follows. For each time point in the mutation time set, the terminal calculates the time difference between that time point and its adjacent time point, obtaining a time difference for each of the multiple time points. It selects the time differences that exceed a preset threshold, takes the time points corresponding to the selected time differences as the split times of the mutation time set, and divides the mutation time set into multiple mutation time subsets at those split times. From the split times, the terminal determines the time period corresponding to each mutation time subset. For the benchmark mutation time set, it determines the time period corresponding to each benchmark mutation time subset from that subset's start time. For each mutation time subset, the terminal then searches the benchmark mutation time set for the benchmark mutation time subset whose time period overlaps most with the period of the mutation time subset, and treats that benchmark mutation time subset and the mutation time subset as one group of subsets, thereby obtaining multiple groups of subsets.
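The gap-based division and overlap-based pairing above might look like the sketch below. The function names, the use of (start, end) tuples for time periods, and the way benchmark periods are supplied by the caller are assumptions made for illustration; the text does not fix these details.

```python
def split_by_gaps(times, gap_threshold):
    """Split time points wherever the gap to the previous point exceeds the threshold."""
    subsets = []
    for t in sorted(times):
        if subsets and t - subsets[-1][-1] <= gap_threshold:
            subsets[-1].append(t)   # small gap: same subset
        else:
            subsets.append([t])     # large gap (or first point): start a new subset
    return subsets

def overlap(a, b):
    """Length of the overlap between two (start, end) periods; 0 if they are disjoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def pair_by_overlap(mutation_subsets, benchmark_periods):
    """Pair each mutation time subset with the index of the most-overlapping benchmark period."""
    pairs = []
    for subset in mutation_subsets:
        period = (subset[0], subset[-1])
        best = max(range(len(benchmark_periods)),
                   key=lambda i: overlap(period, benchmark_periods[i]))
        pairs.append((subset, best))
    return pairs
```

A subset containing a single time point has a zero-length period and therefore zero overlap with every benchmark period, so a practical version would need an additional tie-break rule for that case.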
Step c: The terminal determines the similarity of each group of subsets and, according to the similarity of each group, determines the similarity between the mutation time set of the audio signal and the benchmark mutation time set.
In this step, within each group of subsets, the benchmark mutation time subset includes the different time points corresponding to the characters of one character string, or the different time points corresponding to the notes of one note string. For each time point in the benchmark mutation time subset, the terminal searches the mutation time subset of the same group for a time point whose difference from that benchmark time point is within a preset range, and determines the time point found in the mutation time subset to be the time point matching that time point in the benchmark mutation time subset. The terminal obtains the number of time points in the benchmark mutation time subset, counts the match points in the benchmark mutation time subset, and takes the quotient of the number of match points divided by the number of time points in the benchmark mutation time subset as the similarity of the group of subsets.
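The per-group similarity can be sketched as below, assuming the divisor is the number of time points in the benchmark subset (consistent with the set-level formulation in the claims). The name `group_similarity`, the `tolerance` parameter standing in for the preset range, and the value 0.0 returned for an empty benchmark subset are illustrative assumptions.

```python
def group_similarity(mutation_subset, benchmark_subset, tolerance):
    """Fraction of benchmark time points with a detected mutation within `tolerance`."""
    if not benchmark_subset:
        return 0.0  # assumption: an empty benchmark subset contributes no similarity
    matched = sum(
        1 for ref in benchmark_subset
        if any(abs(t - ref) <= tolerance for t in mutation_subset)
    )
    return matched / len(benchmark_subset)
```

For example, with benchmark time points 0.0, 0.5, 1.0 and 1.5, detected time points 0.02, 0.48 and 1.3, and a tolerance of 0.05, two of the four benchmark points are matched, giving a similarity of 0.5.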
The terminal calculates the similarity of each group of subsets in turn, obtaining the similarities of the multiple groups. For each group, the terminal calculates the product of the weight of the group and the similarity of the group, obtaining multiple products, one per group. The terminal sums the products and takes the resulting weighted similarity as the similarity between the mutation time set of the audio signal and the benchmark mutation time set.
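The weighted sum can be written in a few lines. The text does not say how the weights are chosen; the sketch below simply takes them as an input, and the example assumes they sum to 1 so that the result stays between 0 and 1.

```python
def weighted_similarity(group_similarities, weights):
    """Sum of weight * similarity over all groups of subsets."""
    if len(group_similarities) != len(weights):
        raise ValueError("one weight is required per group of subsets")
    return sum(w * s for w, s in zip(weights, group_similarities))
```

With group similarities 0.5 and 1.0 and weights 0.25 and 0.75, the overall similarity is 0.25 * 0.5 + 0.75 * 1.0 = 0.875.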
Step 3032: When the comparison result is that the similarity is greater than a preset similarity, the terminal determines that the prompt message corresponding to the comparison result indicates that the specified user is singing, and outputs a prompt message indicating that the specified user is singing. When the comparison result is that the similarity is not greater than the preset similarity, the terminal determines that the prompt message corresponding to the comparison result indicates that the specified user is not singing, and outputs a prompt message indicating that the specified user is not singing.
In this embodiment of the present invention, a similarity greater than the preset similarity means that the mutation time set of the specified user closely matches the benchmark mutation time set of the specified song, so the terminal determines that the specified user is singing. A similarity not greater than the preset similarity means that the mutation time set of the specified user differs from the benchmark mutation time set of the specified song, that is, the specified user is not singing.
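The decision rule is a strict comparison, so a similarity exactly equal to the preset similarity is treated as not singing. A minimal sketch, in which the function name and the prompt strings are placeholders rather than text from the embodiment:

```python
def prompt_for(similarity, preset_similarity):
    """Map the comparison result to a prompt message (strings are placeholders)."""
    if similarity > preset_similarity:
        return "specified user is singing"
    return "specified user is not singing"
```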
Further, when the specified user is not singing, the terminal may also send the server a notification message indicating that the specified user is not singing, through the following step 304.
Step 304: When the prompt message indicates that the specified user is not singing, the terminal sends a notification message to the server. The notification message notifies the server that the specified user is not singing, so that the server performs specified processing on the specified user. The specified processing includes reminding the audience users of the specified user that the specified user is not singing, and/or penalizing the specified user.
In this embodiment of the present invention, when the terminal detects that the specified user is not singing, the terminal generates a notification message and sends it to the server. The server receives the notification message sent by the terminal and forwards it to the viewer terminals that are watching the specified user. Each viewer terminal receives the notification message and displays it in the live room, thereby reminding the audience users that the specified user of the live room is not singing.
The server may also penalize the specified user for entering the live room of the live-streaming application without singing, for example by deducting a corresponding resource value from the target account of the specified user, or by sending a warning message to the specified user. The resource value may be the gold coins, game currency, or number of likes that the specified user has obtained in the target account.
It should be noted that, in this embodiment of the present invention, the above method may also be performed by the server, that is, the server identifies whether the specified user is singing. The process may be as follows. While the specified user is live streaming in the live-streaming application, the terminal of the specified user sends the recorded audio signal to the server in real time, and the server receives the audio signal in real time. When an identification instruction is detected, the server determines the first spectrum sequence of the audio signal of the specified user and, according to the first spectrum sequence, determines the mutation time set of the audio signal. The server determines whether the specified user is singing according to the mutation time set of the audio signal and the benchmark mutation time set corresponding to the specified song. When it determines that the specified user is not singing, the server sends a notification message to the terminals of the audience users of the live room, reminding them that the specified user is not singing, and sends a notification message to the terminal of the specified user in order to penalize the specified user. Identification by the server is implemented in a similar way to identification by the terminal, and the details are not repeated here.
In this embodiment of the present invention, when the terminal detects an identification instruction, it first determines the first spectrum sequence of the audio signal of the specified user, where the identification instruction instructs the terminal to detect whether the specified user is singing the specified song. The terminal determines the mutation time set of the audio signal according to the first spectrum sequence; the mutation time set includes multiple time points, and each time point corresponds to one mutation spectrum value. The terminal compares the mutation time set of the audio signal with the benchmark mutation time set corresponding to the specified song to determine the similarity between them, obtains a comparison result, and outputs the prompt message corresponding to the comparison result, where the prompt message indicates whether the specified user is singing. Because the mutation time set is determined from the first spectrum sequence of the audio signal and identification is performed by comparing that set with the benchmark mutation time set of the specified song, and because current songs have a benchmark mutation time set, the identification method provided by this embodiment is widely applicable, which improves the practicality of identifying whether the specified user is singing.
Fig. 4 is a schematic structural diagram of an audio signal processing device provided by an embodiment of the present invention. The device can be applied in a terminal. As shown in Fig. 4, the device includes:
A first determining module 401, configured to determine, when an identification instruction is detected, the first spectrum sequence of the audio signal of the specified user, where the identification instruction instructs detection of whether the specified user is singing the specified song;
A second determining module 402, configured to determine the mutation time set of the audio signal according to the first spectrum sequence, where the mutation time set includes multiple time points and each time point corresponds to one mutation spectrum value;
An output module 403, configured to compare the mutation time set of the audio signal with the benchmark mutation time set corresponding to the specified song to determine the similarity between them, obtain a comparison result, and output the prompt message corresponding to the comparison result, where the prompt message indicates whether the specified user is singing.
In one possible design, the second determining module 402 includes:
A first determining unit, configured to determine, from the spectrum values in the first spectrum sequence, the degree of difference between each pair of adjacent spectrum values;
A forming unit, configured to, when the degree of difference between two adjacent spectrum values is greater than a preset degree of difference, add one of the time points corresponding to those two adjacent spectrum values to the mutation time set.
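The two units above can be illustrated with the sketch below. It assumes that each spectrum value is a list of per-bin magnitudes for one frame, that the degree of difference is the sum of absolute bin-wise differences, and that the later of the two time points is the one kept; the text leaves all three choices open, so they are labeled as assumptions in the code as well.

```python
def mutation_times(spectrum_sequence, frame_times, preset_difference):
    """Return the time points at which adjacent spectrum values differ sharply."""
    times = []
    for i in range(1, len(spectrum_sequence)):
        prev, cur = spectrum_sequence[i - 1], spectrum_sequence[i]
        # Assumed difference measure: sum of absolute bin-wise differences.
        difference = sum(abs(c - p) for p, c in zip(prev, cur))
        if difference > preset_difference:
            times.append(frame_times[i])  # assumption: keep the later time point
    return times
```

For five frames at 0.0 s to 0.4 s whose adjacent differences are 0, 7, 0 and 9, a preset degree of difference of 3 yields the mutation time set [0.2, 0.4].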
In one possible design, the output module 403 is further configured to: determine the similarity between the mutation time set of the audio signal and the benchmark mutation time set; when the comparison result is that the similarity is greater than a preset similarity, determine that the prompt message corresponding to the comparison result indicates that the specified user is singing, and output a prompt message indicating that the specified user is singing; and when the comparison result is that the similarity is not greater than the preset similarity, determine that the prompt message corresponding to the comparison result indicates that the specified user is not singing, and output a prompt message indicating that the specified user is not singing.
In one possible design, the output module 403 is further configured to: determine the number of match points in the benchmark mutation time set, where a match point is a time point in the benchmark mutation time set that matches a time point in the mutation time set of the audio signal; and determine the similarity according to that number and the total number of time points in the benchmark mutation time set.
In one possible design, the benchmark mutation time set includes multiple benchmark mutation time subsets, and each benchmark mutation time subset corresponds to one benchmark audio sub-signal of the specified song;
The output module 403 includes:
A dividing unit, configured to divide the mutation time set of the audio signal into multiple mutation time subsets;
A second determining unit, configured to determine multiple groups of subsets, where one group of subsets includes the benchmark mutation time subset and the mutation time subset corresponding to the same benchmark audio sub-signal;
The second determining unit is further configured to determine the similarity of each group of subsets and, according to the similarity of each group, determine the similarity between the mutation time set of the audio signal and the benchmark mutation time set.
In one possible design, the device further includes:
A sending module, configured to send a notification message to the server when the prompt message indicates that the specified user is not singing, where the notification message notifies the server that the specified user is not singing, so that the server performs specified processing on the specified user. The specified processing includes reminding the audience users of the specified user that the specified user is not singing, and/or penalizing the specified user.
In one possible design, the first spectrum sequence is any one of a short-time spectrum sequence, a short-time log spectrum sequence, or a short-time cepstrum sequence;
The first determining module 401 is further configured to, when the first spectrum sequence is a short-time spectrum sequence, collect the audio signal of the specified user, perform framing, windowing, and a short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, and use the short-time spectrum sequence as the first spectrum sequence;
The first determining module 401 is further configured to, when the first spectrum sequence is a short-time log spectrum sequence, collect the audio signal of the specified user, perform framing, windowing, and a short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, apply a logarithmic transformation to the short-time spectrum sequence to obtain the short-time log spectrum sequence, and use the short-time log spectrum sequence as the first spectrum sequence of the audio signal;
The first determining module 401 is further configured to, when the first spectrum sequence is a short-time cepstrum sequence, collect the audio signal of the specified user, perform framing, windowing, and a short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, apply a logarithmic transformation to the short-time spectrum sequence to obtain the short-time log spectrum sequence, apply an inverse Fourier transform to the short-time log spectrum sequence to obtain the short-time cepstrum sequence, and use the short-time cepstrum sequence as the first spectrum sequence of the audio signal.
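The three kinds of first spectrum sequence share one pipeline, which can be sketched as follows using only the Python standard library. This is an illustration under stated assumptions: the Hann window, the magnitude spectrum, the small constant added before the logarithm, and the naive O(n^2) transform are choices made here for brevity and are not specified by the text; a practical implementation would use an FFT.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, or its inverse."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out

def short_time_sequences(signal, frame_len, hop):
    """Return (spectrum, log_spectrum, cepstrum) sequences, one entry per frame."""
    # Assumed window: Hann (requires frame_len > 1).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    spectrum, log_spectrum, cepstrum = [], [], []
    for start in range(0, len(signal) - frame_len + 1, hop):       # framing
        frame = [signal[start + i] * window[i] for i in range(frame_len)]  # windowing
        mag = [abs(v) for v in dft(frame)]                # short-time spectrum
        log_mag = [math.log(m + 1e-10) for m in mag]      # logarithmic transformation
        ceps = [v.real for v in dft(log_mag, inverse=True)]  # inverse Fourier transform
        spectrum.append(mag)
        log_spectrum.append(log_mag)
        cepstrum.append(ceps)
    return spectrum, log_spectrum, cepstrum
```

Any one of the three returned sequences can then serve as the first spectrum sequence from which the mutation time set is determined.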
In one possible design, the device includes:
An obtaining module, configured to obtain the lyrics or the musical score of the specified song;
A third determining module, configured to obtain the timestamps of the specified song and determine the time point corresponding to each character in the lyrics or each note in the musical score;
A forming module, configured to form the benchmark mutation time set corresponding to the specified song from the time points corresponding to the characters in the lyrics or the notes in the musical score.
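Forming the benchmark mutation time set from timestamps can be sketched as below. The input format is hypothetical: each lyric line (or note string) is assumed to arrive as a list of (unit, time) pairs, where the unit is a character or note and the time is in seconds. The text does not describe the timestamp format, so this shape is an assumption.

```python
def build_benchmark(lines):
    """Build the benchmark mutation time set and its per-line subsets.

    `lines` is a hypothetical structure: one list of (unit, time) pairs per
    character string or note string of the specified song.
    """
    subsets = [sorted(t for _unit, t in line) for line in lines if line]
    full_set = sorted(t for subset in subsets for t in subset)
    return full_set, subsets
```

Each per-line subset corresponds to one benchmark mutation time subset, and their union is the benchmark mutation time set.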
In this embodiment of the present invention, when the terminal detects an identification instruction, it first determines the first spectrum sequence of the audio signal of the specified user, where the identification instruction instructs the terminal to detect whether the specified user is singing the specified song. The terminal determines the mutation time set of the audio signal according to the first spectrum sequence; the mutation time set includes multiple time points, and each time point corresponds to one mutation spectrum value. The terminal compares the mutation time set of the audio signal with the benchmark mutation time set corresponding to the specified song to determine the similarity between them, obtains a comparison result, and outputs the prompt message corresponding to the comparison result, where the prompt message indicates whether the specified user is singing. Because the mutation time set is determined from the first spectrum sequence of the audio signal and identification is performed by comparing that set with the benchmark mutation time set of the specified song, and because current songs have a benchmark mutation time set, the identification method provided by this embodiment is widely applicable, which improves the practicality of the method for identifying whether the specified user is singing.
It should be noted that, when the audio signal processing device provided by the above embodiment processes an audio signal, the division into the functional modules described above is only an example. In practical applications, the functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the audio signal processing device provided by the above embodiment and the audio signal processing method embodiment belong to the same concept. For its specific implementation process, refer to the method embodiment; the details are not repeated here.
Fig. 5 is a schematic structural diagram of an audio signal processing device provided by an embodiment of the present invention. The device can be used to implement the functions performed by the terminal in the audio signal processing method shown in the above embodiments. Specifically:
The terminal 500 may include an RF (Radio Frequency) circuit 510, a memory 520 including one or more computer-readable storage media, an input unit 530, a display unit 540, a sensor 550, an audio circuit 560, a transmission module 570, a processor 580 including one or more processing cores, a power supply 590, and other components. A person skilled in the art will understand that the terminal structure shown in Fig. 5 does not limit the terminal, which may include more or fewer components than shown, combine certain components, or arrange the components differently. Specifically:
The RF circuit 510 may be used to receive and send signals while information is being sent and received or during a call. In particular, after receiving downlink information from a base station, it passes the information to the one or more processors 580 for processing; in addition, it sends uplink data to the base station. Generally, the RF circuit 510 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. The RF circuit 510 may also communicate with networks and other terminals by wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), email, and SMS (Short Messaging Service).
The memory 520 may be used to store software programs and modules, such as the software programs and modules corresponding to the terminal shown in the above exemplary embodiments. The processor 580 runs the software programs and modules stored in the memory 520 to execute various functional applications and data processing, for example implementing video-based interaction. The memory 520 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, application programs required by at least one function (such as a sound playback function or an image playback function), and the like. The data storage area may store data created through use of the terminal 500 (such as audio data or a phone book), and the like. In addition, the memory 520 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Correspondingly, the memory 520 may also include a memory controller to provide the processor 580 and the input unit 530 with access to the memory 520.
The input unit 530 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control. Specifically, the input unit 530 may include a touch-sensitive surface 531 and other input terminals 532. The touch-sensitive surface 531, also called a touch display screen or touchpad, collects touch operations by the user on or near it (such as operations performed on or near the touch-sensitive surface 531 with a finger, a stylus, or any other suitable object or accessory) and drives the corresponding connection apparatus according to a preset program. Optionally, the touch-sensitive surface 531 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the touch position of the user, detects the signal produced by the touch operation, and passes the signal to the touch controller. The touch controller receives touch information from the touch detection apparatus, converts it into contact coordinates, sends the coordinates to the processor 580, and receives and executes commands sent by the processor 580. The touch-sensitive surface 531 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch-sensitive surface 531, the input unit 530 may include other input terminals 532. Specifically, the other input terminals 532 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys or a power key), a trackball, a mouse, and a joystick.
The display unit 540 may be used to display information entered by the user or provided to the user, as well as the various graphical user interfaces of the terminal 500. These graphical user interfaces may be composed of graphics, text, icons, video, and any combination of these. The display unit 540 may include a display panel 541. Optionally, the display panel 541 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, the touch-sensitive surface 531 may cover the display panel 541. When the touch-sensitive surface 531 detects a touch operation on or near it, it passes the operation to the processor 580 to determine the type of touch event, and the processor 580 then provides the corresponding visual output on the display panel 541 according to the type of touch event. Although in Fig. 5 the touch-sensitive surface 531 and the display panel 541 are two independent components that implement the input and output functions, in some embodiments the touch-sensitive surface 531 and the display panel 541 may be integrated to implement the input and output functions.
The terminal 500 may also include at least one sensor 550, such as an optical sensor, a motion sensor, or other sensors. Specifically, the optical sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 541 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 541 and/or the backlight when the terminal 500 is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes) and, when stationary, can detect the magnitude and direction of gravity. It can be used in applications that recognize the posture of the phone (such as switching between portrait and landscape, related games, and magnetometer posture calibration) and in vibration-recognition functions (such as a pedometer or tap detection). The terminal 500 may also be equipped with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described here.
The audio circuit 560, a loudspeaker 561, and a microphone 562 can provide an audio interface between the user and the terminal 500. The audio circuit 560 can convert received audio data into an electrical signal and transmit it to the loudspeaker 561, which converts it into a sound signal for output. Conversely, the microphone 562 converts a collected sound signal into an electrical signal, which the audio circuit 560 receives and converts into audio data. After the audio data is output to the processor 580 for processing, it is sent through the RF circuit 510 to, for example, another terminal, or it is output to the memory 520 for further processing. The audio circuit 560 may also include an earphone jack to provide communication between an external earphone and the terminal 500.
Through the transmission module 570, the terminal 500 can help the user send and receive email, browse web pages, access streaming media, and so on; the module provides the user with wireless or wired broadband Internet access. Although Fig. 5 shows the transmission module 570, it can be understood that the module is not an essential part of the terminal 500 and may be omitted as needed without changing the essence of the invention.
The processor 580 is the control center of the terminal 500. It connects all parts of the phone through various interfaces and lines, and performs the various functions of the terminal 500 and processes data by running or executing the software programs and/or modules stored in the memory 520 and by calling the data stored in the memory 520, thereby monitoring the phone as a whole. Optionally, the processor 580 may include one or more processing cores. Preferably, the processor 580 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 580.
The terminal 500 also includes a power supply 590 (such as a battery) that supplies power to the components. Preferably, the power supply may be logically connected to the processor 580 through a power management system, so that functions such as charging, discharging, and power consumption management are handled by the power management system. The power supply 590 may also include any of one or more direct current or alternating current power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and other components.
Although not shown, the terminal 500 may also include a camera, a Bluetooth module, and the like, which are not described here. Specifically, in this embodiment, the display unit of the terminal 500 is a touch-screen display, and the terminal 500 also includes a memory and one or more programs. The one or more programs are stored in the memory and configured to be executed by one or more processors, and they contain instructions for performing the operations performed by the terminal in the above embodiments.
In an exemplary embodiment, a computer-readable storage medium storing a computer program is also provided, for example a memory storing a computer program. When executed by a processor, the computer program implements the audio signal processing method of the above embodiments. For example, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (16)
1. An audio signal processing method, characterized in that the method comprises:
when an identification instruction is detected, determining a first spectrum sequence of an audio signal of a specified user, the identification instruction instructing detection of whether the specified user is singing a specified song;
determining a mutation time set of the audio signal according to the first spectrum sequence, the mutation time set comprising multiple time points, each time point corresponding to one mutation spectrum value;
comparing the mutation time set of the audio signal with a benchmark mutation time set corresponding to the specified song to determine the similarity between them, obtaining a comparison result, and outputting a prompt message corresponding to the comparison result, the prompt message indicating whether the specified user is singing.
2. The method according to claim 1, characterized in that the determining a mutation time set of the audio signal according to the first spectrum sequence comprises:
determining, from the spectrum values in the first spectrum sequence, the degree of difference between each pair of adjacent spectrum values;
when the degree of difference between two adjacent spectrum values is greater than a preset degree of difference, adding one of the time points corresponding to those two adjacent spectrum values to the mutation time set.
3. The method according to claim 1, characterized in that the comparing the mutation time set of the audio signal with the benchmark mutation time set corresponding to the specified song to determine the similarity between them, obtaining a comparison result, and outputting a prompt message corresponding to the comparison result comprises:
determining the similarity between the mutation time set of the audio signal and the benchmark mutation time set;
when the comparison result is that the similarity is greater than a preset similarity, determining that the prompt message corresponding to the comparison result indicates that the specified user is singing, and outputting a prompt message indicating that the specified user is singing;
when the comparison result is that the similarity is not greater than the preset similarity, determining that the prompt message corresponding to the comparison result indicates that the specified user is not singing, and outputting a prompt message indicating that the specified user is not singing.
4. The method according to claim 3, characterized in that the determining the similarity between the mutation time set of the audio signal and the benchmark mutation time set comprises:
determining the number of match points in the benchmark mutation time set, a match point being a time point in the benchmark mutation time set that matches a time point in the mutation time set of the audio signal;
determining the similarity according to the number and the total number of time points in the benchmark mutation time set.
5. The method according to claim 3, characterized in that the benchmark mutation time set comprises multiple benchmark mutation time subsets, each benchmark mutation time subset corresponding to one benchmark audio sub-signal of the specified song;
the determining the similarity between the mutation time set of the audio signal and the benchmark mutation time set comprises:
dividing the mutation time set of the audio signal into multiple mutation time subsets;
determining multiple groups of subsets, one group of subsets comprising the benchmark mutation time subset and the mutation time subset corresponding to the same benchmark audio sub-signal;
determining the similarity of each group of subsets;
determining, according to the similarity of each group of subsets, the similarity between the mutation time set of the audio signal and the benchmark mutation time set.
6. The method according to claim 1, wherein the first spectrum sequence is any one of a short-time spectrum sequence, a short-time log spectrum sequence, or a short-time cepstrum sequence; determining the first spectrum sequence of the audio signal of the specified user comprises:
When the first spectrum sequence is a short-time spectrum sequence, collecting the audio signal of the specified user, performing framing, windowing and short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, and using the short-time spectrum sequence as the first spectrum sequence;
When the first spectrum sequence is a short-time log spectrum sequence, collecting the audio signal of the specified user, performing framing, windowing and short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, performing a logarithmic transformation on the short-time spectrum sequence to obtain the short-time log spectrum sequence, and using the short-time log spectrum sequence as the first spectrum sequence of the audio signal;
When the first spectrum sequence is a short-time cepstrum sequence, collecting the audio signal of the specified user, performing framing, windowing and short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, performing a logarithmic transformation on the short-time spectrum sequence to obtain the short-time log spectrum sequence, performing an inverse Fourier transform on the short-time log spectrum sequence to obtain the short-time cepstrum sequence, and using the short-time cepstrum sequence as the first spectrum sequence of the audio signal.
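The three variants share one pipeline: framing, windowing, Fourier transform, then an optional logarithm and an optional inverse transform. The sketch below is a dependency-free illustration under stated assumptions: a Hann window, a naive O(n²) DFT instead of an FFT, magnitude spectra, and a small `eps` added before the logarithm. The frame length, hop size and all function names are hypothetical.

```python
import cmath
import math


def split_frames(signal, frame_len, hop):
    """Framing: overlapping frames of frame_len samples, advancing by hop."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def hann(n):
    """Symmetric Hann window of length n (assumed window choice, n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def dft(x, inverse=False):
    """Naive discrete Fourier transform; the inverse is scaled by 1/n."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def short_time_spectra(signal, frame_len=64, hop=32, kind="spectrum", eps=1e-12):
    """Return one vector per frame; kind is 'spectrum', 'log' or 'cepstrum'."""
    if kind not in ("spectrum", "log", "cepstrum"):
        raise ValueError("kind must be 'spectrum', 'log' or 'cepstrum'")
    window = hann(frame_len)
    result = []
    for frame in split_frames(signal, frame_len, hop):
        windowed = [s * w for s, w in zip(frame, window)]
        magnitude = [abs(v) for v in dft(windowed)]  # short-time spectrum
        if kind == "spectrum":
            result.append(magnitude)
            continue
        log_mag = [math.log(m + eps) for m in magnitude]  # short-time log spectrum
        if kind == "log":
            result.append(log_mag)
        else:
            # Short-time cepstrum: inverse transform of the log spectrum.
            result.append([v.real for v in dft(log_mag, inverse=True)])
    return result
```

In practice an FFT routine would replace `dft`; the naive version is used here only so the example has no external dependencies.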
7. The method according to claim 1, wherein before comparing the similarity between the mutation time set of the audio signal and the benchmark mutation time set corresponding to the specified song to obtain the comparison result and outputting the prompt message corresponding to the comparison result, the method further comprises:
Obtaining the lyrics or the musical score of the specified song;
Obtaining the timestamps of the specified song, and determining the time point corresponding to each character in the lyrics or each note in the musical score;
Forming the benchmark mutation time set corresponding to the specified song from the time points corresponding to each character in the lyrics or each note in the musical score.
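How per-character time points are derived from the song's timestamps is not spelled out in this claim. The sketch below assumes one hypothetical input format: each lyric line carries a start and end time in seconds, and the characters of the line are spread evenly across that interval. Both the format and the even spacing are assumptions for illustration only.

```python
def benchmark_times_from_lyrics(timed_lines):
    """Build a sorted benchmark time set from (start_s, end_s, text) lyric lines.

    Characters are assumed to be evenly spaced within their line; whitespace
    is ignored. Real lyric files may carry per-character timestamps instead.
    """
    times = []
    for start, end, text in timed_lines:
        chars = [c for c in text if not c.isspace()]
        if not chars:
            continue
        step = (end - start) / len(chars)
        times.extend(start + i * step for i in range(len(chars)))
    return sorted(times)
```

The same idea applies to a musical score, with one time point per note onset instead of one per character.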
8. An audio signal processing device, wherein the device comprises:
A first determining module, configured to determine, when an identification instruction is detected, a first spectrum sequence of an audio signal of a specified user, the identification instruction being used to instruct detection of whether the specified user is singing a specified song;
A second determining module, configured to determine a mutation time set of the audio signal according to the first spectrum sequence, the mutation time set comprising multiple time points, each time point corresponding to one mutation spectrum value;
An output module, configured to compare the similarity between the mutation time set of the audio signal and a benchmark mutation time set corresponding to the specified song to obtain a comparison result, and output a prompt message corresponding to the comparison result, the prompt message indicating whether the specified user is singing.
9. The device according to claim 8, wherein the second determining module comprises:
A first determining unit, configured to determine, for each spectrum value in the first spectrum sequence, the degree of difference between two adjacent spectrum values;
A composing unit, configured to, when the degree of difference between two adjacent spectrum values is greater than a preset degree of difference, add one of the time points corresponding to those two adjacent spectrum values to the mutation time set.
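The two units above amount to thresholding the difference between consecutive frames of the spectrum sequence. The following sketch makes two assumptions that the claim leaves open: the degree of difference is the Euclidean distance between adjacent spectrum vectors, and the time point kept is that of the later frame. The function name and parameters are hypothetical.

```python
import math


def mutation_times(spectra, frame_times, preset_difference):
    """Collect time points where adjacent spectrum values differ sharply.

    spectra: one spectrum vector per frame.
    frame_times: the time point (seconds) of each frame.
    The Euclidean distance and the choice of the later frame are assumptions.
    """
    times = []
    for i in range(1, len(spectra)):
        difference = math.sqrt(
            sum((a - b) ** 2 for a, b in zip(spectra[i], spectra[i - 1]))
        )
        if difference > preset_difference:
            times.append(frame_times[i])
    return times
```

With five frames where the spectrum jumps at the third frame and falls back at the fifth, the function returns the time points of those two frames.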
10. The device according to claim 8, wherein
The output module is further configured to determine the similarity between the mutation time set of the audio signal and the benchmark mutation time set; when the comparison result is that the similarity is greater than a preset similarity, determine that the prompt message corresponding to the comparison result indicates that the specified user is singing, and output the prompt message indicating that the specified user is singing; when the comparison result is that the similarity is not greater than the preset similarity, determine that the prompt message corresponding to the comparison result indicates that the specified user is not singing, and output the prompt message indicating that the specified user is not singing.
11. The device according to claim 10, wherein
The output module is further configured to determine the number of match points in the benchmark mutation time set, a match point being a time point in the benchmark mutation time set that matches a time point in the mutation time set of the audio signal; and determine the similarity according to the number of match points and the total number of time points in the benchmark mutation time set.
12. The device according to claim 10, wherein the benchmark mutation time set comprises multiple benchmark mutation time subsets, each benchmark mutation time subset corresponding to one benchmark audio sub-signal of the specified song;
The output module comprises:
A dividing unit, configured to divide the mutation time set of the audio signal into multiple mutation time subsets;
A second determining unit, configured to determine multiple groups of subsets, each group comprising the benchmark mutation time subset and the mutation time subset that correspond to the same benchmark audio sub-signal;
The second determining unit is further configured to determine the similarity of each group of subsets separately, and to determine, according to the similarity of each group of subsets, the similarity between the mutation time set of the audio signal and the benchmark mutation time set.
13. The device according to claim 8, wherein the first spectrum sequence is any one of a short-time spectrum sequence, a short-time log spectrum sequence, or a short-time cepstrum sequence;
The first determining module is further configured to, when the first spectrum sequence is a short-time spectrum sequence, collect the audio signal of the specified user, perform framing, windowing and short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, and use the short-time spectrum sequence as the first spectrum sequence;
The first determining module is further configured to, when the first spectrum sequence is a short-time log spectrum sequence, collect the audio signal of the specified user, perform framing, windowing and short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, perform a logarithmic transformation on the short-time spectrum sequence to obtain the short-time log spectrum sequence, and use the short-time log spectrum sequence as the first spectrum sequence of the audio signal;
The first determining module is further configured to, when the first spectrum sequence is a short-time cepstrum sequence, collect the audio signal of the specified user, perform framing, windowing and short-time Fourier transform on the audio signal to obtain the short-time spectrum sequence of the audio signal, perform a logarithmic transformation on the short-time spectrum sequence to obtain the short-time log spectrum sequence, perform an inverse Fourier transform on the short-time log spectrum sequence to obtain the short-time cepstrum sequence, and use the short-time cepstrum sequence as the first spectrum sequence of the audio signal.
14. The device according to claim 8, wherein the device further comprises:
An obtaining module, configured to obtain the lyrics or the musical score of the specified song;
A third determining module, configured to obtain the timestamps of the specified song and determine the time point corresponding to each character in the lyrics or each note in the musical score;
A composing module, configured to form the benchmark mutation time set corresponding to the specified song from the time points corresponding to each character in the lyrics or each note in the musical score.
15. An audio signal processing device, comprising a processor and a memory; the memory is configured to store a computer program; the processor is configured to execute the computer program stored in the memory to implement the method steps of any one of claims 1-7.
16. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710919028.9A CN107680614B (en) | 2017-09-30 | 2017-09-30 | Audio signal processing method, apparatus and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710919028.9A CN107680614B (en) | 2017-09-30 | 2017-09-30 | Audio signal processing method, apparatus and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107680614A (en) | 2018-02-09 |
CN107680614B (en) | 2021-02-12 |
Family
ID=61137787
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710919028.9A (Active), published as CN107680614B (en) | 2017-09-30 | 2017-09-30 | Audio signal processing method, apparatus and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107680614B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108769772A (en) * | 2018-05-28 | 2018-11-06 | 广州虎牙信息科技有限公司 | Direct broadcasting room display methods, device, equipment and storage medium |
CN110299049A (en) * | 2019-06-17 | 2019-10-01 | 韶关市启之信息技术有限公司 | A kind of intelligence of electronic music shows method |
CN110335629A (en) * | 2019-06-28 | 2019-10-15 | 腾讯音乐娱乐科技(深圳)有限公司 | Pitch recognition methods, device and the storage medium of audio file |
CN111462775A (en) * | 2020-03-30 | 2020-07-28 | 腾讯科技(深圳)有限公司 | Audio similarity determination method, device, server and medium |
CN113824979A (en) * | 2021-09-09 | 2021-12-21 | 广州方硅信息技术有限公司 | Live broadcast room recommendation method and device and computer equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1173008A (en) * | 1996-08-06 | 1998-02-11 | 雅马哈株式会社 | Karaoke scoring apparatus analyzing singing voice relative to melody data |
CN1729506A (en) * | 2002-12-20 | 2006-02-01 | 皇家飞利浦电子股份有限公司 | Audio signal identification method and system |
Non-Patent Citations (1)
Title |
---|
MENGLU LI,ZHIJUN ZHAO,PING SHI: "Query by Humming Based on Music Phrase Segmentation and Matching", 《2015 12TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD)》 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108769772A (en) * | 2018-05-28 | 2018-11-06 | 广州虎牙信息科技有限公司 | Direct broadcasting room display methods, device, equipment and storage medium |
CN108769772B (en) * | 2018-05-28 | 2019-06-14 | 广州虎牙信息科技有限公司 | Direct broadcasting room display methods, device, equipment and storage medium |
CN110299049A (en) * | 2019-06-17 | 2019-10-01 | 韶关市启之信息技术有限公司 | A kind of intelligence of electronic music shows method |
CN110299049B (en) * | 2019-06-17 | 2021-12-17 | 韶关市启之信息技术有限公司 | Intelligent display method of electronic music score |
CN110335629A (en) * | 2019-06-28 | 2019-10-15 | 腾讯音乐娱乐科技(深圳)有限公司 | Pitch recognition methods, device and the storage medium of audio file |
CN110335629B (en) * | 2019-06-28 | 2021-08-03 | 腾讯音乐娱乐科技(深圳)有限公司 | Pitch recognition method and device of audio file and storage medium |
CN111462775A (en) * | 2020-03-30 | 2020-07-28 | 腾讯科技(深圳)有限公司 | Audio similarity determination method, device, server and medium |
CN111462775B (en) * | 2020-03-30 | 2023-11-03 | 腾讯科技(深圳)有限公司 | Audio similarity determination method, device, server and medium |
CN113824979A (en) * | 2021-09-09 | 2021-12-21 | 广州方硅信息技术有限公司 | Live broadcast room recommendation method and device and computer equipment |
Also Published As
Publication number | Publication date |
---|---|
CN107680614B (en) | 2021-02-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107680614A (en) | Acoustic signal processing method, device and storage medium | |
CN107863095A (en) | Acoustic signal processing method, device and storage medium | |
CN109166593A (en) | audio data processing method, device and storage medium | |
CN106531149B (en) | Information processing method and device | |
CN110472145A (en) | A kind of content recommendation method and electronic equipment | |
CN108347704A (en) | Information recommendation method and mobile terminal | |
CN107396137A (en) | The method, apparatus and system of online interaction | |
CN106571151A (en) | Challenge song recording method and device | |
CN109903773A (en) | Audio-frequency processing method, device and storage medium | |
CN110096611A (en) | A kind of song recommendations method, mobile terminal and computer readable storage medium | |
CN109256146A (en) | Audio-frequency detection, device and storage medium | |
CN106782600A (en) | The methods of marking and device of audio file | |
CN106210266B (en) | A kind of acoustic signal processing method and audio signal processor | |
CN108090140A (en) | A kind of playback of songs method and mobile terminal | |
CN107798107A (en) | The method and mobile device of song recommendations | |
CN106210755A (en) | A kind of methods, devices and systems playing live video | |
CN107645682A (en) | Carry out live method and system | |
CN105959482B (en) | A kind of control method and electronic equipment of scene audio | |
CN107743178A (en) | A kind of message player method and mobile terminal | |
CN108763316A (en) | A kind of audio list management method and mobile terminal | |
CN108228882A (en) | The recommendation method and terminal device of a kind of audition for the songs segment | |
CN106652981B (en) | BPM detection method and device | |
CN107731241A (en) | Handle the method, apparatus and storage medium of audio signal | |
CN107507628A (en) | Singing methods of marking, device and terminal | |
CN106558299A (en) | The mode switching method and device of audio rendition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||