CN103297590B

CN103297590B - Method and system for unlocking a device based on audio

Info

Publication number: CN103297590B
Application number: CN201210044261.4A
Authority: CN
Inventors: 刘成芳
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2012-02-24
Filing date: 2012-02-24
Publication date: 2016-12-14
Anticipated expiration: 2032-02-24
Also published as: WO2013123747A1; CN103297590A

Abstract

The invention discloses a method and a system for unlocking a device based on audio. The device extracts at least one of the following from a received audio password: melody, rhythm, timbre. When the melody, rhythm and timbre match a preset melody, rhythm and timbre respectively, the device is unlocked. Because specific features of the audio are taken into account during unlocking, the security of device unlocking is improved.

Description

Method and system for unlocking equipment based on audio

Technical Field

The invention relates to an audio processing technology, in particular to a method and a system for unlocking equipment based on audio.

Background

As users' expectations of the mobile phone experience rise, existing mobile phone unlocking functions no longer meet their needs. Existing unlocking functions fall mainly into three types: ordinary password unlocking, fingerprint unlocking and face-image (avatar) unlocking. The security of ordinary password unlocking is very low. Fingerprint unlocking and avatar unlocking are both image-based, and their security is undermined by current fingerprint mold-casting and makeup techniques. Therefore, both mobile phones and other devices that must keep information confidential require an unlocking function with higher security, but no such function exists at present.

Disclosure of Invention

In view of this, the present invention provides a method and a system for unlocking a device based on audio to improve the security of unlocking the device.

In order to achieve the purpose, the technical scheme of the invention is realized as follows:

a method for unlocking a device based on audio comprises the following steps:

the device extracts from the received audio password at least one of: melody, rhythm, timbre;

and when the melody, the rhythm and the timbre respectively match the preset melody, the preset rhythm and the preset timbre, the device is unlocked.

The process of extracting the melody includes:

down-sampling the input signal x (n) to obtain y (n), performing endpoint detection on y (n), and determining the start and end points of the signal; dividing the signal into multiple frames according to the short-time stationarity of musical sound and extracting the fundamental frequency of the signal; and converting the extracted fundamental frequency into Musical Instrument Digital Interface (MIDI) notes according to the characteristics of twelve-tone equal temperament.

The fundamental frequency is extracted using an enhanced Specmurt (improved Mel cepstrum coefficient) algorithm; the enhanced Specmurt algorithm is either an enhanced Specmurt algorithm based on the complex wavelet transform, or an enhanced Specmurt algorithm based on the short-time Fourier transform (STFT) and the MDCT.

The process of extracting the rhythm includes:

down-sampling the input signal x (n) to obtain z (n), performing endpoint detection on z (n), and judging the endpoints of the beginning and the end of the signal;

performing STFT on z (n) to obtain U (wk, ti), and performing an autocorrelation function (ACF) computation on the signal divided into multiple frames to obtain A (l, ti); obtaining A (wl, ti) from A (l, ti) according to the characteristics of the autocorrelation, where wl = l/fs and fs is the sampling frequency;

combining the STFT and the FM-ACF to obtain the joint function Y (wk, ti) = U (wk, ti) · A (wl, ti), and solving for the rhythm of the signal from Y (wk, ti).

The process of extracting the timbre comprises the following steps:

down-sampling the input signal x (n) to obtain v (n), performing endpoint detection on v (n), and determining the start and end points q (n) of the signal; computing and storing the MFCC coefficients of v (n); computing the spectrum envelope for q (n), and computing the amplitude envelope of the signal for v (n).

When the melody, the rhythm and the timbre are respectively matched with the preset melody, the preset rhythm and the preset timbre, the equipment unlocking process comprises the following steps:

obtaining a combined value of the matching distortion and the path deviation of the audio password input for unlocking, and judging whether the difference between the melody of the audio password for unlocking and the melody of the preset audio password is lower than a preset threshold; when it is lower than the threshold, comparing whether the rhythm of the audio password for unlocking is consistent with the rhythm of the preset audio password; when the rhythms are consistent, judging whether the difference between the timbre of the audio password for unlocking and the timbre of the preset audio password is lower than a preset threshold; and unlocking the device when the timbre difference is lower than the threshold.

A system for unlocking a device based on audio, the system comprising a musical tone feature decision module, and further comprising at least one of: a melody extraction module, a rhythm extraction module and a timbre extraction module; wherein,

the melody extraction module, the rhythm extraction module and the timbre extraction module, when provided in the system, are respectively used for extracting the corresponding content from the received audio password;

and the musical tone feature decision module is used for unlocking the device when the melody, the rhythm and the timbre respectively extracted by the melody extraction module, the rhythm extraction module and the timbre extraction module respectively match the preset melody, rhythm and timbre.

The melody extraction module is used for:

down-sampling the input signal x (n) to obtain y (n), performing endpoint detection on y (n), and determining the start and end points of the signal; dividing the signal into multiple frames according to the short-time stationarity of musical sound and extracting the fundamental frequency of the signal; and converting the extracted fundamental frequency into MIDI notes according to the characteristics of twelve-tone equal temperament.

The melody extraction module is used for realizing extraction of the fundamental frequency based on an enhanced Specmurt algorithm; the enhanced Specmurt algorithm comprises a complex wavelet transform-based enhanced Specmurt algorithm or an STFT and MDCT-based enhanced Specmurt algorithm.

The rhythm extraction module is used for:

performing STFT on z (n) to obtain U (wk, ti), and performing ACF on the signal divided into multiple frames to obtain A (l, ti); A (wl, ti) is obtained from this in combination with the characteristics of the autocorrelation, where wl = l/fs;

The timbre extraction module is used for:

The musical tone feature decision module, when unlocking the device, is to:

The invention unlocks a device based on audio and takes specific features of the audio into account during unlocking, thereby improving the security of device unlocking.

Drawings

FIG. 1 is a flowchart illustrating a mobile phone unlocking process according to an embodiment of the present invention;

FIG. 2 is a flow chart of feature extraction according to an embodiment of the present invention;

FIG. 3 is a schematic diagram illustrating a principle of melody extraction according to an embodiment of the present invention;

FIG. 4 is a diagram of a Specmurt enhancement algorithm implemented by complex wavelet transform in an embodiment of the present invention;

FIG. 5 is a schematic diagram of a Specmurt enhancement algorithm implemented by MDCT (modified discrete cosine transform) and DFT (discrete Fourier transform) in an embodiment of the present invention;

FIG. 6 is a schematic diagram of the rhythm extraction in the embodiment of the present invention;

FIG. 7 is a schematic diagram illustrating the principle of extracting timbre in the embodiment of the present invention;

FIG. 8 is a flowchart illustrating unlocking a mobile phone according to another embodiment of the present invention;

FIG. 9 is a simplified flowchart of unlocking a mobile phone according to an embodiment of the present invention.

Detailed Description

In practical applications, the flow shown in fig. 1 may be performed, and the details of that flow are illustrated in figs. 2 to 8. For audio provided by the user (e.g., music input by the user), the process shown in fig. 2 may be performed: the melody, rhythm and timbre are extracted to obtain the musical features.

It should be noted that both mobile phones and other devices that must keep information confidential require an unlocking function with higher security. The method and system of the present invention are therefore not limited to mobile phones; the unlocking technique is the same for mobile phones and other such devices.

The following is a detailed description of the present invention with reference to the drawings and embodiments, taking a mobile phone as an example only.

When the mobile phone is unlocked based on the audio, the following three steps can be performed:

Step one: collecting the input source (including melody, rhythm and timbre);

Step two: the user inputs the audio password (comprising melody, rhythm and timbre) required for unlocking;

Step three: comparing the audio-based information to unlock the mobile phone.

In the step one, the input source of the user can be collected by using an information collecting device of a mobile phone or other terminal equipment. The concrete form can be as follows:

a. A microphone acquires the signal by picking up vibration waves in the air.

b. The signal is acquired from throat vibration.

c. The signal is acquired from vibration of the jaw bone.

d. The signal is acquired from the vibration of other objects.

Whether the input source is the preset audio password or the audio password input for unlocking, the mobile phone can process the acquired signal (for example, by analog-to-digital conversion and by recording the sampling frequency) and store the processed signal as the input signal. Specifically, the following processing may be performed on the acquired signal: melody extraction as shown in fig. 3; rhythm extraction as shown in fig. 6; and timbre extraction as shown in fig. 7.

When melody extraction is performed, the following operations may be performed:

a. The input signal x (n) is down-sampled to obtain y (n); it is suggested that the sampling rate be reduced to 22050 Hz or less, but not below 10000 Hz.

b. The endpoint detection unit performs endpoint detection on y (n) and determines the start and end points of the signal; the framing unit divides the signal into multiple frames according to the short-time stationarity of musical sound and sends the frames to the fundamental frequency extraction unit.

c. The fundamental frequency extraction unit extracts the fundamental frequency of the signal by using an enhanced Specmurt (improved Mel cepstrum coefficient) algorithm.

d. The temperament conversion unit converts the extracted fundamental frequency into MIDI (Musical Instrument Digital Interface) notes according to the characteristics of twelve-tone equal temperament and stores the MIDI notes in a database.
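The conversion in step d can be sketched as follows. This is only an illustrative sketch: the text does not state a reference pitch, so the common convention of A4 = 440 Hz mapped to MIDI note 69 is assumed here, and the helper names `hz_to_midi` and `melody_to_midi` are hypothetical.

```python
import math

A4_HZ = 440.0   # assumed reference pitch (not stated in the text)
A4_MIDI = 69    # MIDI note number conventionally assigned to A4


def hz_to_midi(f0_hz):
    """Map a fundamental frequency in Hz to the nearest MIDI note number.

    In twelve-tone equal temperament each semitone is a factor of 2**(1/12),
    so the distance from A4 in semitones is 12 * log2(f / 440).
    """
    if f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    return int(round(A4_MIDI + 12.0 * math.log2(f0_hz / A4_HZ)))


def melody_to_midi(f0_track):
    """Convert per-frame fundamental frequencies to a note sequence.

    Frames with f0 <= 0 are treated as unvoiced and skipped; a note is
    appended only when it differs from the last note kept.
    """
    notes = []
    for f0 in f0_track:
        if f0 <= 0:
            continue
        note = hz_to_midi(f0)
        if not notes or notes[-1] != note:
            notes.append(note)
    return notes
```

For instance, a frame-wise track hovering around 261.63 Hz, 293.66 Hz and 329.63 Hz would be stored as the notes C4, D4 and E4.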

It should be noted that the enhanced Specmurt algorithm can be implemented in various ways, for example by using the complex wavelet transform, or by using the STFT (short-time Fourier transform) and the MDCT (modified discrete cosine transform).

When the enhanced Specmurt algorithm is implemented using the complex wavelet transform, the operations shown in fig. 4 may be performed, specifically:

a) In implementing the Specmurt algorithm, two aspects are considered: the design of the harmonic structure and the computation of the spectrum at logarithmic frequency. The conversion fcent = log₂(f_Hz / 2) × 120 is used to avoid the problem in the Specmurt algorithm that, after the frequency f_Hz is converted to the logarithmic frequency fcent, the spacing between logarithmic frequencies is too small, which strongly affects the calculation of the fundamental frequency. The appropriate harmonic structure is selected experimentally.

b) When the Specmurt algorithm is implemented with the complex wavelet transform after short-time framing, noise interference is reduced by applying the complex wavelet transform and superimposing the energy at each scale.

c) The distribution of energy across the scales is treated as a time-domain signal and low-pass filtered to remove interfering frequencies.

d) Deconvolving the result obtained in step c with the fundamental harmonic structure.

When the enhanced Specmurt algorithm is implemented by using the STFT and the MDCT, the operations shown in fig. 5 may be performed, specifically:

a) The linear spectrum is converted to a logarithmic spectrum.

b) The defects that arise when a linear spectrum is converted directly into a logarithmic spectrum are eliminated by extracting the envelope.

c) Deconvolving with the fundamental harmonic structure.
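The deconvolution step shared by both variants can be illustrated with a minimal sketch. It assumes the usual Specmurt model, in which the log-frequency spectrum is the convolution of a common harmonic structure with the distribution of fundamental frequencies, so that dividing their Fourier transforms recovers the fundamentals. The function names, the circular-convolution simplification and the harmonic weights used in the usage note are illustrative assumptions, not values from the text.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                for m in range(n))
        out.append(s / n if inverse else s)
    return out


def specmurt_deconvolve(log_spectrum, harmonic_structure, eps=1e-9):
    """Recover the fundamental-frequency distribution u from v = h * u.

    log_spectrum       -- v, spectrum sampled on a logarithmic frequency axis
    harmonic_structure -- h, common harmonic pattern on the same axis
    eps                -- guard against division by a near-zero H(k)
    """
    n = len(log_spectrum)
    h = list(harmonic_structure) + [0.0] * (n - len(harmonic_structure))
    v_hat = dft(log_spectrum)
    h_hat = dft(h)
    u_hat = [v / (hk if abs(hk) > eps else eps)
             for v, hk in zip(v_hat, h_hat)]
    return [c.real for c in dft(u_hat, inverse=True)]
```

With 12 bins per octave, a harmonic structure such as weight 1.0 at offset 0, 0.5 at offset 12 (second harmonic) and 0.3 at offset 19 (third harmonic) could be passed as `harmonic_structure`; the peaks of the returned list then mark the candidate fundamental frequencies.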

When the tempo extraction is performed, the following operations may be performed:

a. The input signal x (n) is down-sampled by a down-sampling unit to obtain z (n); the suggested sampling rate is 11.025 kHz.

b. The endpoint detection unit performs endpoint detection on z (n) and determines the start and end points of the signal; the framing unit then divides the signal into multiple frames according to the short-time stationarity of musical sound and stores them as p (n).

c. An ACF (autocorrelation function) unit performs ACF on the signal p (n) stored by the framing unit to obtain A (l, ti); the FM-ACF unit obtains A (wl, ti) from this in combination with the characteristics of the autocorrelation, where wl = l/fs (fs being the sampling frequency).

d. An STFT, that is a frame-by-frame DFT, is performed on the signal p (n) stored by the framing unit to obtain U (wk, ti); the DFT formula is as follows:

$$X(k) = \mathrm{DFT}[x(n)] = \sum_{n=0}^{N-1} x(n)\, W_N^{kn}, \quad 0 \leq k \leq N-1, \quad W_N = e^{-j 2\pi / N}.$$

e. The joint function processing unit combines the STFT and the FM-ACF to obtain the joint function Y (wk, ti) = U (wk, ti) · A (wl, ti).

f. The rhythm of the signal is solved from Y (wk, ti) and stored in a database.
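The idea behind the joint function can be sketched for a single frame as follows. The sketch rests on several assumptions that the text does not spell out: the frame holds an onset-like envelope, the autocorrelation value at lag l is paired with the DFT bin closest to N / l, and the rhythm is read off as the period of the bin where the product peaks. All function names are hypothetical.

```python
import cmath


def dft_magnitude(x):
    """|X(k)| with X(k) = sum_n x(n) * W_N^(k*n), W_N = exp(-2j*pi/N)."""
    n = len(x)
    return [abs(sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n)
                    for m in range(n)))
            for k in range(n)]


def autocorrelation(x):
    """Unnormalized linear autocorrelation A(l) for lags 0 .. N-1."""
    n = len(x)
    return [sum(x[m] * x[m + lag] for m in range(n - lag))
            for lag in range(n)]


def estimate_period(frame):
    """Estimate the dominant period (in samples) of one frame.

    The spectrum U(k) shows peaks at the beat frequency and its multiples,
    while the autocorrelation A(l) shows peaks at the beat period and its
    multiples. Their product Y(k) = U(k) * A(l(k)) keeps only the bin on
    which both agree.
    """
    n = len(frame)
    u = dft_magnitude(frame)
    a = autocorrelation(frame)
    best_k, best_y = 1, -1.0
    for k in range(1, n // 2 + 1):
        lag = min(n - 1, round(n / k))  # assumed lag-to-bin pairing
        y = u[k] * a[lag]
        if y > best_y:
            best_k, best_y = k, y
    return n / best_k
```

For a 64-sample frame containing an impulse every 8 samples, the spectrum alone also peaks at bins 16, 24 and 32, and the autocorrelation alone also peaks at lags 16, 24 and so on; only bin 8 survives in the product, which gives a period of 8 samples.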

When performing tone extraction, the following operations may be performed:

a. The input signal x (n) is down-sampled to obtain v (n), preferably at a sampling rate of 22050 Hz. Endpoint detection is performed on v (n) to determine the start and end points of the signal.

Specifically, the start/end point algorithm characterized by energy (E) and zero-crossing rate (ZCR) relies on the fact that the short-time statistics of background noise differ significantly from those of speech. The time-domain waveform x (1) is windowed and framed to obtain the n-th frame of the speech signal; a first decision is made using the short-time energy, and on that basis a second decision is made using the short-time zero-crossing rate. For the first decision, a dual-threshold comparison is commonly adopted so that local dips in the speech energy are not mistaken for start and end points.

b. The spectrum envelope of v (n), whose endpoints have been determined, is computed and stored. For example: a short-time Fourier transform is performed on v (n), the maxima of each frame are found, and the local maxima are connected to obtain the spectrum envelope.

c. The amplitude envelope of the signal is computed for v (n) and stored. For example, the envelope may be obtained with the Teager energy operator: T [v (n)] = [v (n)]² − v (n − 1) × v (n + 1); a low-pass filter can be used to filter out high-frequency components, the low-frequency components being the envelope; local maxima are then found, an envelope threshold is set, and the local maxima are connected.
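Two of the operations above, the dual-threshold endpoint detection of step a and the Teager energy operator of step c, can be sketched as follows. The frame length, hop size and threshold values are hypothetical placeholders chosen for illustration; in practice they would be tuned to the recording conditions. The function names are likewise assumptions.

```python
def frame_signal(x, frame_len, hop):
    """Split x into frames of frame_len samples, advancing by hop samples."""
    return [x[i:i + frame_len]
            for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame):
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_endpoints(x, frame_len=8, hop=8,
                     e_high=1.0, e_low=0.2, zcr_min=0.1):
    """Return (start_frame, end_frame) of the active segment, or None."""
    frames = frame_signal(x, frame_len, hop)
    energy = [short_time_energy(f) for f in frames]
    zcr = [zero_crossing_rate(f) for f in frames]
    # First decision: frames above the high threshold are certainly active.
    # Taking the first and last such frame ignores energy dips in between.
    strong = [i for i, e in enumerate(energy) if e > e_high]
    if not strong:
        return None
    start, end = strong[0], strong[-1]
    # Widen the segment while the energy stays above the low threshold.
    while start > 0 and energy[start - 1] > e_low:
        start -= 1
    while end < len(frames) - 1 and energy[end + 1] > e_low:
        end += 1
    # Second decision: widen further over frames with a high ZCR.
    while start > 0 and zcr[start - 1] > zcr_min:
        start -= 1
    while end < len(frames) - 1 and zcr[end + 1] > zcr_min:
        end += 1
    return start, end


def teager_energy(v):
    """T[v(n)] = v(n)^2 - v(n-1) * v(n+1) for the interior samples of v."""
    return [v[n] * v[n] - v[n - 1] * v[n + 1]
            for n in range(1, len(v) - 1)]
```

The Teager output would still need the low-pass filtering and peak connection described in step c before it can serve as an amplitude envelope.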

When step two is performed, a song required for unlocking is input into the mobile phone; the mobile phone collects and processes the information input by the user according to the method of step one and stores the processed result in the database.

In performing step three, the musical tone feature decision module shown in fig. 2 may perform the following operations:

a. Comparing the melodies: DTW (dynamic time warping) is adopted, which combines time warping with distance measurement and reduces errors caused by differing durations. A combined value of the matching distortion and the path deviation of the audio password input for unlocking is obtained, and it is judged whether the difference between the melody of the audio password for unlocking and the melody of the preset audio password is lower than a preset threshold; if so, the next comparison is performed; otherwise, the user is prompted that unlocking was unsuccessful.

b. Comparing rhythm information: it is checked whether the rhythm of the audio password for unlocking is consistent with the rhythm of the preset audio password; if so, the next comparison is performed; otherwise, the user is prompted that unlocking was unsuccessful.

c. Comparing timbre information: the DTW method is applied to the spectrum envelope, the amplitude envelope and the MFCC coefficients to obtain a final combined value, and it is judged whether the difference between the timbre of the audio password for unlocking and the timbre of the preset audio password is lower than a preset threshold; if so, the mobile phone is unlocked and the user is prompted that unlocking was successful; otherwise, the user is prompted that unlocking was unsuccessful.
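The three-stage comparison can be sketched as a cascade. The sketch makes several simplifying assumptions: a plain length-normalized DTW distance stands in for the combined value of matching distortion and path deviation, rhythm consistency is reduced to equality of a single number, the timbre is represented by one feature sequence rather than by the spectrum envelope, amplitude envelope and MFCC coefficients together, and the threshold values are placeholders. The names `dtw_distance` and `try_unlock` are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Uses the absolute difference as local cost and divides the accumulated
    cost by len(a) + len(b) so that longer inputs are not penalized.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m] / (n + m)


def try_unlock(candidate, preset, melody_thr=0.5, timbre_thr=0.5):
    """Return True only if melody, rhythm and timbre all pass in turn."""
    # a. Melody: warped distance must fall below the threshold.
    if dtw_distance(candidate["melody"], preset["melody"]) >= melody_thr:
        return False
    # b. Rhythm: must be consistent (simplified here to equality).
    if candidate["rhythm"] != preset["rhythm"]:
        return False
    # c. Timbre: warped distance must fall below the threshold.
    if dtw_distance(candidate["timbre"], preset["timbre"]) >= timbre_thr:
        return False
    return True
```

Because DTW absorbs differences in duration, a candidate whose melody holds some notes longer than the preset still yields a distance of zero, whereas a candidate with different notes is rejected at the first stage.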

In practical applications, only one or two of the melody, the rhythm and the timbre may be evaluated to decide whether to unlock; the specific evaluation is similar to that described above.

As can be seen from the above description, the operation idea of the present invention for unlocking a device based on audio can be represented by a process shown in fig. 9, where the process includes the following steps:

Step 910: the device extracts from the received audio password at least one of: melody, rhythm, timbre.

Step 920: when the melody, the rhythm and the timbre respectively match the preset melody, the preset rhythm and the preset timbre, the device is unlocked. A match may mean complete agreement, or it may mean satisfying the aforementioned threshold requirement when the melody, rhythm or timbre is compared.

In summary, in both the method and the system, the technology for unlocking the equipment based on the audio frequency is adopted, and specific factors in the audio frequency are considered during unlocking, so that the safety of equipment unlocking is improved.

The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims

1. A method for unlocking a device based on audio is characterized by comprising the following steps:

when the melody, the rhythm and the timbre are respectively matched with the preset melody, rhythm and timbre, the device is unlocked;

when the timbre extraction is performed, the following operations are specifically performed:

a. carrying out windowing and framing processing on an input signal x (n) to obtain an nth frame voice signal so as to obtain v (n), carrying out endpoint detection on v (n) by using a start-stop point algorithm and a double-threshold comparison method which are characterized by energy (E) and Zero Crossing Rate (ZCR), and judging endpoints of the beginning and the end of the signal;

b. carrying out short-time Fourier transform on v (n) of which the end points are judged, then solving a maximum value of each frame of signal, and connecting local maximum values to obtain a spectrum envelope;

c. obtaining and storing an amplitude envelope of the signal v (n) through the Teager energy operator T [v (n)] = [v (n)]² − v (n − 1) × v (n + 1); specifically, a low-pass filter is used for filtering out high-frequency components, the low-frequency components being the envelope; and solving local maximum values, setting an envelope threshold, and connecting the local maximum values.

2. The method of claim 1, wherein the process of extracting the melody comprises:

3. The method according to claim 2, characterized in that the extraction of the fundamental frequency is carried out based on an enhanced Specmurt (improved Mel cepstrum coefficient) algorithm; the enhanced Specmurt algorithm comprises an enhanced Specmurt algorithm based on the complex wavelet transform, or an enhanced Specmurt algorithm based on the short-time Fourier transform (STFT) and the MDCT.

4. The method of claim 1, wherein extracting the rhythm comprises:

5. The method according to any one of claims 1 to 4, wherein when the melody, rhythm and timbre are respectively matched with the preset melody, rhythm and timbre, the unlocking process of the device comprises:

6. A system for unlocking a device based on audio, the system comprising a musical tone feature decision module, and at least one of the following modules: a melody extraction module, a rhythm extraction module and a timbre extraction module; wherein,

the musical tone feature decision module is used for unlocking the device when the melody, the rhythm and the timbre respectively extracted by the melody extraction module, the rhythm extraction module and the timbre extraction module are respectively matched with the preset melody, rhythm and timbre;

wherein, the timbre extraction module is specifically configured to:

a. carrying out windowing and framing processing on an input signal x (n) to obtain an nth frame voice signal so as to obtain v (n), carrying out endpoint detection on v (n) by using a start-stop point algorithm and a double-threshold comparison method which are characterized by energy (E) and Zero Crossing Rate (ZCR), and judging endpoints of the beginning and the end of the signal;

b. carrying out short-time Fourier transform on v (n) of which the end points are judged, then solving a maximum value of each frame of signal, and connecting local maximum values to obtain a spectrum envelope;

7. The system of claim 6, wherein the melody extraction module, when extracting the melody, is configured to:

8. The system of claim 7, wherein the melody extraction module is configured to perform the extraction of the fundamental frequency based on an enhanced Specmurt algorithm; the enhanced Specmurt algorithm comprises a complex wavelet transform-based enhanced Specmurt algorithm or an STFT and MDCT-based enhanced Specmurt algorithm.

9. The system of claim 6, wherein the rhythm extraction module, when extracting the rhythm, is configured to:

10. The system of any one of claims 6 to 9, wherein the musical tone feature decision module, when unlocking the device, is configured to: