CN105469802A

CN105469802A - Speech quality improving method and system and mobile terminal

Info

Publication number: CN105469802A
Application number: CN201410428590.8A
Authority: CN
Inventors: 李闻; 薛华; 王进军
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2014-08-26
Filing date: 2014-08-26
Publication date: 2016-04-06
Also published as: WO2015117343A1

Abstract

The invention discloses a method and system for improving speech quality, and a mobile terminal. The method comprises: obtaining, by the mobile terminal, the spatial position of a user's face relative to the mobile terminal; determining the audio parameters corresponding to that spatial position; adjusting the audio parameters of a speech processing module of the mobile terminal to the audio parameters corresponding to that spatial position; and outputting a speech signal received from the network side after it has been processed by the speech processing module with the adjusted audio parameters. With this method, the hands-free frequency response and sensitivity can be kept essentially constant in a good state, which avoids the loss of low or mid-to-high frequencies in the frequency response curve caused by changes in the distance or angle from the mobile terminal to the face, improves speech quality in video calls, and improves what the user hears.

Description

Method, system, and mobile terminal for improving speech quality

Technical field

The present invention relates to the field of communications, and in particular to a method, a system, and a mobile terminal for improving speech quality.

Background art

Mobile terminals are now widely used and highly popular. Video calling, as a new function of mobile terminals, is seeing increasing use, and improving speech quality in video calls is a problem that the whole mobile terminal field is working to solve.

In terms of the auditory characteristics of the human ear, low frequencies are the foundation: if the sound pressure level of low-frequency sound is insufficient, the timbre seems thin and lacks power, and this band has a large influence on what the listener hears. Mid frequencies are the region to which human hearing is most sensitive; a moderate boost strengthens the sense of presence in playback and improves clarity and the sense of layering. Boosting high frequencies makes the timbre sound livelier. In general, sound quality can be judged from the frequency response curve: a good frequency response curve corresponds to a good subjective listening impression.

During a video call, people usually talk in either headset mode or hands-free mode. In hands-free mode, the spatial position of the mobile terminal relative to the head (for example, distance and angle) differs from user to user, whereas at present the tuning and calibration of speech quality for video calls are carried out with the mobile terminal placed at a fixed position. In the prior art, the sensitivity and frequency response of the voice signal output by the low-pass filter used for hands-free speech processing in video calls are set from tests at a standard position (for example, directly facing the mobile terminal camera at 20 cm). Because usage habits differ, the angle and distance at which users hold the mobile terminal relative to the face vary greatly, so the frequency response and sensitivity that the user perceives change at non-standard positions (other angles and distances). In actual use, when the mobile terminal is closer to the face than the standard value, treble is lacking and the sound lacks clarity and layering; when the mobile terminal is farther from the face than the standard value, bass is lacking and the sound seems thin and not full enough. A loss of either bass or treble affects what the user hears.

As can be seen, in the prior art the tuning and calibration of speech quality in the hands-free mode of video calls are carried out with the mobile terminal at a fixed position. As the angle and distance from the mobile terminal to the face change, however, the frequency response and sensitivity of the voice signal reaching the face also change, and this difference degrades speech quality. A speech processing method is therefore needed to compensate for the differences introduced by the human factor, so that speech quality in video calls achieves a better listening effect, which improves the user experience and increases the product's competitiveness in the market.

Summary of the invention

The technical problem to be solved by the present invention is to provide a method, a system, and a mobile terminal for improving speech quality that keep the hands-free frequency response and sensitivity essentially constant in a good state. This compensates for the loss of low or mid-to-high frequencies in the frequency response curve caused by changes in the distance or angle from the mobile terminal to the face, and thereby improves speech quality in video calls and what the user hears.

To solve the above technical problem, the present invention provides a method for improving speech quality, comprising:

obtaining, by a mobile terminal, the spatial position of a user's face relative to the mobile terminal;

determining the audio parameters corresponding to the spatial position of the face relative to the mobile terminal;

adjusting the audio parameters of a speech processing module of the mobile terminal to the audio parameters corresponding to the spatial position of the face relative to the mobile terminal; and

outputting a voice signal received from the network side after it has been processed by the speech processing module with the adjusted audio parameters.
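The four steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the names `SpeechProcessor`, `handle_downlink`, `locate_face`, and `params_by_position` are hypothetical, and a single gain factor stands in for the audio parameters, whose actual form the text does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# (distance in cm, angle in degrees; negative = left of facing the terminal)
Position = Tuple[int, int]


@dataclass
class SpeechProcessor:
    """Hypothetical stand-in for the speech processing module."""
    gain: float = 1.0  # assumption: one gain value represents the audio parameters

    def process(self, samples: List[float]) -> List[float]:
        return [self.gain * s for s in samples]


def handle_downlink(
    samples: List[float],
    locate_face: Callable[[], Optional[Position]],
    params_by_position: Dict[Position, float],
    processor: SpeechProcessor,
) -> List[float]:
    """Process one block of voice samples received from the network side."""
    position = locate_face()                           # step 1: obtain the position
    if position in params_by_position:
        new_gain = params_by_position[position]        # step 2: determine parameters
        processor.gain = new_gain                      # step 3: adjust the module
    # This sketch leaves the parameters unchanged when no position is found.
    return processor.process(samples)                  # step 4: process and output
```

A caller would pass each received block of samples through `handle_downlink`, so that the parameters follow the face position from block to block.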

Further, the spatial position of the face relative to the mobile terminal comprises the distance and the angle from the face to the mobile terminal, where the angle is the deviation to the left or to the right from directly facing the mobile terminal, and the angle is less than or equal to 90 degrees.

Further, before the spatial position of the user's face relative to the mobile terminal is obtained, the method also comprises:

presetting a correspondence between spatial positions of the face relative to the mobile terminal and collected face data.

Further, presetting the correspondence between spatial positions of the face relative to the mobile terminal and face data comprises:

setting a maximum and a minimum distance from the face to the mobile terminal, setting the maximum leftward angle to 90 degrees left of directly facing the mobile terminal, and setting the maximum rightward angle to 90 degrees right of directly facing the mobile terminal;

collecting, at a preset distance interval and a preset angle interval, face data for different distances and different angles from the face to the mobile terminal, starting from the minimum distance and 90 degrees left and proceeding step by step to the right; and

storing the face data corresponding to the different distances and different angles.

Further, obtaining the spatial position of the user's face relative to the mobile terminal comprises:

collecting face data of the current user and comparing it with the stored face data; when the difference between the current user's face data and an item of stored face data is less than a preset threshold, taking the spatial position corresponding to that stored face data as the spatial position of the current user's face relative to the mobile terminal.
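The threshold comparison described above can be sketched as follows. The text does not say how face data is represented or how the difference is computed, so this sketch assumes each item of face data is a fixed-length list of numbers and uses the mean absolute difference; the names `face_difference` and `match_position` are hypothetical. Where several stored items fall below the threshold, the sketch picks the closest one, which is a design choice of the sketch rather than something the text requires.

```python
from typing import Dict, Optional, Sequence, Tuple

# (distance in cm, angle in degrees; negative = left of facing the terminal)
Position = Tuple[int, int]


def face_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length feature vectors (assumed metric)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("feature vectors must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def match_position(
    current: Sequence[float],
    stored: Dict[Position, Sequence[float]],
    threshold: float,
) -> Optional[Position]:
    """Return the stored position whose face data is closest to `current`,
    provided the difference is below `threshold`; otherwise return None."""
    best_position: Optional[Position] = None
    best_diff = threshold
    for position, reference in stored.items():
        diff = face_difference(current, reference)
        if diff < best_diff:
            best_position, best_diff = position, diff
    return best_position
```

Returning `None` when nothing matches leaves the caller free to fall back to default audio parameters.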

Further, before the audio parameters corresponding to the spatial position of the face relative to the mobile terminal are determined, the method also comprises: preconfiguring a correspondence between spatial positions of the face relative to the mobile terminal and audio parameters;

determining the audio parameters corresponding to the spatial position of the face relative to the mobile terminal comprises:

determining the audio parameters corresponding to the spatial position of the face relative to the mobile terminal according to the preconfigured correspondence between spatial positions of the face relative to the mobile terminal and audio parameters.

Further, preconfiguring the correspondence between spatial positions of the face relative to the mobile terminal and audio parameters comprises:

measuring, at a preset distance interval and a preset angle interval, starting from the minimum distance and 90 degrees left and proceeding step by step to the right, the sensitivity and frequency response of the voice signal output after the same voice signal is processed by the speech processing module, at different distances and different angles from the face to the mobile terminal;

calculating, for each distance and angle, the audio parameters that bring the sensitivity and frequency response of the output voice signal within the standard range; and

storing the audio parameters corresponding to the different distances and different angles.

To solve the above technical problem, the present invention also provides a system for improving speech quality, comprising:

a spatial position identification module, configured to obtain the spatial position of a user's face relative to a mobile terminal;

an audio parameter determination module, configured to determine the audio parameters corresponding to the spatial position of the face relative to the mobile terminal; and

a speech processing module, configured to adjust its audio parameters to the audio parameters corresponding to the spatial position of the face relative to the mobile terminal, and then to process and output the voice signal received from the network side.

Further, the system also comprises:

a configuration module, configured to preset, before the spatial position of the user's face relative to the mobile terminal is obtained, a correspondence between spatial positions of the face relative to the mobile terminal and collected face data.

Further, the configuration module being configured to preset the correspondence between spatial positions of the face relative to the mobile terminal and face data comprises:

Further, the spatial position identification module being configured to obtain the spatial position of the user's face relative to the mobile terminal comprises:

Further, the configuration module is also configured to preconfigure, before the audio parameters corresponding to the spatial position of the face relative to the mobile terminal are determined, a correspondence between spatial positions of the face relative to the mobile terminal and audio parameters;

the audio parameter determination module being configured to determine the audio parameters corresponding to the spatial position of the face relative to the mobile terminal comprises:

Further, the configuration module also being configured to preconfigure the correspondence between spatial positions of the face relative to the mobile terminal and audio parameters comprises:

To solve the above technical problem, the present invention also provides a mobile terminal comprising the system for improving speech quality described above.

Compared with the prior art, with the method, system, and mobile terminal for improving speech quality provided by the embodiments of the present invention, once hands-free voice mode is enabled in a video call, the frequency response and sensitivity that the user hears are adjusted dynamically as the spatial position (angle and distance) of the user's face relative to the mobile terminal changes. The hands-free frequency response and sensitivity can thus be kept essentially constant in a good state, compensating for the loss of low or mid-to-high frequencies in the frequency response curve caused by changes in the distance or angle from the mobile terminal to the face, which improves speech quality in video calls and what the user hears.

Brief description of the drawings

Fig. 1 is a flowchart of the method for improving speech quality in the embodiment;

Fig. 2 is a schematic diagram of the angle and distance from the face to the mobile terminal in the embodiment;

Fig. 3 is a flowchart of presetting the correspondence between spatial positions of the face relative to the mobile terminal and collected face data in the embodiment;

Fig. 4 is a flowchart of presetting the correspondence between spatial positions of the face relative to the mobile terminal and audio parameters in the embodiment;

Fig. 5 is a structural diagram of the system for improving speech quality in the embodiment.

Detailed description

To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the invention are described in detail below with reference to the accompanying drawings. It should be noted that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with one another in any way.

Embodiment:

As shown in Fig. 1, this embodiment provides a method for improving speech quality, comprising the following steps:

S101: the mobile terminal obtains the spatial position of the user's face relative to the mobile terminal;

Here, the spatial position of the face relative to the mobile terminal comprises the distance and the angle from the face to the mobile terminal, where the angle is the deviation to the left or to the right from directly facing the mobile terminal, and the angle is less than or equal to 90 degrees.

In actual use, the image captured by the camera can be shown on the display screen of the mobile terminal. On most models the camera sits at the top of the plane of the display screen, but its left-right position varies: on some models it is to the left, on some to the right, and on some in the middle.

When the face image seen on the display screen of the mobile terminal is exactly centered, the plane formed by the actual central axis of the face and the vertical central axis of the mobile terminal is not actually perpendicular to the plane of the display screen.

As shown in Fig. 2, when the face image on the display screen of the mobile terminal is exactly centered, that is, when the displayed central axis of the face coincides with the vertical central axis of the mobile terminal, the actual center of the mouth is point A (on the central axis of the face), its projection onto the plane of the display screen is point B, and the straight line AB between these two points is taken as the reference. After the face moves left or right, the center of the mouth moves to point C, whose projection onto the plane of the display screen is point D. In this embodiment, the angle from the face to the mobile terminal is the angle α between the straight lines AB and BC, and the distance from the face to the mobile terminal is the length d₁ of line segment AB or the length d₂ of line segment CD.
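The geometry of Fig. 2 can be illustrated with a short calculation. The sketch below assumes a coordinate system that the text does not define: the display screen lies in the plane z = 0, coordinates are in centimetres, and α is read as the angle at B between the directions BA and BC. The function names are hypothetical.

```python
import math
from typing import Tuple

# (x, y, z) in cm; assumption: the display screen lies in the plane z = 0
Point = Tuple[float, float, float]


def project_to_display(p: Point) -> Point:
    """Orthogonal projection of a point onto the display plane (B from A, D from C)."""
    return (p[0], p[1], 0.0)


def distance_to_display(p: Point) -> float:
    """Length of the segment from a point to its projection (d1 = |AB|, d2 = |CD|)."""
    return abs(p[2])


def deviation_angle_deg(a: Point, c: Point) -> float:
    """Angle alpha in degrees at B between BA and BC, where B is the projection of A."""
    b = project_to_display(a)
    ba = [ai - bi for ai, bi in zip(a, b)]
    bc = [ci - bi for ci, bi in zip(c, b)]
    norm = math.sqrt(sum(u * u for u in ba)) * math.sqrt(sum(v * v for v in bc))
    if norm == 0.0:
        raise ValueError("A and C must not lie at point B")
    cos_alpha = sum(u * v for u, v in zip(ba, bc)) / norm
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_alpha))))
```

For instance, with A at (0, 0, 20) and C at (20, 0, 20), both distances are 20 cm and the angle evaluates to 45 degrees under these assumptions.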

Before step S101, the method also comprises presetting a correspondence between spatial positions of the face relative to the mobile terminal and collected face data, which, as shown in Fig. 3, specifically comprises the following steps:

S201: set a maximum and a minimum distance from the face to the mobile terminal, set the maximum leftward angle to 90 degrees left of directly facing the mobile terminal, and set the maximum rightward angle to 90 degrees right of directly facing the mobile terminal;

S202: at a preset distance interval and a preset angle interval, collect face data for different distances and different angles from the face to the mobile terminal, starting from the minimum distance and 90 degrees left and proceeding step by step to the right;

S203: store the face data corresponding to the different distances and different angles.

For example, the distance from the face to the mobile terminal can be set from 10 cm to 50 cm with a distance interval of 10 cm, and the angle from 90 degrees left to 90 degrees right with an angle interval of 10 degrees. Combining the different distances and angles gives a set of spatial positions, and face data is collected at each of them.
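The example values above define a grid of calibration positions, which can be enumerated as in the following sketch. The function name `build_grid` and the convention of writing leftward angles as negative numbers are assumptions made for illustration.

```python
from itertools import product
from typing import List, Tuple

# (distance in cm, angle in degrees; negative = left of facing the terminal)
Position = Tuple[int, int]


def build_grid(
    d_min: int = 10,
    d_max: int = 50,
    d_step: int = 10,
    a_max: int = 90,
    a_step: int = 10,
) -> List[Position]:
    """List every (distance, angle) combination, starting at the minimum
    distance and 90 degrees left and stepping to the right."""
    distances = range(d_min, d_max + 1, d_step)
    angles = range(-a_max, a_max + 1, a_step)
    return list(product(distances, angles))


grid = build_grid()
print(len(grid))  # 5 distances x 19 angles = 95 positions
```

With the example intervals, 95 positions have to be captured; halving both intervals would roughly quadruple that number, which is the trade-off between calibration effort and adjustment granularity.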

In a concrete measurement, the face data of the standard spatial position can be measured first: at 0 degrees and the standard distance of 20 cm (that is, directly facing the mobile terminal at a distance of 20 cm), the camera captures one frame of face data, and this frame is taken as the standard value.

Next, the distance is increased to 30 cm, 40 cm, and 50 cm with the angle unchanged, and the camera captures one frame of face data each time. The facial contour is then smaller overall than at the standard distance, but it remains centered with no angular deviation. The data is stored together with the corresponding distance and angle. The distance is then reduced to 10 cm with the angle unchanged, and the camera again captures one frame of face data. The facial contour is then larger overall than at the standard distance, again with no angular deviation, and the data is likewise stored together with the corresponding distance and angle.

Next, at the standard distance of 20 cm, the angle is changed to 10 degrees left, and the camera captures one frame of face data. Compared with the standard position, the size of the facial contour is unchanged, but the contour as a whole shifts: from the camera's point of view it is shifted to the right, while the image data shown on the display screen of the mobile terminal is still shifted somewhat to the left. The data is stored together with the corresponding distance and angle. The angle is then changed to 10 degrees right, and the camera captures one frame of face data. Compared with the standard position, the size of the facial contour is unchanged, but the contour as a whole is shifted somewhat to the right, and the data is stored together with the corresponding distance and angle.

In this way, the face data corresponding to the spatial positions of multiple distance and angle combinations is obtained, and the data is stored in a memory of the mobile terminal together with the corresponding distances and angles.

Here, obtaining the spatial position of the user's face relative to the mobile terminal comprises:

S102: determine the audio parameters corresponding to the spatial position of the face relative to the mobile terminal;

Before the audio parameters corresponding to the spatial position of the face relative to the mobile terminal are determined, the method also comprises:

preconfiguring a correspondence between spatial positions of the face relative to the mobile terminal and audio parameters. Step S102 then specifically comprises: determining the audio parameters corresponding to the spatial position of the face relative to the mobile terminal according to the correspondence between spatial positions of the face relative to the mobile terminal and audio parameters.

Presetting the correspondence between spatial positions of the face relative to the mobile terminal and audio parameters, as shown in Fig. 4, specifically comprises:

Step S301, which is identical to step S201;

S302: at a preset distance interval and a preset angle interval, starting from the minimum distance and 90 degrees left and proceeding step by step to the right, measure the sensitivity and frequency response of the voice signal output after the same voice signal is processed by the speech processing module, at different distances and different angles from the face to the mobile terminal;

It should be noted that, in this embodiment, the spatial positions used for the audio parameters and those used for the face data share the same granularity, the same specific distances and angles, and the same combinations of distance and angle, so that the face data collected during a video call can be mapped to the corresponding audio parameters. The finer the distance interval or angle interval, the more combinations of distance and angle there are, and the finer the adjustment of the audio parameters.

S303: calculate, for each distance and angle, the audio parameters that bring the sensitivity and frequency response of the output voice signal within the standard range;

The reference standard is the VDF standard for the receive sensitivity and frequency response of terminals in hands-free audio; VDF is the standard of the mobile operator Vodafone for hands-free audio reception on terminals.

S304: store the audio parameters corresponding to the different distances and different angles.

The audio parameters corresponding to these distance and angle values can be obtained experimentally. Using an audio test system and the preset groups of distance and angle values, the spatial position of the mobile terminal relative to a HATS (head and torso simulator, standing in for the user's face) is changed, and the sensitivity and frequency response of the voice signal output after the same voice signal passes through the low-pass filter are measured at each distance and angle. The low-pass filter is then adjusted according to the measured groups of sensitivity and frequency response so that the frequency of the voice signal it outputs falls within the standard range, which is 300 to 3000 Hz. This yields the audio parameters for each distance and angle value.

S103: adjust the audio parameters of the speech processing module of the mobile terminal to the audio parameters corresponding to the spatial position of the face relative to the mobile terminal;

In this embodiment, if in steps S101 to S102 the mobile terminal cannot obtain the spatial position of the face relative to the mobile terminal, for example because the distance from the face to the mobile terminal exceeds the preset maximum distance or because the face is behind the camera rather than in front of it, the audio parameters are set to default values. For example, the audio parameters corresponding to 0 degrees and the standard distance of 20 cm (that is, directly facing the mobile terminal at a distance of 20 cm) are used as the default.
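The fallback described above amounts to a table lookup with a default entry. A minimal sketch follows, assuming the audio parameters are stored as a dictionary keyed by (distance, angle); the field names `cutoff_hz` and `gain_db` and the function name `select_audio_params` are hypothetical, since the text does not list the individual parameters.

```python
from typing import Dict, Optional, Tuple

# (distance in cm, angle in degrees; negative = left of facing the terminal)
Position = Tuple[int, int]
AudioParams = Dict[str, float]

# Standard position from the embodiment: directly facing the terminal at 20 cm.
DEFAULT_POSITION: Position = (20, 0)


def select_audio_params(
    position: Optional[Position],
    table: Dict[Position, AudioParams],
) -> AudioParams:
    """Return the parameters for `position`, or the default entry when the
    position is unknown (None) or has no entry in the table."""
    if position is None or position not in table:
        return table[DEFAULT_POSITION]  # raises KeyError if the default is missing
    return table[position]


table = {
    (20, 0): {"cutoff_hz": 3000.0, "gain_db": 0.0},   # illustrative values only
    (40, 0): {"cutoff_hz": 3000.0, "gain_db": 3.0},   # illustrative values only
}
print(select_audio_params(None, table))      # falls back to the (20, 0) entry
print(select_audio_params((40, 0), table))   # uses the matching entry
```

Keeping the default as an ordinary table entry means the same calibration procedure produces it, with no separate code path for the standard position.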

S104: the voice signal received from the network side is processed by the speech processing module with the adjusted audio parameters and output through the loudspeaker.

Preferably, the speech processing module is a low-pass filter.
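To make the idea of an adjustable low-pass filter concrete, the sketch below implements a first-order (single-pole) low-pass filter whose cutoff frequency and output gain can be set per position. The text does not specify the filter type or its parameters, so the filter order, the `cutoff_hz` and `gain` parameters, and the 8000 Hz sample rate used in the example are all assumptions.

```python
import math
from typing import Iterable, List


def lowpass_alpha(cutoff_hz: float, sample_rate_hz: float) -> float:
    """Smoothing factor of a first-order RC low-pass filter."""
    if cutoff_hz <= 0 or sample_rate_hz <= 0:
        raise ValueError("cutoff and sample rate must be positive")
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    return dt / (rc + dt)


def lowpass(
    samples: Iterable[float],
    cutoff_hz: float,
    sample_rate_hz: float,
    gain: float = 1.0,
) -> List[float]:
    """Apply y[n] = y[n-1] + alpha * (x[n] - y[n-1]), then scale by `gain`."""
    alpha = lowpass_alpha(cutoff_hz, sample_rate_hz)
    out: List[float] = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(gain * y)
    return out


# Example: a 3000 Hz cutoff at an assumed 8000 Hz sample rate.
filtered = lowpass([1.0] * 2000, cutoff_hz=3000.0, sample_rate_hz=8000.0)
```

In a per-position scheme, the stored audio parameters for each distance and angle would supply `cutoff_hz` and `gain`; a production filter would more likely be a higher-order design tuned against measured frequency response.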

As shown in Fig. 5, this embodiment provides a system for improving speech quality, comprising:

a spatial position identification module, configured to obtain the spatial position of the user's face relative to the mobile terminal;

The system also comprises:

The configuration module being configured to preset the correspondence between spatial positions of the face relative to the mobile terminal and face data comprises:

The spatial position identification module being configured to obtain the spatial position of the user's face relative to the mobile terminal comprises:

The configuration module is also configured to preconfigure, before the audio parameters corresponding to the spatial position of the face relative to the mobile terminal are determined, a correspondence between spatial positions of the face relative to the mobile terminal and audio parameters;

The configuration module also being configured to preconfigure the correspondence between spatial positions of the face relative to the mobile terminal and audio parameters comprises:

Preferably, the speech processing module is a low-pass filter.

This embodiment also provides a mobile terminal comprising the system for improving speech quality described above.

The application scenario of this embodiment is as follows: after the user enables hands-free voice mode in a video call, detecting a voice signal received from the network side indicates that the user is in a video call with the other party. The spatial position of the user's face relative to the mobile terminal can then be obtained, which improves speech quality in hands-free voice mode and the user's experience.

As the above embodiment shows, compared with the prior art, the method, system, and mobile terminal for improving speech quality adjust the audio parameters of the speech processing module once hands-free voice mode is enabled in a video call, so that the frequency of the voice signal output from the speech processing module is kept within the standard range. The frequency response and sensitivity that the user hears are therefore adjusted dynamically as the spatial position (angle and distance) of the user's face relative to the mobile terminal changes. The hands-free frequency response and sensitivity can be kept essentially constant in a good state, compensating for the loss of low or mid-to-high frequencies in the frequency response curve caused by changes in the distance or angle from the mobile terminal to the face, which improves speech quality in video calls and what the user hears.

Those of ordinary skill in the art will appreciate that all or some of the steps of the above method may be completed by a program instructing the relevant hardware, and that the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Alternatively, all or some of the steps of the above embodiment may be implemented with one or more integrated circuits. Accordingly, each module or unit in the above embodiment may be implemented in hardware or as a software functional module. The present invention is not limited to any particular combination of hardware and software.

The foregoing describes only the preferred embodiments of the present invention and is not intended to limit its scope of protection. Various other embodiments are possible in accordance with the summary of the invention. Those of ordinary skill in the art may make corresponding changes and variations without departing from the spirit and essence of the present invention, and any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for improving speech quality, comprising:

2. The method of claim 1, characterized in that:

the spatial position of the face relative to the mobile terminal comprises the distance and the angle from the face to the mobile terminal, where the angle is the deviation to the left or to the right from directly facing the mobile terminal, and the angle is less than or equal to 90 degrees.

3. The method of claim 2, characterized in that, before the spatial position of the user's face relative to the mobile terminal is obtained, the method also comprises:

4. The method of claim 3, characterized in that:

presetting the correspondence between spatial positions of the face relative to the mobile terminal and face data comprises:

5. The method of claim 4, characterized in that:

obtaining the spatial position of the user's face relative to the mobile terminal comprises:

6. The method of claim 2, characterized in that:

before the audio parameters corresponding to the spatial position of the face relative to the mobile terminal are determined, the method also comprises: preconfiguring a correspondence between spatial positions of the face relative to the mobile terminal and audio parameters;

7. The method of claim 6, characterized in that:

preconfiguring the correspondence between spatial positions of the face relative to the mobile terminal and audio parameters comprises:

8. A system for improving speech quality, comprising:

9. The system of claim 8, characterized in that:

10. The system of claim 9, characterized in that it also comprises:

11. The system of claim 10, characterized in that:

the configuration module being configured to preset the correspondence between spatial positions of the face relative to the mobile terminal and face data comprises:

12. The system of claim 11, characterized in that:

the spatial position identification module being configured to obtain the spatial position of the user's face relative to the mobile terminal comprises:

13. The system of claim 9, characterized in that:

the configuration module is also configured to preconfigure, before the audio parameters corresponding to the spatial position of the face relative to the mobile terminal are determined, a correspondence between spatial positions of the face relative to the mobile terminal and audio parameters;

14. The system of claim 13, characterized in that:

the configuration module also being configured to preconfigure the correspondence between spatial positions of the face relative to the mobile terminal and audio parameters comprises:

15. A mobile terminal, comprising the system for improving speech quality of any one of claims 8 to 14.