CN103200309A

CN103200309A - Entertainment audio file for text-only application

Info

Publication number: CN103200309A
Application number: CN2013100541309A
Authority: CN
Inventors: O. Kirkeby (O·基尔克比)
Original assignee: Nokia Oyj
Current assignee: Nokia Oyj
Priority date: 2007-04-28
Filing date: 2007-04-28
Publication date: 2013-07-10

Abstract

The invention discloses a method for generating an audio file for text-only applications. The method comprises the following steps: adding tags to an input text, wherein the tags can be used to add sound effects to the generated audio file; processing the tags to form instructions for generating the audio file; and generating the audio file with the sound effects based on the instructions while presenting the text. The entertainment audio file adds entertainment value to text applications, provides a format that is more compact than conventional multimedia, and uses entertaining sound to make text-only applications such as the short message service (SMS) and email more interesting and entertaining.

Description

Entertainment audio for text-only applications

This application is a divisional of the invention patent application filed on April 28, 2007, with application number 200710107719.5, entitled "Entertainment audio for text-only applications".

Technical field

The present invention relates generally to generating audio in text-only applications such as SMS, email, books and newspapers.

Background art

Despite the continuing attention paid to the multimedia capabilities of mobile devices, most text content is unlikely to be upgraded to include graphics and sound. "File" formats such as books and newspapers, and message formats such as SMS and email, will remain popular in their current form for a long time. Technologies are now beginning to be developed that can add multimedia appeal to text formats that are not very exciting in themselves.

The most obvious approach to this problem is to store and/or transmit the added multimedia content together with the original text content. However, this increases the amount of data by at least an order of magnitude, because text formats are far more compact than graphics and sound. US Patent No. 7,103,548 discloses a system for converting a text message into audio form, in which the text message has an embedded emotion indicator and a feature-type indication; the latter is used to determine which of several audio-form presentation feature types is to be used to express, in the audio form of the text message, the emotion indicated by the emotion indicator. In addition, MSN Messenger currently allows the sender to write tags in the text, which are then translated into pictures at the receiving end. However, preparing the content in advance eliminates the possibility of a context-dependent "surprise effect". Furthermore, if a surrounding soundscape (for example the sound of rain and wind) is added to speech and played back through the single loudspeaker of a conventional mobile device, it sounds like interfering background noise and reduces intelligibility.

Several formats are suitable for storing and presenting multimedia content. The best known is SMIL (Synchronized Multimedia Integration Language). For material intended for publication on the World Wide Web, ACSS (Aural Cascading Style Sheets) can be used to define some properties of sound. In combination with SSML (Speech Synthesis Markup Language, recommended by the W3C), some basic real-time rendering of sound and speech can be performed.

Nevertheless, there is as yet no markup language or corresponding software architecture suitable for real-time sound synthesis and sound-effect rendering (especially stereo or 3D sound) in text-based applications.

Summary of the invention

The object of the invention is to provide a method that uses entertaining sound (especially stereo or 3D audio) to make text applications such as SMS and email more interesting and entertaining.

To achieve this object, the invention provides a method for generating audio for text-only applications, the method comprising: adding tags to an input text, the tags being usable to add sound effects to the generated audio; processing the tags to form instructions for generating the audio; and generating the audio with the sound effects based on the instructions while presenting the text.

The invention also provides a device for generating audio for text-only applications, the device comprising: a tag adding device for adding tags to an input text, the tags being usable to add sound effects to the generated audio; a tag processing device for processing the tags to form instructions for generating the audio; and an audio generating device for generating the audio with the sound effects based on the instructions while presenting the text.

The invention also provides a communication terminal capable of generating audio for text-only applications, the communication terminal comprising: a tag processing device for processing tags that have been added to an input text and can be used to add sound effects to the generated audio, thereby forming instructions for generating the audio; and an audio generating device for generating the audio with the sound effects based on the instructions while presenting the text.

The communication terminal may additionally comprise a tag adding device for adding tags to the input text.

Use of the invention can produce audio in an enhanced form with 3D, spatial and other effects. For example, using stereo or 3D audio allows sound to be added to speech without interruption: if the soundscape is processed into a stereo or 3D effect and played back through stereo headphones or two closely spaced loudspeakers, it can be spatialized in a way that does not interfere with the speech. For example, if the listener hears rain and wind at the sides and the speech in the center, intelligibility is not affected.

In addition, the invention aims to increase the context-dependent "surprise effect" or "surprise value" by adding randomness to the generated audio effects, so that, for example, when audio is generated in real time, the rendering algorithm can take into account information about the time (morning/day/night, weekday/weekend, summer/winter) or the user's location (room/car/office, country).
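The context-dependent randomness described above can be illustrated with a minimal sketch. It assumes a hypothetical table of sample names per time of day; the table contents, the hour boundaries and the function names are illustrative and are not taken from the patent.

```python
import random
from datetime import datetime

# Hypothetical variant table: effect name -> context label -> candidate samples.
VARIANTS = {
    "birdsong": {
        "morning": ["dawn_chorus_a", "dawn_chorus_b"],
        "day": ["sparrows", "blackbird"],
        "night": ["owl", "nightingale"],
    },
}

def time_of_day(now: datetime) -> str:
    """Map a clock time to a coarse context label (boundaries are assumptions)."""
    if 5 <= now.hour < 11:
        return "morning"
    if 11 <= now.hour < 20:
        return "day"
    return "night"

def pick_variant(effect: str, now: datetime, rng: random.Random) -> str:
    """Pick one sample at random, restricted to the variants for the current context."""
    candidates = VARIANTS[effect][time_of_day(now)]
    return rng.choice(candidates)

# Usage: the same tag can yield a different sound on each rendering.
sample = pick_variant("birdsong", datetime.now(), random.Random())
```

The context narrows the set of candidates and the random choice supplies the variation; location information could be handled the same way with a second lookup key.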

In addition, the invention allows text applications to be fully customized, adds entertainment value, and gives plain text a multimedia "touch". The invention also provides a format that is very compact compared with conventional multimedia. Because the invention is not platform-specific, the device of the invention decides how the rendering is carried out.

Brief description of the drawings

The above and other objects, features and advantages of the invention will become apparent from the following detailed description of exemplary embodiments, read in conjunction with the accompanying drawings.

Fig. 1 is a flow chart of the method for generating audio for text-only applications according to the invention; and

Fig. 2 is a block diagram of the device for generating audio for text-only applications according to the invention.

Detailed description of embodiments

The invention will now be described in detail with reference to the accompanying drawings.

Fig. 1 shows a flow chart of the method for generating audio for text-only applications according to the invention.

In step 100, a text application such as SMS, email or an audio book is input. In step 110, tags are generated from the input text. Preferably, two groups of tags are generated for the audio processing (described later). In particular cases, these tags can be inserted manually, for example entered by the user, or generated by a terminal; such terminals include mobile phones, PDAs (personal digital assistants), laptop computers and any other device capable of adding tags to text. Several markup languages can be used to implement this step, including but not limited to VoiceXML (voice UI and audio rendering for web pages), JSML (JSpeech Markup Language; Java, Sun Microsystems), STML (Spoken Text Markup Language), Sable (an attempt to combine JSML and STML), SSML (Speech Synthesis Markup Language, recommended by the W3C) and SMIL (Synchronized Multimedia Integration Language, for multimedia presentations). ACSS (Aural Cascading Style Sheets) can also be included in this step. It can be used to define some properties of sound, to specify both speech synthesis and audio, and to overlay speech and audio. ACSS also has some spatial audio features (for example azimuth and elevation). According to the invention, a new markup language (for example an audio XML format comprising tags that apply to speech, music and audio effects) can be established in order to add, for example, stereo or 3D sound effects to the audio.

For example, the input message is "Sorry, I missed your phone call. I was playing table tennis, and I won." An exemplary pseudo-markup is: <continuous play: background music> Sorry, I missed your <audio substitute: phone call>. I was playing table tennis <audio icon: table tennis> and I won! <audio icon: fireworks> <end play: background music>.
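As an illustration of step 110, the sketch below parses a message that already carries pseudo-tags into an ordered list of text segments and tag events. The `<kind: value>` syntax, the regular expression and the function name are assumptions made for this example; the patent does not define a concrete grammar.

```python
import re

# Assumed pseudo-tag syntax: <kind: value>, e.g. <audio icon: fireworks>.
TAG = re.compile(r"<\s*([^:<>]+?)\s*:\s*([^<>]+?)\s*>")

def parse_tagged(message: str):
    """Split a tagged message into ('text', str) and ('tag', kind, value) items."""
    items, pos = [], 0
    for m in TAG.finditer(message):
        text = message[pos:m.start()].strip()
        if text:
            items.append(("text", text))
        items.append(("tag", m.group(1), m.group(2)))
        pos = m.end()
    tail = message[pos:].strip()
    if tail:
        items.append(("text", tail))
    return items

msg = ("<start play: background music> Sorry, I missed your call. "
       "<audio icon: fireworks> <end play: background music>")
for item in parse_tagged(msg):
    print(item)
```

Keeping text and tags in one ordered list preserves where each sound belongs relative to the words, which later stages need in order to synchronize effects with speech.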

In step 120, the tags added in step 110 are converted into instructions that can be used to synthesize sound and into messages that control the audio processing; either of the two can serve as input to the audio processing. For sound synthesis, MIDI messages can be used. For speech synthesis, an extended version of SSML can be used (hence the reference to SSML+ in Fig. 1). Step 120 can include a randomization feature. For the listener, sounds that repeat exactly quickly become boring or even annoying. In audio design for games, for example, a character is often recorded speaking the same line several times, so that the user does not have to hear exactly the same sample over and over. Randomness can be introduced in many different ways. Some examples follow:

General:

- Varying low-level rendering parameters (voice, instrument)
- Varying the choice of "sound icon" (a short sound equivalent to, for example, "laughter")
- Varying spatial effects and post-processing

Speech:

- Definition
- Varying the timing of events (speech rhythm, pauses)
- Modifying the text without modifying its meaning

Music:

- Music generation using algorithms
- Modifying the pitch and/or tempo of sound samples

Effects:

- Rendering similar sounds differently

Audio rendering can support low-level control of some rendering parameters (for example, values embedded in MIDI messages). Footsteps, for example, can be varied in timing, pitch and duration so that they sound like different occurrences of the same event.
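The footstep example can be sketched as follows. The `NoteEvent` structure, the jitter ranges and the function name are assumptions chosen for illustration; the patent states only that timing, pitch and duration may vary.

```python
import random
from dataclasses import dataclass

@dataclass
class NoteEvent:
    onset_s: float     # start time in seconds
    midi_note: int     # MIDI note number, 0..127
    duration_s: float  # length in seconds

def humanize(events, rng, onset_jitter_s=0.03, pitch_jitter=1, duration_jitter=0.2):
    """Return a copy of `events` with small random changes to timing, pitch and duration."""
    out = []
    for e in events:
        onset = max(0.0, e.onset_s + rng.uniform(-onset_jitter_s, onset_jitter_s))
        note = min(127, max(0, e.midi_note + rng.randint(-pitch_jitter, pitch_jitter)))
        duration = e.duration_s * (1.0 + rng.uniform(-duration_jitter, duration_jitter))
        out.append(NoteEvent(onset, note, duration))
    return out

# Four evenly spaced footsteps, rendered with slight variation.
steps = [NoteEvent(i * 0.5, 40, 0.1) for i in range(4)]
varied = humanize(steps, random.Random())
```

Small bounded perturbations keep the event recognizable as the same footstep while avoiding exact repetition.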

The advantages of randomization are clear: it adds surprise value, prevents the user from becoming bored or annoyed by exact repetition, keeps the rendered audio from being too predictable, and makes it possible to adjust the settings fully according to personal preference.

In step 130, the input from step 120 is processed to output audio. For speech synthesis, a TTS (text-to-speech) engine can be used to convert the tagged text (for example, SSML+) into speech. TTS systems have improved significantly over the past few years. The remaining artifacts make the speech sound "refined" rather than "robotic". The speech quality can be made very natural, but good-quality TTS is computationally intensive in terms of MIPS and memory. For audio synthesis, two types of synthesized audio are needed: music and effects (for example footsteps, beach sounds and birdsong). MIDI, which is suitable as a control language, can carry effect settings (reverb, chorus, etc.), priority (SP-MIDI), timestamps and low-level parameters that affect the sound. The wavetable synthesis used with MIDI handles both music and effects well. The wavetable synthesis engine (the audio synthesis engine; see Fig. 1) is GM1 (General MIDI) compliant, can be made GM2 compliant, and supports DLS (Downloadable Sounds) and all major sample rates.
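To make the role of MIDI as a control language concrete, the sketch below builds raw MIDI channel messages for a note and for reverb and chorus send levels. The byte layouts are those of standard MIDI channel voice messages; the helper names are illustrative, and how the engine described here consumes such messages is not specified in the text.

```python
def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Note On: status 0x9n, then note number and velocity (7 bits each)."""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def note_off(channel: int, note: int) -> bytes:
    """Note Off: status 0x8n, then note number and a release velocity of 0."""
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, 0])

def control_change(channel: int, controller: int, value: int) -> bytes:
    """Control Change: status 0xBn, then controller number and value."""
    return bytes([0xB0 | (channel & 0x0F), controller & 0x7F, value & 0x7F])

REVERB_SEND = 91  # CC 91 (Effects 1 Depth), used as reverb send level in GM2
CHORUS_SEND = 93  # CC 93 (Effects 3 Depth), used as chorus send level in GM2

# A short effect on channel index 0 with some reverb and a little chorus.
stream = (
    control_change(0, REVERB_SEND, 64)
    + control_change(0, CHORUS_SEND, 20)
    + note_on(0, 60, 100)
    + note_off(0, 60)
)
```

Each message is three bytes, which illustrates why a MIDI-style instruction stream is far more compact than transmitting the rendered sound itself.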

The flow then proceeds to step 140, in which the output audio from step 130 is processed further.

Reference is now made to Fig. 2, which shows a device for generating audio for text-only applications according to the invention; the device correspondingly carries out the method in the flow chart of Fig. 1. After a text-only application is received, the tag adding device generates multiple groups of tags for the input text. In particular cases, these tags can be inserted manually, for example entered by the user, or generated by a terminal; such terminals include mobile phones, PDAs (personal digital assistants), laptop computers and any other device capable of adding tags to text. Preferably, two groups of tags are generated by the tag adding device. One group is intended for the TTS engine; for this purpose a format such as SSML can be used. The other group is intended for the audio synthesis engine, which can generate sound effects and music. Such a format can be denoted audio XML (see Fig. 2). In the case of an SMS application, for example, the tag adding device may run on either the sender's or the recipient's terminal.
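The split into two groups of tags can be sketched as below. The speech stream uses the SSML `speak` and `mark` elements; the `audio-xml` root, the `event` element and its attributes are hypothetical stand-ins for the "audio XML" format, whose schema is not given here. The input is assumed to be a list of `('text', str)` and `('tag', kind, value)` tuples.

```python
import xml.etree.ElementTree as ET

def split_streams(items):
    """Build an SSML string for the TTS engine and an audio XML string for the synthesizer."""
    speak = ET.Element("speak")      # SSML root element
    audio = ET.Element("audio-xml")  # hypothetical root for the second format
    last = None                      # most recent <mark/> appended to <speak>
    for n, item in enumerate(items):
        if item[0] == "text":
            # Text goes before the first mark (element text) or after the latest mark (tail).
            if last is None:
                speak.text = (speak.text or "") + item[1] + " "
            else:
                last.tail = (last.tail or "") + item[1] + " "
        else:
            _, kind, value = item
            mark_name = f"m{n}"
            # The mark pins the audio event to a position in the spoken text.
            last = ET.SubElement(speak, "mark", name=mark_name)
            ET.SubElement(audio, "event", at=mark_name, kind=kind, value=value)
    return (ET.tostring(speak, encoding="unicode"),
            ET.tostring(audio, encoding="unicode"))

ssml, audio_xml = split_streams([
    ("text", "I won!"),
    ("tag", "audio icon", "fireworks"),
])
```

Sharing mark names between the two documents is one possible way to keep speech and effects aligned while letting each engine read only the format it understands.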

The tag processing device then converts the tags into low-level instructions that can be used to synthesize sound and into messages that control the audio processing, and adds "surprise value". For sound synthesis, MIDI messages can be used. For TTS, an extended version of SSML can be used (hence the reference to SSML+ in Fig. 2). The tag processing device must run on the listener's terminal. The tag processing device can include a randomization feature. With a sound synthesis engine, subtle variations can be implemented through small changes in the low-level instructions. Footsteps, for example, can be varied in timing, pitch and duration so that they sound like different occurrences of the same event.

The audio generating device (see the dashed part of Fig. 2) receives the output of the tag processing device. For speech synthesis, it is advantageous to use a TTS engine. For audio synthesis, it is advantageous to use a wavetable synthesis engine, which handles music and effects well.

The audio processing device is used to apply, for example, 3D algorithms and post-processing to the output of the TTS and audio synthesis engines. The audio processing device can perform at least one of the following functions: 3D positional audio, mono-to-3D spatial enhancement, stereo widening, reverberation, equalization (equalizer) and DRC (dynamic range control). In addition, the audio processing device optionally supports sample-rate conversion, mixing, and real-time changes of parameters (3D position, T60 of the reverberation).
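As a simplified illustration of keeping speech in the center and ambience at the sides, the sketch below mixes mono sources into stereo with a constant-power pan law. This is plain amplitude panning, used here as a stand-in; it is not the 3D algorithm of the invention, and the function names are assumptions.

```python
import math

def pan_gains(pan: float):
    """Constant-power (left, right) gains for pan in [-1.0 (left), +1.0 (right)]."""
    angle = (pan + 1.0) * math.pi / 4.0  # maps pan to 0 .. pi/2
    return math.cos(angle), math.sin(angle)

def mix_stereo(sources):
    """Mix (samples, pan) pairs into a list of (left, right) frames."""
    length = max(len(samples) for samples, _ in sources)
    out = [[0.0, 0.0] for _ in range(length)]
    for samples, pan in sources:
        gain_l, gain_r = pan_gains(pan)
        for i, x in enumerate(samples):
            out[i][0] += gain_l * x
            out[i][1] += gain_r * x
    return [tuple(frame) for frame in out]

# Speech in the center, rain hard left, wind hard right (toy sample values).
speech = [0.5, 0.4, 0.3]
rain = [0.1, 0.1, 0.1]
wind = [0.05, 0.05, 0.05]
frames = mix_stereo([(speech, 0.0), (rain, -1.0), (wind, 1.0)])
```

With this pan law the squared left and right gains always sum to one, so a source keeps the same overall power wherever it is placed.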

The device of the invention can be applied in a communication terminal capable of generating audio for text-only applications. The communication terminal comprises a tag processing device for processing tags that have been added to the input text and can be used to add sound effects to the generated audio, thereby forming instructions for generating the audio; and an audio generating device for generating the audio with the sound effects based on the instructions while presenting the text. Optionally, the communication terminal may additionally comprise a tag adding device for adding tags to the input text. The communication terminal is, for example, a mobile terminal.

Although specific embodiments of the invention have been disclosed, those skilled in the art will appreciate that changes can be made to the specific embodiments without departing from the spirit and scope of the invention. The invention focuses on audio, but an equivalent approach can be applied to adding graphics to text applications. The invention is therefore not limited to the specific embodiments, and the claims are intended to cover any and all such applications, modifications and embodiments within the scope of the invention.

Claims

1. A method for adding tags to input text for a text-only application, comprising:

adding (110) tags to the input text, wherein the tags can be used, when audio is generated in real time, to add sound effects to the generated audio.
2. The method according to claim 1, comprising producing two groups of tags, the two groups comprising a first group in a first format and a second group in a second format.
3. The method according to claim 1, wherein the first group in the first format is produced for a speech synthesis engine, and the second group in the second format is produced for an audio synthesis engine.
4. The method according to claim 1, wherein the tags are generated from the input text or are inserted manually.
5. The method according to claim 1, wherein the sound effects comprise 3D sound effects.
6. A method for generating audio for a text-only application, comprising:

processing (130) tags added to input text, to form instructions for generating audio;

generating (140), in real time and based on the instructions, audio with sound effects while presenting the text.
7. The method according to claim 6, comprising applying 3D algorithms and post-effects to the generated audio with the sound effects.
8. The method according to claim 6, wherein processing the tags to form the instructions for generating audio is performed in the listener's terminal.
9. The method according to claim 6, wherein processing the tags further comprises adding randomness.
10. The method according to claim 9, wherein adding randomness comprises taking into account information about the time or the user's location.
11. The method according to claim 9, wherein adding randomness is achieved by changes in the instructions, wherein the changes alter a manner or parameter used for generating the audio.
12. The method according to claim 10, wherein the manner or parameter comprises at least one of the following: low-level rendering parameters, selection of sound icons, spatial effects and post-processing, definition, timing of events, modifying the text without modifying its meaning, music generation using algorithms, and rendering similar sounds differently.
13. The method according to claim 6, wherein generating the audio with sound effects further comprises performing speech synthesis using a TTS engine.
14. The method according to claim 6, wherein generating the audio with sound effects further comprises performing audio synthesis using an audio synthesis engine.
15. A device, comprising:

a tag adding device for adding tags to input text of a text-only application, wherein the tags can be used, when audio is generated in real time, to add sound effects to the generated audio.
16. The device according to claim 15, comprising means for carrying out the method according to any one of claims 2 to 5.
17. A device, comprising:

a tag processing device for processing tags to form instructions for generating audio;

an audio generating device for generating, in real time and based on the instructions, audio with sound effects while presenting the text.
18. The device according to claim 17, comprising means for applying 3D algorithms and post-effects to the generated audio with the sound effects.
19. The device according to claim 17, wherein the means for generating the audio uses the instructions to synthesize sound and to control the audio processing.
20. The device according to claim 17, comprising means for carrying out the method according to any one of claims 8 to 14.