CN101552002A - Media broadcasting device and media operating method

Info

Publication number: CN101552002A
Application number: CNA2009100784893A
Authority: CN
Inventors: 须清
Original assignee: Beijing Paragon Technology Co Ltd
Current assignee: Beijing Paragon Technology Co Ltd
Priority date: 2009-02-25
Filing date: 2009-02-25
Publication date: 2009-10-07
Anticipated expiration: 2029-02-25
Also published as: CN101552002B

Abstract

The present invention provides a media playing device and a media operating method. The media playing device comprises an information storage body storing at least one piece of multimedia information, a multimedia signal output component, a sound input component, a characteristic extracting component, a media information characteristic storing component, a characteristic similarity calculating component, a characteristic similarity determining component and a multimedia information selecting component. The piece of multimedia information in the information storage body that contains the information segment with maximum similarity is selected and transmitted to the multimedia signal output component. In this way the media information most similar to an input sound is selected in the media player through sound input. This changes the operating mode of existing media players and locates media information more accurately, and in most cases no manual operation is required. Media information can be selected, and the playback starting point controlled, directly by speaking or singing, which greatly reduces the operating difficulty for the user. Even blind users, or users with little experience of operating players, can operate the media player.

Description

Media playing device and media operating method

Technical field

The present invention relates to a media playing device and a media operating method, and in particular to a media playing device and an operating method in which the required piece of music is selected by sound input.

Background technology

Multimedia players such as MP3 players, MP4 players, portable terminals and computers are now quite common. These devices usually have a large storage capacity and hold many items of multimedia information. To select the required item for playback, the multimedia information is normally first classified according to some rule, and the operator then picks an item through the menus of an operation interface. When there is a great deal of multimedia content, the menus become deeply nested and choosing the required item becomes rather difficult. Moreover, the menu options usually show only the title of each item. With a lot of content, the title alone often does not tell the user whether an item is the required one, so the user frequently selects an item, auditions or previews it, finds that it is not what was wanted, and has to select again.

With today's development of the internet, the multimedia content available on the network is especially vast. Searching it for the required content is not easy, and a search is particularly unlikely to succeed when the user does not clearly remember the title of the content.

In recent years there has been much research on speech recognition and on sound-controlled operation of electronic equipment, and some of it has been commercialized on mobile terminal devices, for example making a telephone call by voice selection. U.S. Patents No. 4,277,644 and No. 6,101,467 cover various aspects of speech recognition software. Methods for characterizing audio content have also been described. In particular, U.S. Patents No. 6,054,646 and No. 6,173,250 cover methods for characterizing music by features such as beat, energy and pitch.

Despite recent progress in speech recognition, audio signal analysis and the characterization of musical features, and although voice control has been implemented on some electronic devices, these applications often fail to meet users' needs. A common situation is that a user trying to select favourite content on a multimedia player can hum a fragment or a phrase of its melody, or only the approximate content of one bar, but cannot remember the title of the content and therefore cannot find the required media content effectively.

Chinese invention patent application publication CN1639975A, published on July 13, 2005, mentions extracting speech features from a signal source and then using speech features to select content from the desired signal source. In particular, it discloses a watchdog function (Watch Dog): the user can sing or hum a pattern to the audio analyzer in an audio recorder-player, which can then monitor different channels for that specific tone. The user can also input spoken words to the audio recorder-player through voice recognition software, and the audio recorder-player can then monitor different channels for some or all dialogues and monologues containing those words. An advanced matching algorithm is used, that is, an algorithm that declares a match when a phrase occurs two or three times within a predetermined number of seconds. When a match occurs, a control event can be generated to control channel switching.

However, the technology described above has shortcomings when applied to multimedia players with large capacity. Users of multimedia players are not all professionals, so what they sing or hum is often not a standard rendition: the key may differ, or the beat may differ, yet what is hummed or sung still has a certain similarity to the content to be selected. For example, a piece of music is in the key of C and the recorded multimedia information is also in C, but a user may hum or sing it in F, C sharp or C flat. The melody is nevertheless essentially similar, and a listener can tell that it is the same piece. Likewise, a piece may be in 2/4 time while what is hummed or sung is in 4/4 time, yet the melody is essentially similar and a listener can tell that it is the same piece. The prior art offers no good solution for these cases.

A further situation with media players is that a user wishes to start playing an item of media information from a particular point. The prior art normally relies on fast-forward or rewind buttons, but with these the operator can only estimate how far to advance or go back, which is usually inaccurate and takes several presses to get right. Although existing digital media recording formats offer directory-style menu selection for choosing an item to play, they still do not let the user quickly set the playback starting point as desired.

In addition, music copyright is receiving more and more attention, and cases of musical plagiarism are reported from time to time. To exploit loopholes in the relevant legal provisions, some plagiarists slightly adjust the key or beat of a piece so that it differs from the original in form while its substance remains similar. The prior art proposes no method for deciding when such similarity should be regarded as plagiarism.

Summary of the invention

The technical problem to be solved by the present invention is how to select the required multimedia information more effectively from a media storage body or from the internet, and how to control the starting point of media playback freely and effectively. The invention uses speech feature extraction, segment extraction, similarity calculation and similarity judgment so that electronic equipment or network operations can be controlled by sound and the required multimedia information obtained automatically. The technology of the invention can also be used to judge automatically whether pieces of music are plagiarized or similar.

Explanation of terms: the speech feature referred to herein is characteristic information related to the rhythm of the input sound. The rhythm is based on each recognizable syllable. That is, a section of multimedia contains many syllables, and the speech feature is extracted on the basis of each syllable. The features of the syllables, combined in order, constitute the overall rhythm or melody of that section of multimedia information. Any segment can therefore be cut from the extracted feature combination and used as the basis for feature comparison in the present invention. When a section of sound input contains several melodies, either only the theme feature or the features of all melodies may be extracted. "Speech feature" and "characteristic information" have the same meaning herein.

Explanation of terms: media information as referred to in the present invention has the same meaning as multimedia information. Both denote voice information, music information, video information or data information that includes sound information, or any combination of these.

Explanation of terms: similarity as referred to in the present invention is a value expressing the correlation between two pieces of information, obtained with a correlation algorithm. Such algorithms include linear correlation methods and non-linear correlation methods, for which mathematics and experimental physics already provide many mathematical models and calculation methods. These are cited as prior art related to the present invention.

To solve the above problems, the following technical solutions are proposed:

1. A first scheme of a multimedia playing device, comprising a storage medium storing at least one item of multimedia information and a multimedia signal output component, characterized by further comprising:

A sound input component, which can take sound from a sound sensor or take a pre-made sound file as input;

A characteristic extracting component, which extracts first characteristic information from the sound signal or information supplied by the sound input component;

A media information characteristic storing component, which stores second characteristic information corresponding to each item of multimedia information;

A characteristic similarity calculating component, which calculates the similarity between the first characteristic information and any segment of the second characteristic information of each item of multimedia information;

A characteristic similarity judging component, which picks the maximum similarity from the similarity data;

A multimedia information selecting component, which selects from the storage medium the item of multimedia information containing the segment with maximum similarity and sends it to the multimedia signal output component.

2. A second scheme of a multimedia playing device, comprising a storage medium storing at least one item of multimedia information and a multimedia signal output component, characterized by further comprising:

A media information characteristic calculating component, which calculates second characteristic information corresponding to each item of multimedia information;

3. As for the method of extracting the first and second characteristic information, take a well-known song as an example. The theme characteristic information of the song can be extracted and represented, for example, in numbered musical notation or staff notation; numbered notation contains both beat and key information. This theme characteristic information can serve as the second characteristic information of the present invention. When different people sing or hum the song, their beat and/or key may differ from those defined by the song itself, and also from the beat and key of the second characteristic information in the information segments recorded in the multimedia information. But as long as everyone is singing the same song, the themes are very similar. The similarity with the first characteristic information is therefore calculated after the second characteristic information has been adjusted in beat and/or key. The melody can also be represented in staff notation or in other notations. In the processing of musical multimedia information, one music media format is the score file, which stores sound as data representing notes, instruments and sharpness information; the most popular such data format is MIDI. A MIDI file contains specifications of how to reproduce the sound and can be regarded as a score in electronically readable form. It contains information on the channels to be considered when replaying the score represented by the data stored in each MIDI file, on the instruments used, and on the acoustic parameters. The collective term "acoustic parameters" denotes, for example, pitch, note or rest values, and descriptions of response level, velocity, timbre or special effects such as vibrato or reverberation. The MIDI file therefore contains the second characteristic information required by the invention, and the MIDI file corresponding to each item or each piece of multimedia information can be used as the second characteristic information. Correspondingly, the first characteristic information is extracted in the same way: the MIDI file extracted from the input sound is used as the first characteristic information. Alternatively, data extraction can be performed on the MIDI file of each item or piece of multimedia information to remove one or more features such as instrument, response level, timbre, vibrato and reverberation, and the result used as the second characteristic information. Correspondingly, the first characteristic information is extracted in the same way: the MIDI file extracted from the input sound, with one or more features such as velocity, instrument, response level, timbre, vibrato and reverberation removed, is used as the first characteristic information.

U.S. Patent No. 6,054,646 gives methods for extracting characteristic signals from a sound signal, including Mel-frequency cepstral coefficients (MFCC) and linear predictive coding (LPC). It also gives a parameter mapping description for converting MFCC features into a MIDI file. The content of U.S. Patent No. 6,054,646 is incorporated herein by reference in its entirety. In addition, software that converts a captured sound waveform file (WAVE) into a MIDI file is easy to find on the internet, as is software that converts a MIDI file into numbered notation or into staff notation. The present invention is therefore implemented on the basis of this existing knowledge, and its subject is judging the correlation between the input sound information and the stored multimedia information. One implementation can be described as follows:

For the input sound signal, MFCC coefficients are extracted, a MIDI file is generated from the MFCC coefficients, and the MIDI file is converted into a numbered-notation file, which serves as the first characteristic information. For the stored multimedia information the same method is used: MFCC coefficients are extracted, a MIDI file is generated from them, and the MIDI file is converted into a numbered-notation file, which serves as the second characteristic information. The similarity between the first and second characteristic information is then calculated, and the functions of the invention are realized from the similarity result. Depending on the application, the first and second characteristic information can be transformed further. For example, the second characteristic information may also include a set of numbered-notation files in various major keys generated from the numbered-notation file of the multimedia information: if the original file is in C major, files in D major, E major, G major and so on can be generated as part of the second characteristic information. As another example, the second characteristic information may also include a set of numbered-notation files in various time signatures: if the original file is in 2/4 time, files in 4/4 time, 6/8 time and so on can be generated as part of the second characteristic information. As a further example, every tone in the numbered-notation file can be represented by one number and adjacent identical tones merged into a single tone before the similarity is calculated, which removes differences in similarity caused by the input sound being out of tune or in a different beat.
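The last variant (one number per tone, adjacent identical tones merged) can be sketched in a few lines. This is only an illustration under assumptions: the characteristic information is taken to be a plain list of integers, and the function name `collapse_repeats` is hypothetical rather than part of the described device.

```python
from itertools import groupby


def collapse_repeats(tones):
    """Merge each run of adjacent identical tone numbers into one tone."""
    return [tone for tone, _run in groupby(tones)]


# Hypothetical tone sequences: the same melody with different note lengths.
stored = [1, 1, 2, 3, 3, 3, 5]
hummed = [1, 2, 2, 3, 5, 5]

print(collapse_repeats(stored))  # [1, 2, 3, 5]
print(collapse_repeats(hummed))  # [1, 2, 3, 5]
```

After merging, the two sequences are identical, so differences in how long each tone was held no longer affect the similarity calculation.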

In an optional implementation, the first and second characteristic information can simply be the MFCC or LPC coefficients, with the similarity calculated directly on these coefficients. They can also simply be MIDI files, with the similarity calculated directly on the MIDI files.

4. The first characteristic information comprises sound tone information and/or tone change information; the second characteristic information comprises the sound tone information and/or tone change information contained in the multimedia information.

5. Alternatively, the first characteristic information comprises sound pitch information and/or pitch change information; the second characteristic information comprises the sound pitch information and/or pitch change information contained in the multimedia information.
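One reason pitch change information is useful is that the differences between successive pitches stay the same when a melody is sung in another key. The sketch below is illustrative only; representing pitches as MIDI note numbers and the helper name `pitch_changes` are assumptions, not something the text prescribes.

```python
def pitch_changes(pitches):
    """Return the difference between each pitch and the one before it."""
    return [later - earlier for earlier, later in zip(pitches, pitches[1:])]


# Assumed representation: MIDI note numbers (60 = middle C).
in_c = [60, 62, 64, 60]  # a short contour sung in C
in_f = [65, 67, 69, 65]  # the same contour sung in F

print(pitch_changes(in_c))  # [2, 2, -4]
print(pitch_changes(in_f))  # [2, 2, -4]
```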

6. A first scheme of a multimedia information selecting method, which selects the required multimedia information from a storage medium storing at least one item of multimedia information and the second characteristic information corresponding to each item, characterized by comprising the following steps:

The first step: input a sound signal through the sound input component;

The second step: extract first characteristic information from the sound signal input through the sound input component;

The third step: calculate similarity data between the first characteristic information and any segment of the second characteristic information of each item of multimedia information;

The fourth step: pick the maximum similarity from the similarity data;

The fifth step: select from the storage medium the second characteristic information to which the segment with maximum similarity belongs;

The sixth step: retrieve from the storage medium the item of multimedia information corresponding to that second characteristic information.
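The third to sixth steps can be sketched as a sliding-window search. This is a minimal sketch under stated assumptions: characteristic information is taken to be a list of numbers, Pearson correlation stands in for the similarity measure, and the names `pearson`, `best_match` and the sample library are hypothetical.

```python
import math


def pearson(a, b):
    """Linear (Pearson) correlation of two equal-length number sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant sequence has no contour to compare
    return cov / math.sqrt(var_a * var_b)


def best_match(query, library):
    """Return (title, start_index, similarity) of the most similar segment."""
    best = (None, -1, -1.0)
    for title, feature in library.items():
        # Compare the query with every segment of the same length.
        for start in range(len(feature) - len(query) + 1):
            score = pearson(query, feature[start:start + len(query)])
            if score > best[2]:
                best = (title, start, score)
    return best


# Hypothetical stored second characteristic information, one list per item.
library = {
    "song_a": [1, 1, 5, 5, 6, 6, 5, 4, 4, 3, 3, 2, 2, 1],
    "song_b": [3, 2, 1, 2, 3, 3, 3, 2, 2, 2, 3, 5, 5],
}
query = [6, 6, 5, 4, 4, 3]  # first characteristic information from the input
title, start, score = best_match(query, library)
print(title, start)  # song_a 4
```

The returned start index is the position of the best-matching segment inside the stored characteristic information. A player could map it to a playback starting point, although that mapping is not shown here.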

7. The method further comprises a step of outputting the corresponding item of multimedia information.

8. The method further comprises a step of inputting multimedia information into the storage medium, either from another medium or by downloading it over a wired or wireless network connection.

9. Further, the method also comprises a step of calculating the corresponding second characteristic information for the input multimedia information and storing it in the storage medium.

10. Alternatively, the method further comprises a step of inputting the multimedia information together with its corresponding second characteristic information directly into the storage medium.

11. The length of any segment of the second characteristic information is the same as the length of the first characteristic information, or becomes the same as the length of the first characteristic information after beat adjustment and/or key adjustment.

12. The second characteristic information and the first characteristic information are rhythm or melody information of the music.

13. Alternatively, the second characteristic information and the first characteristic information are rhythm or melody information from which beat length has been removed.

14. The calculation method of the third step is a linear correlation method. In one implementation, the first and second characteristic information are based on numbered notation, since numbered notation can usually represent the notes and beats of three octaves completely. Notes are written with the digits 1 to 7, with marks for the high or low octave, and a rest is usually written as 0. When converting to the characteristic information of the invention, the following processing can be applied: the high register (third octave) is represented by the 7 numbers 8 to 14, the low register (first octave) by the 7 numbers -7 to -1, the middle register (second octave) by the 7 numbers 1 to 7, and a rest by 0. In this implementation the characteristic information is thus converted into numerical information, with one number per beat. With a linear correlation method, the similarity of the first and second characteristic information is easy to calculate. Even if the pitch or key of the first characteristic information differs from that of the second, as long as the two are similar the pitch of every beat changes correspondingly. For example, the second characteristic information may be in the key of C while the first characteristic information is in the key of B. Because the number for each beat changes correspondingly with the key, the calculated similarity is very high even though the individual numbers differ. The mathematics of linear correlation is a known algorithm and is not repeated here. Sometimes the beat represented by the first characteristic information of the input sound differs from that of the second characteristic information of the multimedia information, for example 2/4 time for the second and 4/4 time for the first, while the themes they express may still be similar. The beat of the first and/or second characteristic information therefore needs to be adjusted before the similarity is calculated. The first adjustment method expands the data of one beat into beats with identical data: if a beat has the value 5, it can be adjusted to two beats, each with the value 5. The second adjustment method reduces two consecutive beats with identical data to one beat: if two consecutive beats both have the value 5, they can be adjusted to one beat with the value 5.
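The linear correlation and the two beat adjustments described above can be illustrated as follows. This is a sketch under assumptions, not the patented implementation: a key change is modelled, for simplicity, as a constant offset on the per-beat numbers, and the function names `pearson`, `expand_beats` and `reduce_beats` are hypothetical.

```python
import math


def pearson(a, b):
    """Linear (Pearson) correlation of two equal-length number sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant sequence has no contour to compare
    return cov / math.sqrt(var_a * var_b)


def expand_beats(seq):
    """First adjustment: repeat the value of every beat once."""
    return [value for value in seq for _ in range(2)]


def reduce_beats(seq):
    """Second adjustment: merge two consecutive identical beats into one."""
    out, i = [], 0
    while i < len(seq):
        out.append(seq[i])
        # Skip the partner beat when it carries the same value.
        i += 2 if i + 1 < len(seq) and seq[i + 1] == seq[i] else 1
    return out


stored = [1, 2, 3, 1, 1, 2, 3, 1]    # second characteristic information
hummed = [v + 4 for v in stored]      # same contour, shifted by a constant
print(round(pearson(stored, hummed), 6))  # 1.0

print(expand_beats([5]))     # [5, 5]
print(reduce_beats([5, 5]))  # [5]
```

Because correlation ignores a constant offset, the shifted sequence still scores 1.0, which matches the observation that the numbers for each beat differ while the calculated similarity stays high.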

15. A second scheme of a multimedia information selecting method, which selects the required multimedia information from a storage medium storing at least one item of multimedia information, characterized by comprising the following steps:

The first step: input a sound signal through the sound input component;

The third step: calculate the second characteristic information corresponding to each item of multimedia information;

The fourth step: calculate similarity data between the first characteristic information and any segment of the second characteristic information of each item of multimedia information;

The fifth step: pick the maximum similarity from the similarity data;

The sixth step: retrieve the corresponding item of multimedia information according to the second characteristic information to which the segment with maximum similarity belongs.

The difference between the second scheme and the first scheme is whether the second characteristic information of each item of multimedia information is stored in the storage body in advance or is calculated only when the application needs it.

16. A first scheme of a method for operating a multimedia information player, which selects the required multimedia information for playback from a storage medium storing at least one item of multimedia information and the second characteristic information corresponding to each item, characterized by comprising the following steps:

The first step: input a sound signal through the sound input component;

The sixth step: retrieve from the storage medium the item of multimedia information corresponding to the second characteristic information concerned, and play it.

The second characteristic information corresponding to each item of multimedia information can be a MIDI file, or selected elements extracted from a MIDI file.

17. A second scheme of a method for operating a multimedia information player, which selects the required multimedia information from a storage medium storing at least one item of multimedia information, characterized by comprising the following steps:

The first step: input a sound signal through the sound input component;

The sixth step: retrieve the corresponding item of multimedia information according to the second characteristic information to which the segment with maximum similarity belongs, and play it.

The technology of the present invention can also be used to judge the similarity of two songs, which is of considerable use in determining whether music has been plagiarized.

18. A music similarity judging method, which judges the similarity between a first piece of music and a second piece of music, characterized by comprising the following steps:

The first step: extract first characteristic information from the multimedia information of the first music;

The second step: decompose the first characteristic information into a plurality of segments of a certain length, each beginning at an arbitrary starting point;

The third step: extract second characteristic information from the multimedia information of the second music;

The fourth step: calculate similarity data between any one of the plurality of segments and any segment of the second characteristic information;

The sixth step: judge whether the maximum similarity exceeds a set threshold; if it does, the similarity between the first music and the second music is judged to be high, otherwise it is judged to be low.

For the segments of a certain length mentioned above, the length can be tied to the definition in the relevant legal documents. For example, if 7 consecutive similar beats are deemed plagiarism, the length can be set to 7 beats.

The threshold is set according to how strictly the relevant law is enforced. If only very close similarity counts as plagiarism, the threshold is set very high, close to 1. When enforcement is less strict the threshold can be lowered appropriately, for example to 0.8 or 0.9.
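The segment length and threshold described above can be combined into a short check. The sketch assumes per-beat numeric characteristic information and uses Pearson correlation as the similarity measure; the names `pearson`, `windows` and `is_similar`, the default values and the sample sequences are all hypothetical.

```python
import math


def pearson(a, b):
    """Linear (Pearson) correlation of two equal-length number sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant sequence has no contour to compare
    return cov / math.sqrt(var_a * var_b)


def windows(seq, length):
    """All segments of the given length, one per possible starting point."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def is_similar(first, second, length=7, threshold=0.9):
    """Return (verdict, max_similarity) over all pairs of segments."""
    best = max(
        (pearson(a, b)
         for a in windows(first, length)
         for b in windows(second, length)),
        default=0.0,  # sequences shorter than the segment length
    )
    return best > threshold, best


first = [1, 3, 5, 3, 2, 4, 6, 5, 3, 1]
# Contains the first 7 beats of `first`, shifted up by 2.
second = [7, 7, 3, 5, 7, 5, 4, 6, 8, 2, 2]

verdict, best = is_similar(first, second)
print(verdict)  # True
```

Raising `threshold` towards 1 corresponds to stricter criteria for similarity, and `length=7` mirrors the 7-beat example.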

19. A method for judging the similarity of music on the internet, characterized by comprising the following steps:

The first step: input the first characteristic information of the multimedia information of the first music, or input the multimedia information of the first music and then extract the first characteristic information from it;

The third step: download from the internet the second characteristic information of the multimedia information of the second music, or download the multimedia information of the second music from the internet and then extract the second characteristic information from it;

The technology of the present invention can also be used for media information search on the internet, providing a more effective search system and search method.

20. One scheme of a network search system, comprising a remote server component and a near-end component, characterized in that:

The near-end component comprises:

A sound input component;

A characteristic extracting component, which extracts first characteristic information from the sound signal input through the sound input component;

An information sending component, which sends the first characteristic information to the remote server component over the network;

An information receiving component, which receives the multimedia information sent by the remote server component;

The remote server component comprises:

An information receiving component, which receives the first characteristic information sent by the near-end component;

A media information storing component, which stores at least one item of multimedia information;

A media information characteristic storing component, which calculates and stores, or has stored in advance, the second characteristic information corresponding to each item of multimedia information;

A multimedia information selecting component, which selects from the media information storage body the one or more items of multimedia information corresponding to the second characteristic information containing the segment with maximum similarity, and sends them to the near-end component.

21. A second scheme of a network search system, comprising a remote server component and a near-end component, characterized in that:

The near-end component comprises:

A sound input component;

A download component, which downloads the second characteristic information of each item of multimedia information from the remote server component;

A near-end storing component, which stores the second characteristic information of each item of multimedia information obtained by the download component;

A selecting component, which takes from the storing component the second characteristic information corresponding to the maximum similarity;

An information sending component, which sends the second characteristic information chosen by the selecting component to the remote server component over the network;

The remote server component comprises:

An information receiving component, which receives the second characteristic information sent by the near-end component;

A multimedia information selecting component, which selects from the media information storage body the one or more items of multimedia information corresponding to the second characteristic information received by the information receiving component, and sends them to the near-end component.

22. The multimedia information is one of the following or a combination thereof: text, pictures, sound, music, films, television programs.

The technology of the present invention can also be used in devices that turn pages automatically according to sound input, for example to turn the pages of a music score for a concert performer.

23. An automatic page turning device, comprising a display component and a media information memory bank storing at least one piece of multimedia information, characterized by further comprising:

A sound input component;

A characteristic similarity calculating component, which determines the current position in the multimedia information: within the portion of the multimedia information shown on the display component, it finds the information segment whose second characteristic information has the maximum similarity to the first characteristic information;

A page turning determining component, which causes the display component to show the next page of the multimedia information when the current position reaches the end of the portion shown on the display component;

24. A first scheme of a singing assisted learning system, comprising a memory bank storing at least one piece of multimedia information, characterized by further comprising:

A sound input component;

A media information selecting component, which selects the piece of multimedia information to be learned;

A media information characteristic extracting component, which extracts the second characteristic information of the selected multimedia information;

A characteristic similarity calculating component, which calculates and evaluates the similarity between the first characteristic information and the second characteristic information of the selected multimedia information;

An information prompting component, which indicates the differences between the input sound and the multimedia information according to the similarity.

25. A second scheme of the singing assisted learning system, comprising a memory bank storing at least one piece of multimedia information together with the second characteristic information corresponding to each piece, characterized by further comprising:

A sound input component;

A characteristic similarity calculating component, which calculates and evaluates the similarity between the first characteristic information and the second characteristic information of the selected multimedia information;

Beneficial effects of the present invention: With the technology of the present invention, the required multimedia information can be selected more effectively from a media memory bank or from the Internet by inputting part of the sound characteristics of the media; for example, humming a fragment of a song retrieves the complete song. The technology can also be used to judge more effectively whether one piece of music plagiarizes another. The present invention uses sound characteristic extraction, segment interception, similarity calculation and similarity determination so that sound can control an electronic device or a network operation and obtain the required multimedia information automatically, or so that plagiarism or similarity between pieces of music can be judged automatically; it can also provide automatic score page turning and singing assisted learning. A media player implementing the present invention selects, through voice input, the media information in the player that is most similar to the input voice. This changes the operating mode of existing media players and locates media information more accurately. In most cases no manual operation is needed: speaking or singing is enough to select media information and to control the playback starting point, which greatly reduces the difficulty of operation and makes the player usable even by blind users or users unfamiliar with player operation. A media search system implementing the present invention selects, through voice input, the media information most similar to the input voice from the media servers on the Internet or a local area network. This changes the search mode of existing network search engines and search tools and locates media information more accurately. In most cases no manual operation is needed: speaking or singing is enough to search for media information, which greatly simplifies operation and makes media search possible even for blind users or users unfamiliar with computers.

Brief description of the drawings:

Fig. 1 is a schematic diagram of the working principle of the first system of the present invention for multimedia information retrieval.

Fig. 2 is a schematic diagram of the working principle of the second system of the present invention for multimedia information retrieval.

Fig. 3 is a schematic diagram of the working principle of the first algorithm for calculating the similarity between the first characteristic information and the second characteristic information.

Fig. 4 is a schematic diagram of the working principle of the second algorithm for calculating the similarity between the first characteristic information and the second characteristic information.

Fig. 5 is a schematic diagram of the working principle of the third algorithm for calculating the similarity between the first characteristic information and the second characteristic information.

Fig. 6 is a flow diagram of selecting multimedia information by sound input according to the present invention.

Fig. 7 is a schematic diagram of the first system implementation of the present invention for selecting multimedia information from the Internet by sound.

Fig. 8 is a schematic diagram of the second system implementation of the present invention for selecting multimedia information from the Internet by sound.

Fig. 9 is a schematic diagram of the principle of the automatic music score page turning system of the present invention.

Fig. 10 is a schematic diagram of the principle of the singing assisted learning system of the present invention.

Fig. 11 is a schematic diagram of the principle of the media player of the present invention.

Detailed description of the embodiments:

The core of the present invention is to process the input sound information, extract first characteristic information from it, and then use a specific algorithm to calculate its similarity with the second characteristic information of the multimedia information. The piece of multimedia information with the maximum similarity is taken as the multimedia information that the input sound was intended to select. When the multimedia information, the sound input and its processing components are all integrated in one embedded system, portable devices based on the present invention can be designed, such as media players, palmtop computers, mobile terminals and notebook computers. When the multimedia information is stored on a server and the sound is input at a client, the sound information processing components can be integrated either in the server or in the client, with server and client connected through a local area network or the Internet; on this basis, media search systems, music infringement determination systems, singing learning systems and automatic score page turning devices based on the present invention can be designed.

Specific embodiments of the present invention are further described below with reference to the accompanying drawings.

Fig. 1 shows a first implementation of a media playing system based on the present invention in which media are selected by sound input. In this scheme the characteristic similarity calculating component 105 has two inputs. One comes from the first characteristic information extracting component 103, which processes the voice information from the voice input component 101 and extracts characteristic information from it. The other comes from the arbitrary segment intercepting component 104 for the second characteristic information, which takes the characteristics of the media information from the media information characteristic storing component 102 and intercepts an arbitrary segment of them. The characteristic similarity calculating component 105 outputs the calculated similarity values to the characteristic similarity determining component 106, which screens and compares them and chooses the second characteristic information to which the segment with the maximum similarity belongs; the multimedia information selecting component 108 uses this as the basis for selecting the required multimedia information from the storage medium 107. The second characteristic information stored in the media information characteristic storing component 102 corresponds one-to-one with the media information stored in the storage medium 107, that is, each item of second characteristic information in component 102 corresponds to exactly one piece of media information in the storage medium 107. This correspondence is also stored, either in the media information characteristic storing component 102 or in the storage medium 107. In a specific implementation, component 102 and the storage medium 107 can be merged into a single storing component, and the correspondence between second characteristic information and media information can be stored as a data table file or in a database. A typical voice input component 101 consists of, for example, a microphone, a microphone signal processing circuit and a circuit for digitizing and acquiring the voice signal. The characteristics that the first characteristic information extracting component 103 extracts from the input voice are, for example, its rhythm information and pitch information, which can further be converted into music score information and used as the characteristic. When the system is implemented as a media player, the first characteristic information extracting component 103, the arbitrary segment intercepting component 104, the characteristic similarity calculating component 105, the characteristic similarity determining component 106 and the multimedia information selecting component 108 are all implemented in software running on the processor of the media player. The effect is that a person who wants the media player to play a certain piece of media information can hum, into the microphone of the voice input component 101, a fragment of the music contained in that media information, and the player automatically selects and plays the media information closest to the hummed fragment. This spares the user the trouble of having forgotten the title or of navigating multi-level menus through a large media collection. The hummed fragment may even be quite inaccurate; it is enough for its basic rhythm to be similar, so the method is practical, adaptable and easy to operate. A media player implementing the present invention selects, through voice input, the media information in the player that is most similar to the input voice. This changes the operating mode of existing media players and locates media information more accurately. In most cases no manual operation is needed: speaking or singing is enough to select media information, which greatly simplifies operation and makes the player usable even by blind users or users unfamiliar with player operation.

Fig. 2 shows a second implementation of a media playing system based on the present invention in which media are selected by sound input. It differs from the scheme of Fig. 1 in that the second characteristic information is not stored in a memory bank in advance; instead, the media information characteristic calculating component 202 computes it from the media information read from the storage medium 107. The advantage over the first scheme is that the algorithm of the media information characteristic calculating component 202 can be updated at any time, so that further research results on sound characteristics can be used to extract characteristics more efficiently or to adjust which characteristics are extracted.

Fig. 3 is a schematic diagram of the first method, based on the present invention, of calculating the similarity between the first characteristic information and the second characteristic information. In the figure, the first characteristic information 302 is assumed to be 4 bytes long, with byte positions labeled a, b, c, d, and the second characteristic information 301 is assumed to be 16 bytes long, with byte positions labeled 1 through 16. The second characteristic information is intercepted by taking each byte in turn as a starting point and cutting out a segment of the same length as the first characteristic information; starting points that leave too few bytes for a full segment are discarded. This yields 13 intercepted segments, with byte positions 1,2,3,4; 2,3,4,5; 3,4,5,6; and so on up to 13,14,15,16. The similarity of each intercepted segment with the first characteristic information is calculated, giving the calculation result 303, which consists of 13 values denoted R1 through R13. For n pieces of multimedia information, assuming the second characteristic information of every piece has the same length of 16 bytes, the calculation yields 13*n values in total. The maximum is chosen from these 13*n values; the intercepted segment that produced it identifies the second characteristic information it belongs to, and the corresponding piece of media information is then retrieved through the correspondence between media information and second characteristic information.
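The interception and maximum selection described for Fig. 3 can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify the similarity measure, so `window_similarity` uses the fraction of matching byte positions as a stand-in, and the names `sliding_similarities`, `best_match` and the sample `library` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def window_similarity(query: Sequence[int], window: Sequence[int]) -> float:
    """Assumed measure (not from the source): fraction of matching positions."""
    matches = sum(1 for q, w in zip(query, window) if q == w)
    return matches / len(query)


def sliding_similarities(query: Sequence[int], reference: Sequence[int]) -> List[float]:
    """Compare the query with every full-length segment of the reference."""
    n, m = len(reference), len(query)
    # Starting points that leave fewer than m bytes are discarded,
    # so a 16-byte reference and a 4-byte query give 13 values.
    return [window_similarity(query, reference[i:i + m]) for i in range(n - m + 1)]


def best_match(query: Sequence[int],
               library: Dict[str, Sequence[int]]) -> Tuple[str, int, float]:
    """Return (media id, segment start, similarity) of the overall maximum."""
    best = ("", -1, -1.0)
    for media_id, reference in library.items():
        for start, score in enumerate(sliding_similarities(query, reference)):
            if score > best[2]:
                best = (media_id, start, score)
    return best


# Hypothetical second characteristic information, 16 bytes per item.
library = {
    "song_a": [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2],
    "song_b": [5, 5, 3, 3, 2, 2, 1, 1, 6, 6, 5, 3, 2, 1, 7, 7],
}
query = [6, 6, 5, 3]  # hypothetical 4-byte first characteristic information

print(len(sliding_similarities(query, library["song_a"])))  # 13
print(best_match(query, library))  # ('song_b', 8, 1.0)
```

With n items of 16 bytes each, the inner loops evaluate 13*n segments in total, matching the count given in the text.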

Different people humming the same tune, or speaking the same words, do not necessarily do so at the same speed, so the rhythm characteristics of what is hummed or spoken may differ in length from the same passage in the rhythm characteristics of the media information. For example, a syllable that is a single quarter beat in the media information may be hummed or spoken as two quarter beats, or a syllable that is two quarter beats in the media information may be hummed or spoken as a single quarter beat. To improve the tolerance and reliability of the similarity calculation, the calculation therefore also covers the case in which adjacent identical characteristic bytes of the first characteristic information and/or the second characteristic information are merged into a single characteristic byte. Fig. 4 is a schematic diagram of the second method, based on the present invention, of calculating the similarity between the first characteristic information and the second characteristic information. In this figure, the similarity result 403 is first calculated in the manner of Fig. 3, without merging either the first characteristic information 402 or the second characteristic information 401; result 403 consists of 13 values denoted R1 through R13. The second characteristic information 401 in the figure contains two places where adjacent characteristics are identical, namely characteristic 2 and characteristic 6. Merging each run of adjacent identical characteristics into one characteristic turns the second characteristic information into the merged information 404. The first characteristic information 402 and the merged information 404 are then compared by the same similarity calculation method, giving result 405, which consists of 10 values denoted R14 through R23. For n pieces of multimedia information, the same processing and calculation are performed and the maximum is chosen; the intercepted segment that produced it identifies the second characteristic information it belongs to, and the corresponding piece of media information is then retrieved through the correspondence between media information and second characteristic information.
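The merging step described for Fig. 4 amounts to collapsing each run of adjacent identical characteristics into one. The sketch below shows how this can raise the best similarity when a note is held longer in the media than in the humming. The similarity measure (fraction of matching positions) and the sample data are assumptions for illustration; they do not reproduce the values in the figure.

```python
from itertools import groupby
from typing import List, Sequence


def merge_adjacent(features: Sequence[int]) -> List[int]:
    """Collapse each run of identical adjacent features into a single feature."""
    return [value for value, _run in groupby(features)]


def best_window_score(query: Sequence[int], reference: Sequence[int]) -> float:
    """Assumed measure: best fraction of matching positions over all segments."""
    m = len(query)
    if m == 0 or m > len(reference):
        return 0.0
    return max(
        sum(q == r for q, r in zip(query, reference[i:i + m])) / m
        for i in range(len(reference) - m + 1)
    )


# Hypothetical data: the media holds notes 1 and 3 twice as long as they were hummed.
second = [1, 1, 5, 3, 3, 2, 1, 6]
first = [5, 3, 2, 1]

merged_second = merge_adjacent(second)
print(merged_second)                            # [1, 5, 3, 2, 1, 6]
print(best_window_score(first, second))         # 0.75 without merging
print(best_window_score(first, merged_second))  # 1.0 after merging
```

Both sets of results are kept and the maximum is taken over all of them, so merging can only add candidate matches and never removes the unmerged ones.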

Fig. 5 is a schematic diagram of the third method, based on the present invention, of calculating the similarity between the first characteristic information and the second characteristic information. In contrast to Fig. 4, here it is the first characteristic information that contains adjacent identical characteristics to be merged. The similarity between the original first characteristic information 502 and the second characteristic information 501 is calculated first, giving result 503, which consists of 13 values denoted R1 through R13. The similarity between the merged information 504 of the first characteristic information and the second characteristic information 501 is then calculated, giving result 505, which consists of 14 values denoted R14 through R27. For n pieces of multimedia information, the same processing and calculation are performed and the maximum is chosen; the intercepted segment that produced it identifies the second characteristic information it belongs to, and the corresponding piece of media information is then retrieved through the correspondence between media information and second characteristic information.

When both the first characteristic information and the second characteristic information contain characteristics that can be merged, four cases are calculated: the similarity between the first characteristic information and the second characteristic information as they are; the similarity between the first characteristic information and the merged information of the second characteristic information; the similarity between the merged information of the first characteristic information and the second characteristic information; and the similarity between the merged information of the first characteristic information and the merged information of the second characteristic information.
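The four cases can be enumerated mechanically by pairing the raw and merged variants of each side. The following is a minimal sketch under the same assumptions as before: the similarity measure is a stand-in (fraction of matching positions), and the function names and sample data are hypothetical.

```python
from itertools import groupby, product
from typing import Dict, List, Sequence, Tuple


def merge_adjacent(features: Sequence[int]) -> List[int]:
    """Collapse runs of identical adjacent features."""
    return [value for value, _run in groupby(features)]


def best_window_score(query: Sequence[int], reference: Sequence[int]) -> float:
    """Assumed measure: best fraction of matching positions over all segments."""
    m = len(query)
    if m == 0 or m > len(reference):
        return 0.0
    return max(
        sum(q == r for q, r in zip(query, reference[i:i + m])) / m
        for i in range(len(reference) - m + 1)
    )


def four_way_similarity(first: Sequence[int],
                        second: Sequence[int]) -> Dict[Tuple[str, str], float]:
    """Score raw/merged first information against raw/merged second information."""
    first_variants = {"raw": list(first), "merged": merge_adjacent(first)}
    second_variants = {"raw": list(second), "merged": merge_adjacent(second)}
    return {
        (f_name, s_name): best_window_score(f, s)
        for (f_name, f), (s_name, s) in product(first_variants.items(),
                                                second_variants.items())
    }


# Hypothetical data: a doubled 5 in the humming, a doubled 3 in the media.
results = four_way_similarity([5, 5, 3, 2], [1, 5, 3, 3, 2, 6])
for case, score in results.items():
    print(case, round(score, 3))
print(max(results, key=results.get))  # ('merged', 'merged')
```

In this example only the merged-versus-merged case reaches a perfect score, which is why all four cases are evaluated before the maximum is taken.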

Fig. 6 is a flow diagram of selecting multimedia information by sound input based on the present invention. The figure gives a preferred example in which Mel-frequency cepstral coefficients (MFCC) are extracted, converted into a MIDI file, and then converted into numbered musical notation that serves as the characteristic information. The flow is as follows. In step 601 a sound signal is input, for example by humming a passage. In step 602 the MFCC coefficients are extracted from the input signal, in step 603 the coefficients are converted into a MIDI file, in step 604 the MIDI file is converted into numbered musical notation, and in step 605 the first characteristic information is generated. The media memory bank is assumed to hold a MIDI file for every piece of multimedia information; if it does not, the MIDI files can be converted beforehand. Step 606 reads the MIDI file of the first piece of multimedia information, step 607 converts it into numbered musical notation, step 608 generates the second characteristic information, and step 609 calculates the similarity between the first and the second characteristic information. Step 610 checks whether this was the last piece of multimedia information. If not, step 614 reads the MIDI file of the next piece and steps 607 through 610 are repeated. If so, step 611 determines the MIDI file corresponding to the maximum similarity, step 612 reads the multimedia file associated with that MIDI file, and step 613 outputs the selected multimedia file.
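Steps 604 through 613 can be illustrated with a small sketch that starts from MIDI note numbers rather than audio. Several points here are assumptions not found in the text: the MFCC and MIDI conversion of steps 602 and 603 are skipped, notes are mapped to scale degrees 1 to 7 of a major key with a known tonic, pitches outside the scale are rounded down to the nearest degree, and similarity is the fraction of matching positions. All names and sample melodies are hypothetical.

```python
from typing import Dict, List, Sequence

# Semitone offset above the tonic -> numbered-notation degree (major scale).
PITCH_CLASS_TO_DEGREE = {0: 1, 2: 2, 4: 3, 5: 4, 7: 5, 9: 6, 11: 7}


def midi_to_numbered(notes: Sequence[int], tonic: int = 60) -> List[int]:
    """Steps 604/607: convert MIDI note numbers into numbered notation."""
    degrees = []
    for note in notes:
        offset = (note - tonic) % 12
        while offset not in PITCH_CLASS_TO_DEGREE:  # assumption: round down
            offset -= 1
        degrees.append(PITCH_CLASS_TO_DEGREE[offset])
    return degrees


def best_window_score(query: Sequence[int], reference: Sequence[int]) -> float:
    """Step 609 stand-in: best fraction of matching positions over all segments."""
    m = len(query)
    if m == 0 or m > len(reference):
        return 0.0
    return max(
        sum(q == r for q, r in zip(query, reference[i:i + m])) / m
        for i in range(len(reference) - m + 1)
    )


def select_media(hummed_notes: Sequence[int],
                 midi_library: Dict[str, Sequence[int]]) -> str:
    """Steps 605 to 613: score every stored item and return the best one."""
    first = midi_to_numbered(hummed_notes)            # step 605
    best_id, best_score = "", -1.0
    for media_id, notes in midi_library.items():      # steps 606, 610, 614
        second = midi_to_numbered(notes)              # steps 607, 608
        score = best_window_score(first, second)      # step 609
        if score > best_score:
            best_id, best_score = media_id, score
    return best_id                                    # steps 611 to 613


midi_library = {
    "twinkle": [60, 60, 67, 67, 69, 69, 67, 65, 65, 64, 64, 62, 62, 60],
    "ode": [64, 64, 65, 67, 67, 65, 64, 62, 60, 60, 62, 64, 64, 62, 62],
}
print(midi_to_numbered([60, 62, 64, 65, 67, 69, 71, 72]))   # [1, 2, 3, 4, 5, 6, 7, 1]
print(select_media([67, 67, 69, 69, 67], midi_library))     # twinkle
```

Numbered notation discards the octave, so a fragment hummed an octave higher or lower than the stored melody maps to the same degrees.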

Fig. 7 is a schematic diagram of the first implementation of a media search system based on the present invention. The media search system comprises a server 700 and a client 710; the client 710 connects to the server 700 through the Internet or a local area network 704. The server 700 comprises a media information database 701, a media access processing component 702 and a network interface 703. The client 710 comprises an information presentation component 706, a voice input component 707, a voice signal processing component 708 and a network interface 705. The user inputs voice through the voice input component 707, for example by humming a fragment of a melody, or copies in a previously made voice file. The voice signal processing component 708 processes this input, which includes digitizing the voice signal and extracting the first characteristic information, and the extracted first characteristic information is sent through the network interface 705 into the Internet or local area network 704. The network interface 703 of the server 700 receives it and passes it to the media access processing component 702. The media access processing component 702 takes the second characteristic information of each piece of media information from the media information database 701, uses the similarity calculation method to calculate the similarity between each segment of each item of second characteristic information and the received first characteristic information, and chooses the second characteristic information corresponding to the maximum similarity. Using the correspondence between each piece of media information and its second characteristic information, it then takes the associated media information from the media information database 701 and sends it through the network interface 703 into the Internet or local area network 704. The network interface 705 of the client 710 receives the media information and passes it to the voice signal processing component 708, which passes it to the information presentation component 706 for presentation. If the media information is pure music, the presentation component 706 can be an audio output amplifier with a loudspeaker or earphones. If the media information is video containing music, the presentation component 706 can combine a display screen with an audio output amplifier and a loudspeaker or earphones. If the received media information comprises several candidates, they can be listed as entries on the display screen of the presentation component 706 for the user to choose from. A media search system implementing the present invention selects, through voice input, the media information most similar to the input voice from the media servers on the Internet or a local area network. This changes the search mode of existing network search engines and search tools and locates media information more accurately. In most cases no manual operation is needed: speaking or singing is enough to search for media information, which greatly simplifies operation and makes media search possible even for blind users or users unfamiliar with computers.
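The division of work in Fig. 7, where the client sends only the first characteristic information and the server does the matching, can be sketched without any real networking. Everything here is an assumption for illustration: the JSON message layout, the `MEDIA_DB` contents, the function names and the similarity measure are hypothetical, and the two functions are simply called in the same process instead of across a network.

```python
import json
from typing import Dict, List


def best_window_score(query: List[int], reference: List[int]) -> float:
    """Assumed measure: best fraction of matching positions over all segments."""
    m = len(query)
    if m == 0 or m > len(reference):
        return 0.0
    return max(
        sum(q == r for q, r in zip(query, reference[i:i + m])) / m
        for i in range(len(reference) - m + 1)
    )


# Hypothetical stand-in for media information database 701.
MEDIA_DB: Dict[str, Dict] = {
    "song_a": {"features": [1, 2, 3, 4, 5, 6, 7, 1], "file": "song_a.mp3"},
    "song_b": {"features": [5, 5, 3, 3, 2, 2, 1, 1], "file": "song_b.mp3"},
}


def client_request(first_features: List[int]) -> str:
    """Client 710: package the extracted first characteristic information."""
    return json.dumps({"first_features": first_features})


def server_handle(request_text: str) -> str:
    """Server 700: score every stored item and return the best match."""
    query = json.loads(request_text)["first_features"]
    best_id = max(
        MEDIA_DB,
        key=lambda media_id: best_window_score(query, MEDIA_DB[media_id]["features"]),
    )
    return json.dumps({"media_id": best_id, "file": MEDIA_DB[best_id]["file"]})


reply = json.loads(server_handle(client_request([3, 3, 2, 2])))
print(reply)  # {'media_id': 'song_b', 'file': 'song_b.mp3'}
```

In this arrangement the client needs no copy of the second characteristic information, at the cost of the server performing every comparison.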

Fig. 8 is a schematic diagram of the second implementation of a media search system based on the present invention. The media search system comprises a server 800 and a client 810; the client 810 connects to the server 800 through the Internet or a local area network 704. The server 800 comprises a media information database 701, a media access processing component 802 and a network interface 703. The client 810 comprises an information presentation component 706, a voice input component 707, a voice signal processing component 808, a network interface 705 and a local second characteristic information storing component 809. Before a voice search is carried out, the client 810 first downloads the second characteristic information of each piece of media information from the server 800 through the Internet or local area network 704 and stores it in the second characteristic information storing component 809. The user inputs voice through the voice input component 707, for example by humming a fragment of a melody, or copies in a previously made voice file. The voice signal processing component 808 processes this input, which includes digitizing the voice signal and extracting the first characteristic information. The voice signal processing component 808 then reads the second characteristic information of each piece of media information from the storing component 809, uses the similarity calculation method to calculate the similarity between each segment of each item of second characteristic information and the extracted first characteristic information, and chooses the second characteristic information corresponding to the maximum similarity. The chosen second characteristic information is sent through the network interface 705 into the Internet or local area network 704; the network interface 703 of the server 800 receives it and passes it to the media access processing component 802. Using the correspondence between each piece of media information and its second characteristic information, the media access processing component 802 takes the media information associated with the received second characteristic information from the media information database 701 and sends it through the network interface 703 into the Internet or local area network 704. The network interface 705 of the client 810 receives the media information and passes it to the voice signal processing component 808, which passes it to the information presentation component 706 for presentation. If the media information is pure music, the presentation component 706 can be an audio output amplifier with a loudspeaker or earphones. If the media information is video containing music, the presentation component 706 can combine a display screen with an audio output amplifier and a loudspeaker or earphones. If the received media information comprises several candidates, they can be listed as entries on the display screen of the presentation component 706 for the user to choose from. A media search system implementing the present invention selects, through voice input, the media information most similar to the input voice from the media servers on the Internet or a local area network. This changes the search mode of existing network search engines and search tools and locates media information more accurately. In most cases no manual operation is needed: speaking or singing is enough to search for media information, which greatly simplifies operation and makes media search possible even for blind users or users unfamiliar with computers.
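In the scheme of Fig. 8 the matching moves to the client: the second characteristic information is downloaded once, the comparison runs locally, and only the chosen second characteristic information travels back to the server. The sketch below models this with two plain classes in one process. The class and method names, the sample data and the similarity measure are assumptions for illustration, not part of the described system.

```python
from typing import Dict, List, Sequence, Tuple

Features = Tuple[int, ...]


def best_window_score(query: Sequence[int], reference: Sequence[int]) -> float:
    """Assumed measure: best fraction of matching positions over all segments."""
    m = len(query)
    if m == 0 or m > len(reference):
        return 0.0
    return max(
        sum(q == r for q, r in zip(query, reference[i:i + m])) / m
        for i in range(len(reference) - m + 1)
    )


class Server:
    """Stand-in for server 800: maps second characteristic information to media."""

    def __init__(self, media: Dict[Features, str]) -> None:
        self._media = media

    def download_features(self) -> List[Features]:
        return list(self._media)

    def fetch_media(self, second_features: Features) -> str:
        return self._media[second_features]


class Client:
    """Stand-in for client 810 with a local feature store (component 809)."""

    def __init__(self, server: Server) -> None:
        self._server = server
        self._local: List[Features] = []

    def sync(self) -> None:
        # Done before searching: download all second characteristic information.
        self._local = self._server.download_features()

    def search(self, first_features: Sequence[int]) -> str:
        chosen = max(self._local,
                     key=lambda ref: best_window_score(first_features, ref))
        # Only the chosen second characteristic information is sent back.
        return self._server.fetch_media(chosen)


server = Server({
    (1, 2, 3, 4, 5, 6, 7, 1): "song_a.mp3",
    (5, 5, 3, 3, 2, 2, 1, 1): "song_b.mp3",
})
client = Client(server)
client.sync()
print(client.search([3, 3, 2, 2]))  # song_b.mp3
```

Compared with the first scheme, the server only performs a lookup, while the client must keep its downloaded copy of the second characteristic information up to date.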

Fig. 9 is a schematic diagram of the principle of the automatic music score page turning system of the present invention. The system comprises a score display component 901, a processing component 902 and a voice input component 903. The processing component 902 comprises a memory bank storing the score information, a processor, and a memory bank storing the program software. The voice input component 903 comprises a microphone for picking up sound and a circuit for digitizing, acquiring and storing it. The score display component 901 is an electronic display, such as a liquid crystal display, an organic light-emitting diode display or an electronic paper display. When a piece is played, the score display component 901 shows the first page of the corresponding score under the control of the processing component 902. During the performance the voice input component 903 continuously captures the sound being played. The processing component 902 extracts the rhythm of the sound as the first characteristic information and calculates its similarity with the segments of the second characteristic information stored in advance for the piece being played. The maximum similarity indicates the position in the score that has been reached. As soon as the processing component 902 finds that the score content shown on the display component 901 has been played to the end, it automatically shows the next page of the score on the display component 901, so that the performance is not briefly interrupted by the player turning the page by hand. The tempo at which a concert performer plays is usually very close to that of the score, so adjacent identical characteristics need not be merged in this similarity calculation.
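The page turning decision can be sketched as locating the most recently played fragment in the score and checking whether its end has reached the end of the displayed page. The names, the sample score, the page boundary and the matching measure (count of matching positions) are assumptions for illustration; as stated in the text, no merging of adjacent identical characteristics is applied.

```python
from typing import List, Sequence


def locate_end(played: Sequence[int], score: Sequence[int]) -> int:
    """Return the score index just past the segment that best matches `played`."""
    m = len(played)

    def matches_at(start: int) -> int:
        return sum(p == s for p, s in zip(played, score[start:start + m]))

    # max() keeps the earliest start among equally good segments.
    best_start = max(range(len(score) - m + 1), key=matches_at)
    return best_start + m


def should_turn_page(played: Sequence[int], score: Sequence[int], page_end: int) -> bool:
    """Turn the page once the matched position reaches the end of the shown page."""
    return locate_end(played, score) >= page_end


# Hypothetical score features; page 1 is assumed to cover indices 0 to 5.
score: List[int] = [1, 3, 5, 3, 2, 4, 6, 4, 5, 7, 2, 1]
PAGE_1_END = 6

print(should_turn_page([1, 3, 5, 3], score, PAGE_1_END))  # False
print(should_turn_page([5, 3, 2, 4], score, PAGE_1_END))  # True
```

A practical system would probably restrict the search to segments near the last known position, since a repeated passage elsewhere in the score could otherwise match equally well; that refinement is not described in the text.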

Fig. 10 is a schematic diagram of the principle of the singing assisted learning system of the present invention. The system comprises a display component 1001, a processing component 1002 and a voice input component 1003. The processing component 1002 comprises a memory bank storing the music information, a processor, and a memory bank storing the program software. The voice input component 1003 comprises a microphone for picking up sound and a circuit for digitizing, acquiring and storing it. The display component 1001 is an electronic display, such as a liquid crystal display, an organic light-emitting diode display or an electronic paper display. When a piece is sung or played, the display component 1001 shows the corresponding score under the control of the processing component 1002. During singing or playing, the voice input component 1003 continuously captures the input sound, and the processing component 1002 extracts its rhythm as the first characteristic information. After the piece has finished, the similarity between the extracted first characteristic information and the second characteristic information stored in advance for the piece is calculated syllable by syllable. From the result, the difference between each sung or played syllable and the corresponding syllable of the standard piece is obtained, and the processing component 1002 shows these differences on the display component 1001. The singer or player can find mistakes from the displayed differences and adjust the performance accordingly, which serves the purpose of assisted learning.
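The syllable-by-syllable comparison can be illustrated as a position-wise comparison of the sung characteristics with the stored reference. This is a simplified sketch: it assumes both sequences are already aligned syllable for syllable, which the text does not spell out, and the function name and sample data are hypothetical.

```python
from itertools import zip_longest
from typing import List, Optional, Sequence, Tuple

Difference = Tuple[int, Optional[int], Optional[int]]


def syllable_report(sung: Sequence[int], reference: Sequence[int]) -> List[Difference]:
    """List (syllable index, expected, sung) wherever the two sequences differ.

    A value of None marks a syllable that is missing on one side.
    """
    report: List[Difference] = []
    for index, (s, r) in enumerate(zip_longest(sung, reference)):
        if s != r:
            report.append((index, r, s))
    return report


reference = [1, 1, 5, 5, 6, 6, 5]   # hypothetical stored second characteristic information
sung = [1, 1, 5, 4, 6, 6]           # hypothetical first characteristic information

for index, expected, actual in syllable_report(sung, reference):
    print(f"syllable {index}: expected {expected}, sung {actual}")
```

The resulting list is what a display component could render next to the score, one entry per syllable that deviates from the standard piece.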

Figure 11 is a schematic diagram of a media player implemented according to the present invention. The media player 1100 comprises a processor host 1101, a control button unit 1102, an earphone 1103, and a microphone 1104. The processor host 1101 is connected to the control button unit 1102, the earphone 1103, and the microphone 1104 through a connecting wire 1105. Signals travel over this connection in both directions: the button signals of the control button unit 1102 and the sound signal input through the microphone 1104 are sent to the processor host 1101, and the output signal of the processor host 1101 is sent to the earphone 1103. In other implementations, the processor host 1101 is connected wirelessly to the control button unit 1102, the earphone 1103, and the microphone 1104, for example using Bluetooth or WiFi technology. Both the wired and the wireless connection modes are existing mature technologies. The processor host 1101 comprises a memory 1105, which stores the media information and its second characteristic information, and an information processing device 1106. The control button unit 1102 carries a first button 1107 and a second button 1108.

People using a media player often hum along with the music being played, and the player of the present invention also relies on the operator humming a media segment to select media and to control the playback starting point. The player must therefore distinguish a user who is humming along with the media being played from a user who is humming a segment to make the player reselect media or a playback starting point. The first button 1107 and the second button 1108 on the control button unit 1102 serve this purpose. Pressing the first button 1107 indicates that the operator is selecting media by humming a segment. Pressing the second button 1108 indicates that the operator is selecting the playback starting point by humming a segment. When neither the first button 1107 nor the second button 1108 is pressed, the operator is humming along with the media being played. The button signals of the control button unit 1102 are sent to the processor host 1101, which makes the determination.

When the operator presses the first button 1107, the information processing device 1106 processes the voice information from the microphone 1104 and extracts characteristic information from it. It then retrieves the characteristics of the media information from the memory 1105, calculates similarity values against segments of characteristic information taken at arbitrary positions, and screens and compares the resulting similarity values. The second characteristic information to which the segment with the maximum similarity belongs serves as the basis for selecting the required multimedia information, and the media information is selected according to the correspondence between second characteristic information and media information and then played. When the operator presses the second button 1108, the same extraction, similarity calculation, and selection are performed, but the selected media information is played starting from the position of the second characteristic information segment with the maximum similarity. In this way, the media selection of the media player and the automatic positioning of the playback starting point are achieved.
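The button handling and segment search described for Figure 11 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: characteristic information is assumed to be a plain sequence of numbers (for example, pitch values), the similarity measure `1 / (1 + mean absolute difference)` is a stand-in chosen for illustration, and all names (`Mode`, `mode_from_buttons`, `best_match`, `handle_hum`) are hypothetical. The text does not say what happens when both buttons are pressed, so the sketch assumes the first button takes priority.

```python
from enum import Enum
from typing import Dict, Optional, Sequence, Tuple


class Mode(Enum):
    SELECT_MEDIA = 1   # first button pressed
    SELECT_START = 2   # second button pressed
    HUM_ALONG = 3      # no button pressed


def mode_from_buttons(first_pressed: bool, second_pressed: bool) -> Mode:
    """Map button states to an operating mode (first button wins: an assumption)."""
    if first_pressed:
        return Mode.SELECT_MEDIA
    if second_pressed:
        return Mode.SELECT_START
    return Mode.HUM_ALONG


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure for two equal-length feature segments."""
    mean_abs_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 / (1.0 + mean_abs_diff)


def best_match(
    query: Sequence[float], library: Dict[str, Sequence[float]]
) -> Optional[Tuple[str, int, float]]:
    """Slide the query over every stored feature sequence and keep the
    (media_id, offset, score) with the maximum similarity."""
    n = len(query)
    if n == 0:
        return None
    best: Optional[Tuple[str, int, float]] = None
    for media_id, features in library.items():
        for offset in range(len(features) - n + 1):
            score = similarity(query, features[offset:offset + n])
            if best is None or score > best[2]:
                best = (media_id, offset, score)
    return best


def handle_hum(
    mode: Mode, query: Sequence[float], library: Dict[str, Sequence[float]]
) -> Optional[Tuple[str, int]]:
    """Return (media_id, start_offset) to play, or None to leave playback alone."""
    if mode is Mode.HUM_ALONG:
        return None  # the user is only humming along
    match = best_match(query, library)
    if match is None:
        return None
    media_id, offset, _score = match
    # Only the second button starts playback at the matched position.
    return media_id, (offset if mode is Mode.SELECT_START else 0)


if __name__ == "__main__":
    library = {"song_a": [60, 62, 64, 65, 67], "song_b": [67, 65, 64, 62, 60, 62, 64]}
    hummed = [64, 62, 60]
    print(handle_hum(mode_from_buttons(False, True), hummed, library))
```

The returned offset is an index into the feature sequence, so a real player would still have to convert it to a playback time. Comparing raw values also makes the match sensitive to the key in which the user hums; comparing pitch intervals instead would be one way to relax that, at the cost of a slightly longer sketch.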

Claims

1. A media playing apparatus, comprising a storage medium storing at least one piece of multimedia information and a multimedia signal output component, characterized by further comprising:

a sound input component;

2. The media playing apparatus according to claim 1, characterized in that said first characteristic information comprises sound pitch information and/or inflection information; and said second characteristic information comprises the sound pitch information and/or inflection information contained in the multimedia information.

3. The media playing apparatus according to claim 1, characterized in that said first characteristic information is one or a combination of the following: MIDI data, numbered musical notation, staff notation, Mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC) data; and said second characteristic information is one or a combination of the following: MIDI data, numbered musical notation, staff notation, Mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC) data, prosodic information of the music, melodic information of the music.

4. The media playing apparatus according to claim 1, 2, or 3, characterized in that said characteristic extracting component, medium information characteristic storing component, characteristic similarity calculating component, characteristic similarity determining component, multimedia information selecting component, and storage medium are implemented by an information processing device comprising a processor; and said information processing device is connected to the multimedia signal output component and to the sound input component by wired leads or by wireless signals.

5. The media playing apparatus according to claim 4, characterized by further comprising a control button component, wherein said information processing device is connected to said control button component by wired leads or by wireless signals.

6. The media playing apparatus according to claim 1, 2, 3, or 5, characterized in that said storage medium and medium information characteristic storing component are independent memories or a shared memory; and said storage medium and/or medium information characteristic storing component further stores the correspondence between each piece of multimedia information and its second characteristic information.

7. The media playing apparatus according to claim 4, characterized in that said storage medium and medium information characteristic storing component are independent memories or a shared memory; and said storage medium and/or medium information characteristic storing component further stores the correspondence between each piece of multimedia information and its second characteristic information.

8. A media playing apparatus, comprising a storage medium storing at least one piece of multimedia information and a multimedia signal output component, characterized by further comprising:

a sound input component;

9. The media playing apparatus according to claim 8, characterized in that said first characteristic information comprises sound pitch information and/or inflection information; and said second characteristic information comprises the sound pitch information and/or inflection information contained in the multimedia information.

10. The media playing apparatus according to claim 8, characterized in that said first characteristic information is one or a combination of the following: MIDI data, numbered musical notation, staff notation, Mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC) data; and said second characteristic information is one or a combination of the following: MIDI data, numbered musical notation, staff notation, Mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC) data, prosodic information of the music, melodic information of the music.

11. The media playing apparatus according to claim 8, 9, or 10, characterized in that said characteristic extracting component, medium information characteristic calculating component, characteristic similarity calculating component, characteristic similarity determining component, multimedia information selecting component, and storage medium are implemented by an information processing device comprising a processor; and said information processing device is connected to the multimedia signal output component and to the sound input component by wired leads or by wireless signals.

12. The media playing apparatus according to claim 11, characterized by further comprising a control button component, wherein said information processing device is connected to said control button component by wired leads or by wireless signals.

13. A multimedia information operating method using the media playing apparatus according to any one of claims 1 to 12, characterized by comprising the following operating steps:

Step 1: sound information is input through the sound input component of said media playing apparatus;

Step 2: said media playing apparatus extracts first characteristic information from said sound information;

Step 3: said media playing apparatus calculates the similarity between said first characteristic information and the second characteristic information of each piece of media information stored in the media playing apparatus;

Step 4: said media playing apparatus determines the multimedia information corresponding to the maximum similarity value as the selected multimedia information;

Step 5: said media playing apparatus outputs or plays the selected multimedia information.

14. The operating method according to claim 13, characterized by further comprising, before said step 1, a step of storing in said media playing apparatus at least one piece of media information, the second characteristic information corresponding to each piece of media information, and the correspondence information between each piece of media information and its second characteristic information.

15. The operating method according to claim 13, characterized by further comprising, before said step 3, a step of storing at least one piece of media information in said media playing apparatus, and a step of calculating the second characteristic information corresponding to each piece of media information and the correspondence information between each piece of media information and its second characteristic information.

16. The operating method according to claim 13, 14, or 15, characterized in that the starting point from which the selected multimedia information is output or played in said step 5 is either the start point of the media information or the point in the media information with the maximum similarity to said first characteristic information.