CN101165779A - Information processing apparatus and method, program, and record medium - Google Patents

Information processing apparatus and method, program, and record medium

Info

Publication number
CN101165779A
CN101165779A · CNA200710162893XA · CN200710162893A
Authority
CN
China
Prior art keywords
characteristic quantity
continuous
amount
data
regional characteristic quantity
Prior art date
Legal status
Granted
Application number
CNA200710162893XA
Other languages
Chinese (zh)
Other versions
CN101165779B (en)
Inventor
小林由幸 (Yoshiyuki Kobayashi)
Current Assignee
Sony Corp
Original Assignee
Sony Corp
Priority date
Filing date
Publication date
Priority claimed from JP2006296143A (external priority; granted as JP4239109B2)
Application filed by Sony Corp
Publication of CN101165779A
Application granted
Publication of CN101165779B
Current legal status: Expired - Fee Related
Anticipated expiration

Abstract

An information processing apparatus is disclosed. An analyzing section chronologically and continuously analyzes sound data that continue chronologically, in each of predetermined frequency bands. A continuous characteristic quantity extracting section extracts, from the analysis result of the analyzing section, a continuous characteristic quantity, which is a characteristic quantity that continues chronologically. A cutting section cuts the continuous characteristic quantity into regions, each of which has a predetermined length. A regional characteristic quantity extracting section extracts, from each of the regions into which the continuous characteristic quantity has been cut, a regional characteristic quantity, which is a characteristic quantity represented by one scalar or vector. A target characteristic quantity estimating section estimates, from the regional characteristic quantities, a target characteristic quantity, which is a characteristic quantity that represents one characteristic of the sound data.

Description

Information processing apparatus and method, program, and recording medium
Cross-reference to related applications
The present invention contains subject matter related to Japanese Patent Application JP 2006-286261, filed with the Japanese Patent Office on October 20, 2006, and Japanese Patent Application JP 2006-296143, filed with the Japanese Patent Office on October 31, 2006, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to an information processing apparatus and method, a program, and a recording medium, and more particularly to an information processing apparatus and method, a program, and a recording medium that allow features of data to be extracted.
Background art
Techniques that process information and extract, from data, a characteristic quantity representing a feature of the data in a predefined manner are known. With such techniques, a characteristic quantity representing a feature of a predetermined region of chronologically continuous data can be extracted.
In one related-art technique, in information processing that uses the result of speech recognition processing, the target on which the speech recognition processing is performed changes. The speech recognition environment settings for the speech recognition processing are changed according to the target, and the speech recognition processing is then performed on the changed target with the changed settings (see, for example, Japanese Patent Laid-Open No. 2005-195834).
Summary of the invention
However, when data are divided into a plurality of regions in advance and a feature is extracted from each region, it is difficult to take into account the influence of the preceding region (or regions) on the current region.
To increase the resolution of the finally obtained characteristic quantity as much as possible, it is necessary to increase the overlap of the data portions to be divided. As a result, the amount of processing increases in proportion to the resolution.
When data are input in real time, processing is executed each time a predetermined amount of data has been stored. Thus, the more complicated the algorithm for extracting features from the data, the longer the delay from when the data are input until the characteristic quantity is finally obtained.
In other words, the delay (latency) from when data are input until the finally obtained characteristic quantity is output is the sum of the time taken for one region of data to be input and the time taken to process the data. Therefore, the more complicated the feature extraction algorithm, the longer the time needed to process the data, that is, the longer the delay (latency).
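As a rough sketch of this relationship (the symbols below are illustrative and do not appear in the patent), the latency can be written as

$$t_{\text{latency}} = t_{\text{region}} + t_{\text{process}},$$

where $t_{\text{region}}$ is the time needed for one region of data to be input and $t_{\text{process}}$ is the time needed to extract the characteristic quantity from that region; a more complicated extraction algorithm increases $t_{\text{process}}$ and therefore the latency.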
In addition, when features are extracted directly from data representing continuous quantities, it is necessary to design a dedicated model and to use a large amount of teacher data to learn the parameters of the feature extraction device. In the related art, a generic feature extraction device is not used, and the parameters cannot be learned from only a small amount of teacher data.
In view of the foregoing, it is desirable to provide a technique that allows features of data to be extracted easily and quickly.
According to an embodiment of the present invention, there is provided an information processing apparatus. The information processing apparatus includes an analyzing section, a continuous characteristic quantity extracting section, a dividing section, a regional characteristic quantity extracting section, and a target characteristic quantity estimating section. The analyzing section chronologically and continuously analyzes chronologically continuous sound data in each of predetermined frequency bands. The continuous characteristic quantity extracting section extracts, from the analysis result of the analyzing section, a continuous characteristic quantity, which is a characteristic quantity that continues chronologically. The dividing section divides the continuous characteristic quantity into a plurality of regions, each of which has a predetermined length. The regional characteristic quantity extracting section extracts, from each of the plurality of regions into which the continuous characteristic quantity has been divided, a regional characteristic quantity, which is a characteristic quantity represented by one scalar or vector. The target characteristic quantity estimating section estimates, from the regional characteristic quantities, a target characteristic quantity, which is a characteristic quantity representing one characteristic of the sound data.
The target characteristic quantity estimating section may be created in advance by learning teacher data composed of chronologically continuous sound data and characteristic quantities that represent a true characteristic of the sound data in each of the plurality of regions into which the continuous characteristic quantity is divided.
The analyzing section may chronologically and continuously analyze the chronologically continuous sound data into the pitches (musical intervals) of the 12-tone equal temperament in each octave. The continuous characteristic quantity extracting section may extract the continuous characteristic quantity from data that are obtained as the analysis result of the analyzing section and that represent the energies of the 12 equal-temperament pitches in each octave.
The target characteristic quantity estimating section may estimate a target characteristic quantity that identifies music or talk as a characteristic of the sound data.
The information processing apparatus may further include a smoothing section that smooths the target characteristic quantity by obtaining a moving average of the target characteristic quantity.
The information processing apparatus may further include a storage section that adds, to the sound data, a tag identifying the characteristic represented by the estimated target characteristic quantity, and that stores the tagged sound data.
The information processing apparatus may further include an algorithm creating section that creates, according to a GA (genetic algorithm) or GP (genetic programming), an algorithm for extracting the continuous characteristic quantity from the chronologically continuous sound data.
According to an embodiment of the present invention, there is provided an information processing method. Chronologically continuous sound data are chronologically and continuously analyzed in each of predetermined frequency bands. A continuous characteristic quantity, which is a characteristic quantity that continues chronologically, is extracted from the analysis result. The continuous characteristic quantity is divided into a plurality of regions, each of which has a predetermined length. A regional characteristic quantity, which is a characteristic quantity represented by one scalar or vector, is extracted from each of the plurality of regions into which the continuous characteristic quantity has been divided. A target characteristic quantity, which is a characteristic quantity representing one characteristic of the sound data, is estimated from the regional characteristic quantities.
According to an embodiment of the present invention, there is provided a program executed by a computer. Chronologically continuous sound data are chronologically and continuously analyzed in each of predetermined frequency bands. A continuous characteristic quantity, which is a characteristic quantity that continues chronologically, is extracted from the analysis result of the analyzing step. The continuous characteristic quantity is divided into a plurality of regions, each of which has a predetermined length. A regional characteristic quantity, which is a characteristic quantity represented by one scalar or vector, is extracted from each of the plurality of regions into which the continuous characteristic quantity has been divided. A target characteristic quantity, which is a characteristic quantity representing one characteristic of the sound data, is estimated from the regional characteristic quantities.
According to an embodiment of the present invention, there is provided a recording medium on which a program executed by a computer is recorded. Chronologically continuous sound data are chronologically and continuously analyzed in each of predetermined frequency bands. A continuous characteristic quantity, which is a characteristic quantity that continues chronologically, is extracted from the analysis result. The continuous characteristic quantity is divided into a plurality of regions, each of which has a predetermined length. A regional characteristic quantity, which is a characteristic quantity represented by one scalar or vector, is extracted from each of the plurality of regions into which the continuous characteristic quantity has been divided. A target characteristic quantity, which is a characteristic quantity representing one characteristic of the sound data, is estimated from the regional characteristic quantities.
According to the embodiments of the present invention, chronologically continuous sound data are chronologically and continuously analyzed in each of predetermined frequency bands. A continuous characteristic quantity, which is a characteristic quantity that continues chronologically, is extracted from the analysis result. The continuous characteristic quantity is divided into a plurality of regions, each of which has a predetermined length. A regional characteristic quantity, which is a characteristic quantity represented by one scalar or vector, is extracted from each of the plurality of regions into which the continuous characteristic quantity has been divided. A target characteristic quantity, which is a characteristic quantity representing one characteristic of the sound data, is estimated from the regional characteristic quantities.
According to the embodiments of the present invention, features can be extracted from data.
According to the embodiments of the present invention, features can be extracted from data easily and quickly.
These and other objects, features, and advantages of the present invention will become more apparent from the following detailed description of preferred embodiments thereof, as illustrated in the accompanying drawings.
Brief description of the drawings
The present invention will be more fully understood from the following detailed description, taken in conjunction with the accompanying drawings, in which like reference numerals denote like elements, and in which:
Fig. 1 is a schematic diagram describing how a feature is obtained from each portion of predetermined length of continuous data;
Fig. 2 is a block diagram showing the structure of an information processing apparatus according to an exemplary embodiment of the present invention;
Fig. 3 is a flowchart describing the processing of extracting a target characteristic quantity;
Fig. 4 is a schematic diagram describing the extraction of continuous characteristic quantities;
Fig. 5 is a schematic diagram describing the division of a continuous characteristic quantity;
Fig. 6 is a schematic diagram describing the extraction of regional characteristic quantities;
Fig. 7 is a schematic diagram describing the estimation of a target characteristic quantity;
Fig. 8 is a schematic diagram describing how sound data are determined to be music or talk in each unit time;
Fig. 9 is a block diagram showing another structure of an information processing apparatus according to an embodiment of the present invention;
Fig. 10 is a flowchart describing the processing of adding tags to sound data;
Fig. 11 is a schematic diagram describing time-pitch data;
Fig. 12 is a schematic diagram describing the extraction of continuous music characteristic quantities from time-pitch data;
Fig. 13 is a schematic diagram describing the division of continuous music characteristic quantities;
Fig. 14 is a schematic diagram describing the extraction of regional characteristic quantities;
Fig. 15 is a schematic diagram describing how a frame is determined to be music or talk;
Fig. 16 is a schematic diagram describing the smoothing of the per-frame music/talk determination results;
Fig. 17 is a schematic diagram showing exemplary sound data to which tags have been added;
Fig. 18 is a schematic diagram outlining the processing of the algorithm creating section;
Fig. 19 is a schematic diagram outlining the processing of the algorithm creating section;
Fig. 20 is a schematic diagram outlining the processing of the algorithm creating section;
Fig. 21 is a block diagram showing the functional structure of the algorithm creating section;
Fig. 22 is a flowchart describing algorithm creation processing;
Fig. 23 is a schematic diagram describing exemplary algorithm creation processing;
Fig. 24 is a schematic diagram describing processing expressed with genes;
Fig. 25 is a schematic diagram describing the evaluation of genes; and
Fig. 26 is a block diagram showing an exemplary structure of a personal computer.
Embodiment
Next, embodiments of the present invention will be described. The relationship between the constituent features of the present invention and the embodiments described in this specification is as follows. This description is intended to confirm that embodiments supporting the present invention are described in this specification. Therefore, even if an embodiment is not described here as corresponding to a particular constituent feature, that does not mean that the embodiment does not correspond to that constituent feature. Conversely, even if an embodiment is described here as corresponding to a particular constituent feature, that does not mean that the embodiment does not correspond to other constituent features.
According to an embodiment of the present invention, the information processing apparatus includes an analyzing section (for example, the time-pitch analysis section 81 shown in Fig. 9), a continuous characteristic quantity extracting section (for example, the continuous music characteristic quantity extracting section 82 shown in Fig. 9), a dividing section (for example, the frame dividing section 83 shown in Fig. 9), a regional characteristic quantity extracting section (for example, the regional characteristic quantity extracting section 84 shown in Fig. 9), and a target characteristic quantity estimating section (for example, the music/talk determining section 85 shown in Fig. 9). The analyzing section chronologically and continuously analyzes chronologically continuous sound data in each of predetermined frequency bands. The continuous characteristic quantity extracting section extracts, from the analysis result of the analyzing section, a continuous characteristic quantity, which is a characteristic quantity that continues chronologically. The dividing section divides the continuous characteristic quantity into a plurality of regions, each of which has a predetermined length. The regional characteristic quantity extracting section extracts, from each of the regions into which the continuous characteristic quantity has been divided, a regional characteristic quantity, which is a characteristic quantity represented by one scalar or vector. The target characteristic quantity estimating section estimates, from the regional characteristic quantities, a target characteristic quantity, which is a characteristic quantity representing one characteristic of the sound data.
The information processing apparatus may further include a smoothing section (for example, the data smoothing section 86 shown in Fig. 9) that smooths the target characteristic quantity by obtaining a moving average of the target characteristic quantity.
The information processing apparatus may further include a storage section (for example, the sound storage section 87 shown in Fig. 9) that adds, to the sound data, a tag identifying the characteristic represented by the estimated target characteristic quantity, and that stores the tagged sound data.
The information processing apparatus may further include an algorithm creating section (for example, the algorithm creating section 101 shown in Fig. 18) that creates, according to a GA (genetic algorithm) or GP (genetic programming), the algorithm for extracting the continuous characteristic quantity from the chronologically continuous sound data.
According to embodiments of the present invention, in the information processing method and the program, chronologically continuous sound data are chronologically and continuously analyzed in each of predetermined frequency bands (for example, step S51 shown in Fig. 10). A continuous characteristic quantity, which is a characteristic quantity that continues chronologically, is extracted from the analysis result (for example, step S52 shown in Fig. 10). The continuous characteristic quantity is divided into a plurality of regions, each of which has a predetermined length (for example, step S53 shown in Fig. 10). A regional characteristic quantity, which is a characteristic quantity represented by one scalar or vector, is extracted from each of the plurality of regions into which the continuous characteristic quantity has been divided (for example, step S54 shown in Fig. 10). A target characteristic quantity, which is a characteristic quantity representing one characteristic of the sound data, is estimated from the regional characteristic quantities (for example, step S55 shown in Fig. 10).
First, a technique will be described in which, as shown in Fig. 1, an automatic feature extraction algorithm is applied to continuous data, i.e., chronologically continuous data, and features are obtained from the continuous data at intervals of a predetermined length according to the algorithm. For example, a feature such as one of A, B, and C is obtained at intervals of a predetermined length from continuously input continuous data such as waveform data.
Fig. 2 is a block diagram showing the structure of an information processing apparatus 11 according to an embodiment of the present invention. The information processing apparatus 11 extracts features from continuous data at intervals of a predetermined length. The information processing apparatus 11 is composed of a continuous characteristic quantity extracting section 31, a continuous characteristic quantity dividing section 32, a regional characteristic quantity extracting section 33, and a target characteristic quantity estimating section 34.
The continuous characteristic quantity extracting section 31 obtains continuous data, i.e., chronologically continuous data, input from the outside, and extracts from the obtained continuous data a continuous characteristic quantity, which is a characteristic quantity that continues chronologically. The continuous characteristic quantity extracting section 31 extracts at least one continuous characteristic quantity from the continuous data and supplies the extracted continuous characteristic quantities to the continuous characteristic quantity dividing section 32 in succession.
In other words, the continuous characteristic quantities, which are chronologically continuous characteristic quantities, are supplied to the continuous characteristic quantity dividing section 32 in the order in which they are extracted.
The continuous characteristic quantity dividing section 32 divides each of the continuous characteristic quantities supplied from the continuous characteristic quantity extracting section 31 into a plurality of regions, each of which has a predetermined length. In other words, the continuous characteristic quantity dividing section 32 creates at least one region from each of the continuous characteristic quantities. The continuous characteristic quantity dividing section 32 supplies the regions of each continuous characteristic quantity to the regional characteristic quantity extracting section 33 in succession, in the order in which the continuous characteristic quantity is divided into the regions.
The regional characteristic quantity extracting section 33 extracts, from each of the regions into which each continuous characteristic quantity has been divided by the continuous characteristic quantity dividing section 32, a regional characteristic quantity, which is a characteristic quantity represented by one scalar or vector. In other words, the regional characteristic quantity extracting section 33 extracts at least one regional characteristic quantity from each region of each continuous characteristic quantity, and supplies the extracted regional characteristic quantities to the target characteristic quantity estimating section 34 in the order in which they are extracted.
The target characteristic quantity estimating section 34 estimates the target characteristic quantity that is finally obtained for each region of the predetermined length. In other words, the target characteristic quantity estimating section 34 estimates, from the regional characteristic quantities extracted by the regional characteristic quantity extracting section 33, a target characteristic quantity, which is a characteristic quantity representing one feature of the data in each region of the predetermined length, and outputs the estimated target characteristic quantity.
Next, with reference to the flowchart shown in Fig. 3, the processing of extracting the target characteristic quantity will be described. At step S11, the continuous characteristic quantity extracting section 31 of the information processing apparatus 11 extracts at least one continuously varying continuous characteristic quantity from the continuous data, i.e., the chronologically continuous data, input from the outside.
For example, as shown in Fig. 4, the continuous characteristic quantity extracting section 31 extracts three continuously varying continuous characteristic quantities from the continuous data, for example, continuous characteristic quantity 1, continuous characteristic quantity 2, and continuous characteristic quantity 3.
More specifically, when the continuous data are sound data, the continuous characteristic quantity extracting section 31 extracts, from the continuous data, continuous characteristic quantity 1 representing the sound volume at each moment, continuous characteristic quantity 2 representing the 12-tone equal-temperament pitch (for example, the pitch of Do, Re, or Mi) at each moment, and continuous characteristic quantity 3 representing the balance between the right-channel signal and the left-channel signal at each moment.
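As a minimal sketch of such continuous characteristic quantities (the function name, frame and hop sizes, and the exact formulas are illustrative assumptions, not the patent's algorithm), the per-moment volume and left/right balance of a stereo waveform could be computed as follows:

```python
import numpy as np

def continuous_features(left, right, frame=1024, hop=512):
    """Illustrative sketch: per-moment sound volume (continuous characteristic
    quantity 1) and right/left balance (continuous characteristic quantity 3)
    computed on short hops so that they vary continuously over time."""
    feats = []
    for start in range(0, min(len(left), len(right)) - frame, hop):
        l = left[start:start + frame]
        r = right[start:start + frame]
        volume = np.sqrt(np.mean(((l + r) / 2.0) ** 2))    # RMS of the mixed signal
        balance = np.mean(np.abs(r)) - np.mean(np.abs(l))  # > 0 means the right channel is louder
        feats.append((volume, balance))
    return np.array(feats)  # shape: (number of hops, 2), chronologically continuous
```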
When the continuous data are moving image data, the continuous characteristic quantity extracting section 31 extracts, from the continuous data, continuous characteristic quantity 1 representing the brightness of the moving image at each moment, continuous characteristic quantity 2 representing the amount of motion at each moment, and continuous characteristic quantity 3 representing the color of the moving image at each moment.
The continuous characteristic quantity extracting section 31 supplies the extracted continuous characteristic quantities to the continuous characteristic quantity dividing section 32 in succession, in the order in which they are extracted.
At step S12, the continuous characteristic quantity dividing section 32 divides the at least one continuous characteristic quantity into a plurality of regions, each of which has a predetermined length.
For example, the continuous characteristic quantity dividing section 32 divides each of the continuous characteristic quantities of the continuous data, such as continuous characteristic quantity 1, continuous characteristic quantity 2, and continuous characteristic quantity 3, into a plurality of regions, each of which has the predetermined length represented by the spacing of the adjacent vertical lines shown in Fig. 5.
The plurality of continuous characteristic quantities are divided at the same positions and into regions of the same length.
In this example, the length may be based on time, on the amount of data, or on a predetermined unit of the continuous data (for example, a frame).
The continuous characteristic quantity dividing section 32 may divide each continuous characteristic quantity into a plurality of regions of the predetermined length such that each divided region overlaps the adjacent divided regions.
More specifically, for example, the continuous characteristic quantity dividing section 32 divides continuous characteristic quantity 1 representing the sound volume at each moment, continuous characteristic quantity 2 representing the 12-tone equal-temperament pitch at each moment, and continuous characteristic quantity 3 representing the balance between the right-channel and left-channel signals at each moment, all extracted from the continuous data as sound data, into a plurality of regions, each having a length of 5 seconds, 10 seconds, or 15 seconds of sound data.
Alternatively, for example, the continuous characteristic quantity dividing section 32 divides continuous characteristic quantity 1 representing the brightness of the moving image at each moment, continuous characteristic quantity 2 representing the amount of motion at each moment, and continuous characteristic quantity 3 representing the color of the moving image at each moment, all extracted from the continuous data as moving image data, into a plurality of regions, each having a length of 30 frames, 150 frames, or 300 frames of moving image data.
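A minimal sketch of this division step is shown below; the region length, overlap, and function name are assumptions made for illustration rather than the patent's exact procedure.

```python
import numpy as np

def divide_into_regions(feature, region_len, overlap=0):
    """Divide one chronologically continuous characteristic quantity into
    fixed-length regions; adjacent regions overlap by `overlap` samples."""
    step = region_len - overlap
    return [np.asarray(feature[start:start + region_len])
            for start in range(0, len(feature) - region_len + 1, step)]

# All continuous characteristic quantities are divided at the same positions,
# e.g. regions of 500 samples with 50% overlap (assumed sampling of the feature):
# regions_1 = divide_into_regions(feature_1, region_len=500, overlap=250)
# regions_2 = divide_into_regions(feature_2, region_len=500, overlap=250)
```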
The continuous characteristic quantity dividing section 32 supplies the regions into which the continuous characteristic quantities have been divided to the regional characteristic quantity extracting section 33 in the order in which they are divided.
At step S13, the regional characteristic quantity extracting section 33 extracts at least one regional characteristic quantity, represented by one scalar or vector, corresponding to each of the plurality of regions of the predetermined length into which the at least one continuous characteristic quantity has been divided.
For example, the regional characteristic quantity extracting section 33 applies at least one predetermined process to each of the plurality of regions into which each continuous characteristic quantity has been divided, and thereby extracts, from each region of each continuous characteristic quantity, at least one regional characteristic quantity, which is a characteristic quantity represented by at least one scalar or vector.
A regional characteristic quantity expresses the characteristic of one region as one scalar or one vector.
For example, as shown in Fig. 6, the regional characteristic quantity extracting section 33 obtains the mean value, in the first region, of continuous characteristic quantity 1 representing the sound volume at each moment, extracted from the continuous data as sound data, and thereby extracts 0.2 as a regional characteristic quantity of the first region. Similarly, the regional characteristic quantity extracting section 33 obtains the mean values of continuous characteristic quantity 1 in the second and third regions, and extracts -0.05 and 0.05, respectively, as regional characteristic quantities of the second and third regions.
In addition, the regional characteristic quantity extracting section 33 obtains the variances of continuous characteristic quantity 1, representing the sound volume at each moment, in the first, second, and third regions, and extracts 0.2, 0.15, and 0.1, respectively, as regional characteristic quantities of the first, second, and third regions.
In addition, the regional characteristic quantity extracting section 33 obtains the slopes of continuous characteristic quantity 1, representing the sound volume at each moment, in the first, second, and third regions, and extracts 0.3, -0.2, and 0.0, respectively, as regional characteristic quantities of the first, second, and third regions.
Similarly, the regional characteristic quantity extracting section 33 extracts regional characteristic quantities representing the mean value, variance, and slope of continuous characteristic quantity 1 in the fourth and subsequent regions.
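The mean, variance, and slope described above could be computed per region as in the following sketch (an illustration only; the patent does not specify how the slope is computed):

```python
import numpy as np

def regional_characteristics(region):
    """One scalar each for the mean, variance, and slope of a region of a
    continuous characteristic quantity."""
    t = np.arange(len(region))
    mean = float(np.mean(region))
    variance = float(np.var(region))
    slope = float(np.polyfit(t, region, 1)[0])  # least-squares linear slope
    return mean, variance, slope

# e.g. regional_characteristics(regions_1[0]) might yield values such as (0.2, 0.2, 0.3)
```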
In addition, the regional characteristic quantity extracting section 33 extracts regional characteristic quantities representing the mean value, variance, and slope, in each region, of continuous characteristic quantity 2, representing the 12-tone equal-temperament pitch at each moment, and of continuous characteristic quantity 3, representing the balance between the right-channel and left-channel signals at each moment, both extracted from the continuous data as sound data.
When the continuous data are moving image data, the regional characteristic quantity extracting section 33 extracts regional characteristic quantities representing the mean value, variance, and slope, in each region, of continuous characteristic quantity 1 representing the brightness of the moving image at each moment, continuous characteristic quantity 2 representing the amount of motion at each moment, and continuous characteristic quantity 3 representing the color of the moving image at each moment.
At step S14, the target characteristic quantity estimating section 34 estimates the target characteristic quantity of each region from the regional characteristic quantities. Thereafter, the processing ends.
In other words, at step S14, the target characteristic quantity estimating section 34 estimates the finally obtained target characteristic quantity from the regional characteristic quantities extracted from each region at step S13. For example, as shown in Fig. 7, when regional characteristic quantities such as regional characteristic quantity 1 to regional characteristic quantity 7 have been extracted, for example 0.2 as regional characteristic quantity 1, 0.2 as regional characteristic quantity 2, 0.3 as regional characteristic quantity 3, -0.5 as regional characteristic quantity 4, 1.23 as regional characteristic quantity 5, 0.42 as regional characteristic quantity 6, and 0.11 as regional characteristic quantity 7, the target characteristic quantity estimating section 34 estimates the target characteristic quantity from regional characteristic quantities 1 to 7.
When the continuous data are sound data, the target characteristic quantity represents, for example, the presence or absence of voice, the presence or absence of the performance of a predetermined instrument, or the presence or absence of noise.
When the continuous data are moving image data, the target characteristic quantity represents, for example, the presence or absence of a person (or people), the presence or absence of a predetermined object, or the presence or absence of a predetermined motion of an object (for example, whether the object is dancing).
Thus, at step S14, the target characteristic quantity estimating section 34 estimates the target characteristic quantity, which is a characteristic quantity representing one feature of the data, from the regional characteristic quantities of each region.
In other words, the target characteristic quantity estimating section 34 applies predetermined processing to the regional characteristic quantities of each region and estimates the target characteristic quantity of each region.
For example, the target characteristic quantity estimating section 34 is created in advance by learning teacher data composed of regional characteristic quantities and target characteristic quantities that represent a true characteristic of the data in each region. In other words, the target characteristic quantity estimating section 34 is created in advance by learning teacher data composed of the chronologically continuous data in each region, from which the regional characteristic quantities are extracted, and target characteristic quantities representing a true characteristic of the whole data in each region.
For example, the target characteristic quantity estimating section 34 is created from the teacher data by machine learning techniques such as regression, classification, an SVM (support vector machine), or GP (genetic programming).
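The following sketch illustrates this kind of learning with scikit-learn's SVM as one possible learner; the teacher data values and the choice of library are assumptions for illustration, not the estimator actually used in the patent:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical teacher data: one row of regional characteristic quantities per
# region and one target label per region (e.g. 1 = characteristic present, 0 = absent).
X_teacher = np.array([[0.2, 0.2, 0.3, -0.5, 1.23, 0.42, 0.11],
                      [0.1, 0.4, -0.2, 0.3, 0.80, 0.10, 0.05],
                      [-0.05, 0.15, -0.2, 0.1, 0.20, 0.30, 0.90],
                      [0.05, 0.1, 0.0, -0.1, 0.05, 0.25, 0.70]])
y_teacher = np.array([1, 1, 0, 0])

estimator = SVC(kernel="rbf")        # regression, classification, or GP would also fit the description
estimator.fit(X_teacher, y_teacher)  # "creating" the target characteristic quantity estimating section

# Estimating the target characteristic quantity of a new region:
new_region = np.array([[0.2, 0.2, 0.3, -0.5, 1.23, 0.42, 0.11]])
print(estimator.predict(new_region))
```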
In this way, the feature of the continuous data in a predetermined region can be extracted.
A chronologically continuous continuous characteristic quantity is extracted from the chronologically continuous continuous data. Regions of a predetermined length are divided from the continuous characteristic quantity, and from each region a regional characteristic quantity, which is a characteristic quantity represented by one scalar or vector, is extracted. The target characteristic quantity, which is a characteristic quantity representing one feature of the continuous data in each region, is then estimated. Therefore, the feature of the continuous data in each region can be extracted easily and quickly.
Next, an embodiment of the present invention will be described more specifically.
As shown in Fig. 8, an automatic music/talk determination algorithm is applied to input sound data, i.e., chronologically continuous data, to determine for each unit time whether the sound data are music or talk, and the result indicating whether the sound data are music or talk is output for each unit time.
For example, for each unit time of sound data having a predetermined length, i.e., waveform data representing the sound waveform, the determination results are output as talk (T), talk (T), talk (T), talk (T), music (M), music (M), music (M), music (M), music (M), and music (M).
Fig. 9 is a block diagram showing the structure of an information processing apparatus 51 according to an embodiment of the present invention. The information processing apparatus 51 determines, for each unit time, whether the input sound data are music or talk. The information processing apparatus 51 is composed of a time-pitch analysis section 81, a continuous music characteristic quantity extracting section 82, a frame dividing section 83, a regional characteristic quantity extracting section 84, a music/talk determining section 85, a data smoothing section 86, and a sound storage section 87.
The time-pitch analysis section 81 chronologically and continuously analyzes the chronologically continuous sound data in each of predetermined frequency bands. For example, the time-pitch analysis section 81 analyzes the chronologically continuous sound data along the two axes of time and the 12 equal-temperament pitches in each octave. The time-pitch analysis section 81 obtains, as the analysis result, chronologically continuous time-pitch data representing the energies of the 12 equal-temperament pitches in each octave, and supplies the time-pitch data to the continuous music characteristic quantity extracting section 82 in the order in which they are analyzed, so that the chronologically continuous time-pitch data are temporally continuous.
The continuous music characteristic quantity extracting section 82 extracts continuous music characteristic quantities, which are characteristic quantities that continue chronologically, from the time-pitch data, i.e., the chronologically continuous data supplied from the time-pitch analysis section 81. The continuous music characteristic quantity extracting section 82 supplies the extracted continuous music characteristic quantities to the frame dividing section 83 in the order in which they are extracted, so that the continuous music characteristic quantities, which are chronologically continuous characteristic quantities, are temporally continuous.
The frame dividing section 83 divides each of the continuous music characteristic quantities supplied from the continuous music characteristic quantity extracting section 82 into a plurality of frames, each of which has a predetermined length. The frame dividing section 83 supplies the continuous music characteristic quantities divided into frames, as frame-based continuous music characteristic quantities, to the regional characteristic quantity extracting section 84 in the order in which they are divided into frames.
The regional characteristic quantity extracting section 84 extracts, from the frame-based continuous music characteristic quantities, regional characteristic quantities, which are characteristic quantities represented by one scalar or vector in each frame. The regional characteristic quantity extracting section 84 supplies the extracted regional characteristic quantities to the music/talk determining section 85 in the order in which they are extracted.
The music/talk determining section 85 estimates, from the regional characteristic quantities extracted by the regional characteristic quantity extracting section 84, a target characteristic quantity, which is a characteristic quantity of each frame of the sound data and represents a characteristic that identifies music or talk. In other words, the music/talk determining section 85 estimates, for each frame, a target characteristic quantity that identifies music or talk as a characteristic of the sound data.
The music/talk determining section 85 supplies to the data smoothing section 86, as the estimation result, frame-based music/talk determination results indicating, for each frame, the identified characteristic of music or talk.
The data smoothing section 86 obtains a moving average of the frame-based music/talk determination results supplied from the music/talk determining section 85, and smooths the target characteristic quantity according to the obtained moving average. The data smoothing section 86 obtains a continuous music/talk determination result as the smoothed result and supplies the continuous music/talk determination result to the sound storage section 87.
The sound storage section 87 creates tags identifying music or talk according to the continuous music/talk determination result supplied from the data smoothing section 86, and adds the created tags to the sound data. The sound storage section 87 stores the tagged sound data, for example, on a recording medium (not shown).
In other words, the sound storage section 87 adds to the sound data tags representing the estimated target characteristic quantity, and stores the resulting tagged sound data.
The sound storage section 87 may also store the tagged sound data by recording the tagged sound data on a server (not shown) connected to the information processing apparatus 51 through a network.
Fig. 10 is a flowchart describing the processing of adding tags to sound data. At step S51, the time-pitch analysis section 81 analyzes the waveform of the chronologically continuous sound data along the two axes of time and the 12 equal-temperament pitches in each octave, and creates time-pitch data according to the analysis result.
For example, as shown in Fig. 11, at step S51, the time-pitch analysis section 81 divides the sound data into a plurality of octave components and obtains the energy of each of the 12 equal-temperament pitches in each octave, thereby analyzing the sound data along the two axes of the 12 equal-temperament pitches in each octave and time, and creates the time-pitch data according to the analysis result.
More specifically, when the sound data are stereo data, the time-pitch analysis section 81 obtains the energy of each of the 12 equal-temperament pitches in each of a plurality of octaves in each of the right-channel data and the left-channel data of the sound data, and adds the energies obtained from the left-channel data and the energies obtained from the right-channel data in each octave to create the time-pitch data.
The time-pitch analysis section 81 creates the time-pitch data as chronologically continuous data, and supplies the created time-pitch data to the continuous music characteristic quantity extracting section 82 in the order in which they are created.
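A minimal sketch of creating such time-pitch data is shown below; the FFT-based analysis, frame sizes, and octave range are assumptions made for illustration and are not the analysis method specified in the patent:

```python
import numpy as np

def time_pitch_data(left, right, sr, frame=4096, hop=2048, octaves=range(2, 7)):
    """For each analysis frame, the energy of each of the 12 equal-temperament
    pitches in each octave, with the right- and left-channel energies added
    together for stereo input."""
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    rows = []
    for start in range(0, min(len(left), len(right)) - frame, hop):
        spec_l = np.abs(np.fft.rfft(left[start:start + frame])) ** 2
        spec_r = np.abs(np.fft.rfft(right[start:start + frame])) ** 2
        row = []
        for octave in octaves:
            for pitch in range(12):                       # C, C#, ..., B
                midi = 12 * (octave + 1) + pitch
                f = 440.0 * 2.0 ** ((midi - 69) / 12.0)   # equal-temperament frequency
                k = int(np.argmin(np.abs(freqs - f)))     # nearest FFT bin
                row.append(spec_l[k] + spec_r[k])
        rows.append(row)
    return np.array(rows)  # shape: (time frames, len(octaves) * 12)
```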
At step S52, the continuous music characteristic quantity extracting section 82 extracts a plurality of continuous music characteristic quantities from the time-pitch data.
For example, at step S52, the continuous music characteristic quantity extracting section 82 extracts, from the time-pitch data representing the energies of the 12 equal-temperament pitches in each octave, continuous music characteristic quantities that change chronologically, such as continuous music characteristic quantity 1, continuous music characteristic quantity 2, and continuous music characteristic quantity 3. For example, as shown in Fig. 12, the continuous music characteristic quantity extracting section 82 extracts, from the time-pitch data, continuous music characteristic quantity 1 representing the level ratio between pitch ranges at each moment, continuous music characteristic quantity 2 representing the level difference or energy difference between the right channel and the left channel at each moment, and continuous music characteristic quantity 3 representing envelope parameters such as attack, decay, sustain, and release. Alternatively, for example, the continuous music characteristic quantity extracting section 82 extracts, from the time-pitch data, continuous music characteristic quantity 1 representing the rhythm ratio at each moment, continuous music characteristic quantity 2 representing the number of sounds at each moment, and continuous music characteristic quantity 3 representing the type of sound at each moment.
In addition, the continuous music characteristic quantity extracting section 82 may extract, from the time-pitch data representing the energies of the 12 equal-temperament pitches in each octave, continuous music characteristic quantities representing, for example, the density of sounds or changes in pitch.
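For example, two continuous music characteristic quantities could be derived from the time-pitch data as in the sketch below (the specific definitions are illustrative assumptions, not the quantities defined in the patent):

```python
import numpy as np

def continuous_music_features(tp, n_octaves=5):
    """From time-pitch data `tp` of shape (frames, n_octaves * 12): the level
    ratio between the lower and upper halves of the pitch range, and the number
    of clearly sounding pitches, at each moment."""
    half = (n_octaves * 12) // 2
    low = tp[:, :half].sum(axis=1)
    high = tp[:, half:].sum(axis=1)
    level_ratio = low / (high + 1e-12)          # continuous music characteristic quantity 1
    threshold = tp.mean() * 2.0
    sound_count = (tp > threshold).sum(axis=1)  # continuous music characteristic quantity 2
    return level_ratio, sound_count
```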
The continuous music characteristic quantity extracting section 82 supplies the extracted continuous music characteristic quantities to the frame dividing section 83 in the order in which they are extracted.
At step S53, the frame dividing section 83 divides each of the continuous music characteristic quantities into a plurality of frames and obtains frame-based continuous music characteristic quantities.
For example, as shown in Fig. 13, the frame dividing section 83 divides each of the continuous music characteristic quantities, such as continuous music characteristic quantity 1, continuous music characteristic quantity 2, and continuous music characteristic quantity 3, into a plurality of frames. In this example, a frame is the period between the moment represented by one vertical line shown in Fig. 13 and the moment represented by the adjacent vertical line, i.e., a period having a predetermined length.
The frame dividing section 83 divides the plurality of continuous music characteristic quantities into frames such that they are divided at the same positions and into frames of the same length.
The frame dividing section 83 supplies the frame-based continuous music characteristic quantities, divided into a plurality of frames, to the regional characteristic quantity extracting section 84 in the order in which they are divided.
At step S54, the regional characteristic quantity extracting section 84 calculates the mean value and the variance of each of the frame-based continuous music characteristic quantities to extract regional characteristic quantities for each frame.
The regional characteristic quantity extracting section 84 applies at least one predetermined process to each of the frame-based continuous music characteristic quantities and extracts, from each of them, regional characteristic quantities, which are characteristic quantities represented by at least one scalar or vector.
For example, as shown in Fig. 14, the regional characteristic quantity extracting section 84 obtains the mean value, in the first frame, of frame-based continuous music characteristic quantity 1 representing the level ratio between pitch ranges at each moment, and thereby extracts 0.2 as a regional characteristic quantity of the first frame. Similarly, the regional characteristic quantity extracting section 84 obtains the mean values of frame-based continuous music characteristic quantity 1 in the second and third frames, and extracts -0.05 and 0.05, respectively, as regional characteristic quantities of the second and third frames.
In addition, the regional characteristic quantity extracting section 84 obtains the variances of frame-based continuous music characteristic quantity 1, representing the level ratio between pitch ranges at each moment, in the first, second, and third frames, and extracts 0.2, 0.15, and 0.1, respectively, as regional characteristic quantities of the first, second, and third frames.
The regional characteristic quantity extracting section 84 also extracts regional characteristic quantities representing the mean value or variance of frame-based continuous music characteristic quantity 1 in the fourth and subsequent frames.
In addition, for example, as shown in Fig. 14, the regional characteristic quantity extracting section 84 obtains the mean value, in the first frame, of frame-based continuous music characteristic quantity 2 representing the energy difference or level difference between the right channel and the left channel at each moment, and thereby obtains 0.1 as a regional characteristic quantity of the first frame. Similarly, the regional characteristic quantity extracting section 84 obtains the mean values of frame-based continuous music characteristic quantity 2 in the second and third frames, and extracts 0.4 and 0.5, respectively, as regional characteristic quantities of the second and third frames.
In addition, the regional characteristic quantity extracting section 84 obtains the variances of frame-based continuous music characteristic quantity 2, representing the energy difference or level difference between the right channel and the left channel at each moment, in the first, second, and third frames, and extracts 0.3, -0.2, and 0.0, respectively, as regional characteristic quantities of the first, second, and third frames.
Similarly, the regional characteristic quantity extracting section 84 extracts regional characteristic quantities representing the mean value or variance of frame-based continuous music characteristic quantity 2 in the fourth and subsequent frames.
The regional characteristic quantity extracting section 84 also extracts regional characteristic quantities from each frame of frame-based continuous music characteristic quantity 3.
The regional characteristic quantity extracting section 84 supplies the extracted regional characteristic quantities to the music/talk determining section 85.
At step S55, the music/talk determining section 85 determines, from the regional characteristic quantities, whether each frame is music or talk.
For example, the music/talk determining section 85 applies a target characteristic quantity extraction formula created in advance, which consists of relatively simple calculations (for example, the four basic arithmetic operations and exponentiation), to at least one of the input regional characteristic quantities, and obtains as the calculation result a frame-based music/talk determination result, which is a target characteristic quantity representing the probability that the frame is music. The music/talk determining section 85 stores the target characteristic quantity extraction formula in advance.
When the target characteristic quantity representing the probability of music in a given region is 0.5 or greater, the music/talk determining section 85 outputs a frame-based music/talk determination result indicating that the frame is music. When the target characteristic quantity representing the probability of music in a given region is less than 0.5, the music/talk determining section 85 outputs a frame-based music/talk determination result indicating that the frame is talk.
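The sketch below shows the shape of such a determination; the formula's coefficients are made up for illustration and are not the learned target characteristic quantity extraction formula:

```python
def music_probability(f):
    """Hypothetical extraction formula built only from simple arithmetic
    operations on regional characteristic quantities f[0]..f[6]."""
    return 0.3 * f[0] + 0.2 * f[1] - 0.1 * f[2] + 0.25 * f[4] ** 2 + 0.15 * f[5] + 0.1

def determine_frame(f):
    """Music (M) if the estimated probability is 0.5 or greater, otherwise talk (T)."""
    return "M" if music_probability(f) >= 0.5 else "T"

print(determine_frame([0.2, 0.2, 0.3, -0.5, 1.23, 0.42, 0.11]))  # -> 'M' with these illustrative coefficients
```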
For example, as shown in Fig. 15, when regional characteristic quantities such as regional characteristic quantity 1 to regional characteristic quantity 7 have been extracted for each frame, the music/talk determining section 85 determines whether the frame is music or talk from 0.2 as regional characteristic quantity 1, 0.2 as regional characteristic quantity 2, 0.3 as regional characteristic quantity 3, -0.5 as regional characteristic quantity 4, 1.23 as regional characteristic quantity 5, 0.42 as regional characteristic quantity 6, and 0.11 as regional characteristic quantity 7.
For example, the music/talk determining section 85 is created in advance by learning teacher data composed of the regional characteristic quantities of each frame and target characteristic quantities correctly indicating whether each frame is music or talk. In other words, the music/talk determining section 85 is created in advance by learning the target characteristic quantity extraction formula using teacher data composed of the chronologically continuous sound data in each frame, from which the regional characteristic quantities are extracted, and target characteristic quantities correctly indicating whether each frame is music or talk.
The target characteristic quantity extraction formula stored in advance in the music/talk determining section 85 is created in advance by genetically learning teacher data composed of chronologically continuous sound data and target characteristic quantities correctly indicating whether each frame is music or talk.
Examples of the algorithm for learning the target characteristic quantity extraction formula include regression, classification, an SVM (support vector machine), and GP (genetic programming).
The music/talk determining section 85 supplies the frame-based music/talk determination results, each indicating whether a frame is music or talk, to the data smoothing section 86.
At step S56, data smoothing part 86 level and smooth every frames are music or the definite result who talks.
For example, 86 pairs of every frames of data smoothing part are that music or the definite result who talks carry out filtering, with level and smooth definite result.More particularly, data smoothing part 86 is made of moving average filter.At step S56, the sliding average that data smoothing part 86 obtains the definite result of music/talk of frame comes level and smooth music/talk to determine the result.
In Figure 16, the music/talk based on frame of 21 frames determines that the result is talk (T), talk (T), talk (T), talk (T), talk (T), talk (T), talk (T), talk (T), talk (T), music (M), music (M), music (M), talk (T), music (M), music (M), music (M), talk (T), music (M), music (M), music (M), music (M).Therefore, the 13 frame and the 17 frame are to talk (T), and the 12 frame, the 14 frame, the 16 frame and the 18 frame are music (M).Then, this situation will be described.
When the length of each frame is made sufficiently small, a predetermined number of music frames or a predetermined number of talk frames continue in succession. In other words, a music frame is not immediately preceded and followed by talk frames, and likewise a talk frame is not immediately preceded and followed by music frames. Therefore, as represented by the first sequence shown in Fig. 16, the 21 frames should be arranged in the order talk (T), talk (T), talk (T), talk (T), talk (T), talk (T), talk (T), talk (T), talk (T), music (M), music (M), music (M), music (M), music (M), music (M), music (M), music (M), music (M), music (M), music (M), music (M). In other words, the frame-based music/talk determination result represented by the second sequence shown in Fig. 16 contains determination errors for the talk frames at the 13th frame and the 17th frame.
The data smoothing section 86 smooths the music/talk determination results by obtaining the moving average of the frame-based music/talk determination results. As a result, the data smoothing section 86 obtains a continuous music/talk determination result for the sequence of 21 frames of talk (T), talk (T), talk (T), talk (T), talk (T), talk (T), talk (T), talk (T), talk (T), music (M), music (M), music (M), music (M), music (M), music (M), music (M), music (M), music (M), music (M), music (M), music (M), in which the 13th frame and the 17th frame are now music (M).
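A minimal Python sketch of this smoothing, assuming a symmetric moving-average window of five frames (the window length is an assumption, not taken from the embodiment):

```python
def smooth_decisions(decisions, window=5):
    """decisions: list of 'M'/'T' marks per frame. Returns the smoothed list."""
    values = [1.0 if d == "M" else 0.0 for d in decisions]
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        avg = sum(values[lo:hi]) / (hi - lo)   # moving average around frame i
        smoothed.append("M" if avg >= 0.5 else "T")
    return smoothed

frames = list("TTTTTTTTTMMMTMMMTMMMM")    # the 21-frame example from the text
print("".join(smooth_decisions(frames)))  # isolated 'T' errors are removed
```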
Therefore, by smoothing the determination results, errors can be filtered out effectively.
The data smoothing section 86 supplies the continuous music/talk determination result, obtained by smoothing the frame-based music/talk determination results with a moving average, to the sound storage section 87.
At step S57, the sound storage section 87 adds a mark identifying music or talk to each frame of the sound data and stores the marked sound data. The processing then ends.
For example, as shown in Fig. 17, the sound storage section 87 adds a mark identifying music or talk to each frame of the sound data. In other words, the sound storage section 87 adds a mark identifying music to the frames of the sound data that the continuous music/talk determination result identifies as music, and adds a mark identifying talk to the frames of the sound data that the continuous music/talk determination result identifies as talk. The sound storage section 87 records and stores the sound data, to which the marks identifying music or talk have been added, on a recording medium such as a hard disk or an optical disc.
When sound data to which the marks identifying music or talk have been added is reproduced, only the music regions or only the talk regions of the sound data can be reproduced by referring to the marks. Conversely, by referring to the marks, the sound data can be reproduced in such a way that only the music regions or only the talk regions are skipped.
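The following Python sketch illustrates, under assumptions not taken from the embodiment (a fixed frame length of one second and per-frame marks held in a list), how the marks might be collapsed into time ranges so that only the music regions are reproduced:

```python
def regions_to_play(frame_marks, wanted="M", frame_seconds=1.0):
    """Collapse per-frame marks into (start, end) time ranges of the wanted kind."""
    ranges, start = [], None
    for i, mark in enumerate(frame_marks):
        if mark == wanted and start is None:
            start = i * frame_seconds                 # a wanted region begins
        elif mark != wanted and start is not None:
            ranges.append((start, i * frame_seconds)) # the region ends here
            start = None
    if start is not None:
        ranges.append((start, len(frame_marks) * frame_seconds))
    return ranges

print(regions_to_play(list("TTTTTTTTTMMMMMMMMMMMM")))  # [(9.0, 21.0)]
```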
As described above, when a continuous characteristic quantity is extracted that reflects the influence of past values of the continuous data over time, a target characteristic quantity can be obtained that takes into account the influence of past regions of the continuous data on the current region.
In the processing for obtaining the target characteristic quantity, most of the arithmetic operations are used for extracting the continuous characteristic quantity. Therefore, even when the overlap of the regions into which the continuous characteristic quantity is divided is increased to raise the temporal resolution correspondingly, the arithmetic operations of the processing do not increase significantly. In other words, the temporal resolution of the target characteristic quantity can be improved with a simpler structure than in the past, without increasing the arithmetic operations in the processing.
The continuous characteristic quantity can be extracted while the continuous data is being input. Therefore, in this embodiment, the waiting time from the input of the continuous data until the characteristic is obtained is shorter than the waiting time in the prior art, in which the continuous data is divided into a plurality of regions and the characteristic is extracted from the regions.
In both the case according to the prior art, in which the continuous data is divided into a plurality of regions and a characteristic is extracted from the regions, and the case according to this embodiment of the invention, in which a continuous characteristic quantity is extracted from the continuous data, the extracted continuous characteristic quantity is divided into a plurality of regions, and the characteristic is then obtained from the regions, the time delay (waiting time) from the input of the continuous data until the characteristic quantity to be finally obtained is output is given by adding the time period used for inputting the region data and the time period used for processing the data.
When the continuous data is divided into a plurality of regions and the characteristic is extracted from the regions, the time period used for inputting the region data is shorter than the time period used for processing the data.
In contrast, when the continuous characteristic quantity is extracted from the continuous data, the continuous characteristic quantity is divided into a plurality of regions, and the characteristic is extracted from the regions, the time period used for inputting the region data is almost the same as in the case in which the continuous data is divided into a plurality of regions and the characteristic is extracted from the regions, but the time period used for processing the data is shorter.
Therefore, when the continuous characteristic quantity is extracted from the continuous data, the extracted continuous characteristic quantity is divided into a plurality of regions, and the characteristic is then obtained from the regions, the time delay (waiting time) can be smaller than in the case in which the continuous data is divided into a plurality of regions and the characteristic is extracted from the regions.
In addition, the target characteristic quantity estimating section 34 or the music/talk determining section 85 can use a simple structure that obtains the target characteristic quantity representing the correct data from the regional characteristic quantities represented by scalars or vectors. Therefore, the target characteristic quantity estimating section 34 or the music/talk determining section 85 can be created with any of the various algorithms used in ordinary machine learning or statistical analysis, without preparing a specific model for the target problem.
In addition, the continuous characteristic quantity extraction algorithm that extracts the continuous characteristic quantity from the continuous data and that is stored in the continuous characteristic quantity extracting section 31 shown in Fig. 1 or in the interval analysis section 81 and the continuous music characteristic quantity extracting section 82 shown in Fig. 9 can be created automatically by learning with teacher data composed of continuous data and the continuous data to which a mark representing one characteristic has been added at each moment (sampling point).
Next, the processing for automatically creating a continuous characteristic quantity extraction algorithm will be described with reference to Fig. 18 to Fig. 25.
When a continuous characteristic quantity extraction algorithm is created automatically, an algorithm creating section 101 shown in Fig. 18 is provided in the information processing apparatus 11 shown in Fig. 2 or the information processing apparatus 51 shown in Fig. 9. The algorithm creating section 101 automatically creates a continuous characteristic quantity extraction algorithm that automatically extracts a continuous characteristic quantity from externally input continuous data.
Specifically, as shown in Fig. 19, the algorithm creating section 101 performs machine learning processing based on a GA (genetic algorithm) or GP (genetic programming) with input continuous data and teacher data composed of marks each representing one characteristic of the continuous data at each moment, creates a continuous characteristic quantity extraction algorithm as the result of the machine learning processing, and outputs the created continuous characteristic quantity extraction algorithm.
More specifically, as shown in Fig. 20, the algorithm creating section 101 creates combinations of various filters (functions), evaluates, for the continuous characteristic quantity output as the result of each created filter combination, the accuracy level with which the characteristic represented by each mark of the continuous data can be estimated, and, according to the GA (genetic algorithm) or GP (genetic programming), retrieves from the virtually unlimited combinations of filters a filter combination that outputs a continuous characteristic quantity with which the characteristic of the continuous data can be estimated with higher accuracy.
Fig. 21 is a block diagram showing the functional structure of the algorithm creating section 101. The algorithm creating section 101 is composed of a first-generation gene creating section 121, a gene evaluating section 122, and a second-or-later-generation gene creating section 123.
The first-generation gene creating section 121 creates first-generation genes, each of which represents a different combination of filters.
The gene evaluating section 122 evaluates the accuracy level with which the characteristic of the continuous data of the teacher data, represented by the marks of the teacher data, can be estimated from the continuous characteristic quantity extracted by the filter processing represented by each gene created by the first-generation gene creating section 121 or the second-or-later-generation gene creating section 123. The gene evaluating section 122 is composed of an executing section 141, an evaluating section 142, and a teacher data storage section 143.
The executing section 141 inputs the continuous data of the teacher data stored in the teacher data storage section 143, successively executes the filter processing represented by each gene, and extracts the continuous characteristic quantity of the input continuous data. The executing section 141 supplies the extracted continuous characteristic quantity to the evaluating section 142.
As will be described later with reference to Fig. 22, the evaluating section 142 calculates an evaluation value representing the accuracy level with which the characteristic of the continuous data represented by the marks of the teacher data can be estimated from the continuous characteristic quantity that the executing section 141 has extracted from the continuous data of the teacher data for each gene created by the first-generation gene creating section 121 or the second-or-later-generation gene creating section 123. The evaluating section 142 supplies the evaluated genes and information representing their evaluation values to the selecting section 151, the crossover (exchange) section 152, and the mutation section 153 of the second-or-later-generation gene creating section 123. In addition, the evaluating section 142 orders the random creating section 154 to create a predetermined number of genes. When the evaluating section 142 determines that the evaluation values have become stable and the evolution of the genes has converged, the evaluating section 142 supplies the genes and their evaluation values to the selecting section 151.
The teacher data storage section 143 stores teacher data input from the outside.
The second-or-later-generation gene creating section 123 creates genes of the second and later generations. As described above, the second-or-later-generation gene creating section 123 is composed of the selecting section 151, the crossover section 152, the mutation section 153, and the random creating section 154.
As will be described later with reference to Fig. 22, the selecting section 151 selects, according to the evaluation values obtained by the evaluating section 142, genes of the current generation to be inherited by the next generation, and supplies the selected genes to the gene evaluating section 122 as next-generation genes. After it has been determined that the evolution of the genes has converged, the selecting section 151 selects a predetermined number of genes from the genes with higher evaluation values and outputs the filter combinations represented by the selected genes as the continuous characteristic quantity extraction algorithm.
As will be described later with reference to Fig. 22, the crossover section 152 crosses over two genes selected from the genes of the current generation with higher evaluation values by exchanging parts of the filters that the two genes represent. The crossover section 152 supplies the crossed-over genes to the gene evaluating section 122 as next-generation genes.
As will be described later with reference to Fig. 22, the mutation section 153 mutates a gene selected at random from the genes of the current generation with higher evaluation values by randomly changing part of the filters of the gene. The mutation section 153 supplies the mutated gene to the gene evaluating section 122 as a next-generation gene.
As will be described later with reference to Fig. 22, the random creating section 154 creates new genes by combining various types of filters at random. The random creating section 154 supplies the created genes to the gene evaluating section 122 as next-generation genes.
The filters constituting the genes created by the algorithm creating section 101 handle time-series data input in real time, namely continuous data. Examples of these filters include arithmetic operation filters (for the four basic arithmetic operations, exponentiation, differentiation, integration, and absolute value operation), an LPF (low-pass filter), an HPF (high-pass filter), a BPF (band-pass filter), an IIR (infinite impulse response) filter, an FIR (finite impulse response) filter, a level maximizer that maximizes the volume level in real time, a pitch tracker that tracks the pitch, and a level meter that creates the envelope of the continuous data.
A gene is represented in the form of an ordered list of filters arranged in the order in which they are executed, for example "pitch tracker → differential filter → absolute value filter (ABS) → LPF".
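As an illustration only, a gene of this kind can be modelled as an ordered list of filter functions applied in sequence; the concrete filters in the following Python sketch (difference, absolute value, a simple one-pole low-pass) are stand-ins and not the filter set of the embodiment:

```python
def diff(x):        return [0.0] + [b - a for a, b in zip(x, x[1:])]
def absolute(x):    return [abs(v) for v in x]
def lowpass(x, a=0.9):
    y, out = 0.0, []
    for v in x:
        y = a * y + (1 - a) * v   # one-pole low-pass filter
        out.append(y)
    return out

gene = [diff, absolute, lowpass]  # "difference -> ABS -> LPF"

def run_gene(gene, signal):
    """Apply each filter of the gene in order; the output is the
    continuous characteristic quantity of the input time series."""
    for filt in gene:
        signal = filt(signal)
    return signal

print(run_gene(gene, [0.0, 1.0, 0.0, 1.0, 0.0, 1.0])[:3])
```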
Fig. 22 is a flow chart describing the algorithm creation processing executed by the algorithm creating section 101.
The processing in which the algorithm creating section 101 creates a continuous music characteristic quantity extraction algorithm that extracts a continuous music characteristic quantity from sound data will now be described, as shown in Fig. 23, taking as an example the information processing apparatus 51 described with reference to Fig. 9, which determines at each moment whether each unit of input sound data is music or talk. In other words, the case will be described in which the algorithm creating section 101 creates a continuous characteristic quantity extraction algorithm corresponding to the processing of the interval analysis section 81 and the continuous music characteristic quantity extracting section 82 shown in Fig. 9.
In step S101, the first-generation gene creating section 121 creates first-generation genes. Specifically, the first-generation gene creating section 121 creates a predetermined number of genes by randomly combining various types of filters that handle time-series data (namely, continuous data) input in real time. The first-generation gene creating section 121 supplies the created genes to the gene evaluating section 122.
In step S102, the executing section 141 selects a gene that has not yet been evaluated from the genes supplied from the first-generation gene creating section 121. In this case, the executing section 141 selects, as the evaluation target, a gene that has not yet been evaluated from the first-generation genes created by the first-generation gene creating section 121.
In step S103, the executing section 141 selects one piece of teacher data that has not yet been processed. Specifically, the executing section 141 selects, from the teacher data stored in the teacher data storage section 143, one piece of teacher data that has not yet been processed with the gene that is the current evaluation target.
In step S104, the executing section 141 extracts the continuous characteristic quantity of the selected teacher data with the gene that is the evaluation target. Specifically, the executing section 141 extracts the continuous characteristic quantity of the selected teacher data by inputting the continuous data of the selected teacher data and successively executing the processing of the filters represented by the gene that is the evaluation target.
When a continuous music characteristic quantity extraction algorithm is created, as shown in Fig. 24, a waveform obtained by filtering the sound data is extracted as the continuous music characteristic quantity by executing the processing represented by the gene that is the evaluation target on the sound data serving as the teacher data, namely by successively executing the filter processing represented by that gene.
The executing section 141 supplies the extracted continuous characteristic quantity to the evaluating section 142.
In step S105, the executing section 141 determines whether all the teacher data have been processed. When there is teacher data, among the teacher data stored in the teacher data storage section 143, from which the continuous characteristic quantity has not yet been extracted for the gene that is the evaluation target, the executing section 141 determines that not all the teacher data have been processed. In that case, the flow returns to step S103. Thereafter, steps S103 to S105 are repeated until it is determined at step S105 that all the teacher data have been processed.
When the determination result at step S105 shows that all the teacher data have been processed, the flow advances to step S106.
At step S106, the evaluating section 142 evaluates the gene.
When a continuous music characteristic quantity extraction algorithm is created, as shown in Fig. 25, the evaluating section 142 calculates, from the filtered waveform, namely the continuous music characteristic quantity extracted according to the gene that is the evaluation target, an evaluation value representing the accuracy level of the characteristic quantity that represents the characteristic of the continuous data indicated by the marks of the teacher data, namely, in the case of the information processing apparatus 51, the target characteristic quantity indicating music or talk.
Next, the method of calculating the evaluation value will be described.
When the value of the mark of the teacher data (namely, the characteristic quantity representing the characteristic of the continuous data) is represented by a continuous numeric value, for example when the characteristic quantity represented by the correct data sequence is the speed feel of the music represented by a continuous number in the range of 0.0 to 1.0, the absolute value of the Pearson correlation coefficient, for example, is used as the evaluation value of the gene. Specifically, when the values of the marks of the teacher data are represented by a variable X and the corresponding values of the continuous characteristic quantity are represented by a variable Y, the correlation coefficient r of the variables X and Y is obtained by the following formula (1):
r = (covariance of variable X and variable Y) / {(standard deviation of variable X) × (standard deviation of variable Y)}

$$ r = \frac{\dfrac{1}{n-1}\displaystyle\sum_{i=1}^{n}\bigl(X_i-\bar{X}\bigr)\bigl(Y_i-\bar{Y}\bigr)}{\sqrt{\dfrac{1}{n-1}\displaystyle\sum_{i=1}^{n}\bigl(X_i-\bar{X}\bigr)^{2}}\ \sqrt{\dfrac{1}{n-1}\displaystyle\sum_{i=1}^{n}\bigl(Y_i-\bar{Y}\bigr)^{2}}} \qquad (1) $$

where $\bar{X}$ is the mean value of X and $\bar{Y}$ is the mean value of Y.
The weaker the correlation between the values of the continuous characteristic quantity extracted from the continuous data and the values of the characteristic quantity of the continuous data represented by the marks of the teacher data, the closer the correlation coefficient r is to 0. Conversely, the stronger the correlation, the closer the correlation coefficient r is to 1.0 or -1.0. In other words, the higher the accuracy with which the characteristic quantity of the continuous data can be estimated from the continuous characteristic quantity extracted with the filter combination represented by the gene that is the evaluation target, the closer the correlation coefficient r tends to be to 1.0 or -1.0; the lower the accuracy, the closer the correlation coefficient r tends to be to 0.
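A minimal Python sketch of this evaluation, computing the absolute value of the Pearson correlation coefficient of formula (1) between made-up mark values X and continuous characteristic quantity values Y:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient r of formula (1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sx * sy)

X = [0.1, 0.4, 0.5, 0.8, 0.9]   # mark values (e.g. the speed feel of the music)
Y = [0.2, 0.3, 0.6, 0.7, 1.0]   # continuous characteristic quantity from the gene
evaluation_value = abs(pearson(X, Y))
print(round(evaluation_value, 3))
```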
When the value of the mark of the teacher data (namely, the characteristic quantity representing the characteristic of the continuous data) is classified into predetermined classes, for example when, as exemplified above, the target characteristic quantity is classified into talk or music, or into the presence or absence of sound, the Fisher discriminant ratio (FDR), for example, is used as the evaluation value.
For example, when the target characteristic quantity is classified into two classes, in other words when the target characteristic quantity is represented by a binary value, the values of the continuous characteristic quantity extracted by the processing represented by the gene that is the evaluation target are divided into two groups, denoted group X and group Y, according to the values of the corresponding marks of the teacher data, and the FDR is obtained by the following formula (2):
FDR = (mean value of X − mean value of Y)² / {(standard deviation of X) + (standard deviation of Y)}

$$ \mathrm{FDR} = \frac{\bigl(\bar{X}-\bar{Y}\bigr)^{2}}{\sigma_X+\sigma_Y} \qquad (2) $$
The weaker the correlation between the values of the continuous characteristic quantity extracted by the processing represented by the gene that is the evaluation target and the groups to which those values belong, namely the weaker the correlation between the values of the continuous characteristic quantity and the characteristic quantity represented by the marks of the teacher data, the smaller the value of the FDR. Conversely, the stronger the correlation, the larger the value of the FDR. In other words, the larger the FDR value, the higher the accuracy with which the characteristic quantity of the continuous data can be estimated from the continuous characteristic quantity extracted with the filter combination represented by the gene that is the evaluation target tends to be; the smaller the FDR value, the lower the accuracy.
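A minimal Python sketch of the evaluation value of formula (2), computed from made-up continuous characteristic quantity values split into a music group and a talk group by the marks of the teacher data:

```python
import statistics

def fdr(group_x, group_y):
    """Fisher discriminant ratio of formula (2)."""
    mean_diff = statistics.mean(group_x) - statistics.mean(group_y)
    spread = statistics.stdev(group_x) + statistics.stdev(group_y)
    return mean_diff ** 2 / spread

music_values = [0.8, 0.9, 0.7, 0.85]   # characteristic values where the mark is "music"
talk_values = [0.1, 0.2, 0.15, 0.05]   # characteristic values where the mark is "talk"
print(round(fdr(music_values, talk_values), 3))
```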
The methods of calculating the evaluation value of a gene described above are merely examples. It is preferable to use a method appropriate to the continuous characteristic quantity extracted by the processing represented by the gene and to the characteristic quantity represented by the marks of the teacher data.
When the amount of calculation increases because there are many samples of the continuous characteristic quantity, the samples of the continuous characteristic quantity can be decimated as necessary.
In step S107, the evaluating section 142 determines whether all the genes have been evaluated. When the determination result of step S107 shows that the evaluation of all the genes has not yet been completed, the flow returns to step S102. Steps S102 to S107 are repeated until the determination result of step S107 shows that all the genes have been evaluated.
When the determination result of step S107 shows that all the genes have been evaluated, in this case all the genes of the first generation, the flow advances to step S108.
In step S108, the evaluating section 142 compares the evaluation values of the genes of the previous generation with the evaluation values of the genes of the current generation. In this case, since the first-generation genes have just been evaluated and no evaluation values of a previous generation have been stored, the evaluating section 142 stores the maximum of the evaluation values of the first-generation genes as the evaluation value of the current genes.
In step S109, the evaluating section 142 determines whether the evaluation value has remained unchanged for a predetermined number of generations. In this case, since the evaluation value was updated in step S108, the flow advances to step S110.
In step S110, the selecting section 151 selects genes. Specifically, the evaluating section 142 supplies all the genes of the current generation and information representing their evaluation values to the selecting section 151. The selecting section 151 selects a predetermined number of genes with higher evaluation values from these genes and supplies the selected genes to the gene evaluating section 122 as next-generation genes.
In step S111, the crossover section 152 crosses over genes. Specifically, the evaluating section 142 supplies all the genes of the current generation and information representing their evaluation values to the crossover section 152. The crossover section 152 selects, at random, two genes from the genes whose evaluation values are higher than a predetermined value, and exchanges filters between the selected genes. In this way, the crossover section 152 crosses over the two genes by recombining the filters represented by the genes. The crossover section 152 crosses over a predetermined number of genes and supplies the crossed-over genes to the gene evaluating section 122 as next-generation genes.
In step S112, the mutation section 153 mutates genes. Specifically, the evaluating section 142 supplies all the genes of the current generation and information representing their evaluation values to the mutation section 153. The mutation section 153 mutates genes by selecting, at random, a predetermined number of genes from the genes whose evaluation values are higher than a predetermined value and randomly changing part of the filters of the selected genes. The mutation section 153 supplies the mutated genes to the gene evaluating section 122 as next-generation genes.
In step S113, the random creating section 154 creates genes at random. Specifically, the evaluating section 142 orders the random creating section 154 to create a predetermined number of genes. The random creating section 154 creates a predetermined number of genes at random in the same processing as the first-generation gene creating section 121. The random creating section 154 supplies the created genes to the gene evaluating section 122 as next-generation genes.
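As an illustration of steps S110 to S113 only, the following Python sketch runs one generation of selection, crossover, mutation, and random creation over filter-chain genes; the filter names, population sizes, rates, and the toy fitness function are assumptions made for the example:

```python
import random

FILTERS = ["LPF", "HPF", "BPF", "diff", "abs", "pitch_tracker", "level_meter"]

def random_gene(length=4):
    return [random.choice(FILTERS) for _ in range(length)]

def next_generation(population, evaluate, n_select=4, n_cross=2, n_mutate=2, n_random=2):
    ranked = sorted(population, key=evaluate, reverse=True)
    selected = ranked[:n_select]                        # step S110: selection
    children = []
    for _ in range(n_cross):                            # step S111: crossover
        a, b = random.sample(selected, 2)
        cut = random.randrange(1, min(len(a), len(b)))
        children.append(a[:cut] + b[cut:])
    for _ in range(n_mutate):                           # step S112: mutation
        g = list(random.choice(selected))
        g[random.randrange(len(g))] = random.choice(FILTERS)
        children.append(g)
    randoms = [random_gene() for _ in range(n_random)]  # step S113: random creation
    return selected + children + randoms

# Toy fitness: prefer genes that end with a low-pass filter.
population = [random_gene() for _ in range(10)]
population = next_generation(population, evaluate=lambda g: g[-1] == "LPF")
print(population[0])
```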
Thereafter, the flow returns to step S102. Steps S102 to S107 are repeated until it is determined at step S107 that all the genes of the second generation have been evaluated.
When the determination result of step S107 shows that all the genes have been evaluated, namely when all the genes of the second generation have been evaluated, the flow advances to step S108.
At step S108, in this case, the evaluating section 142 compares the stored evaluation value of the previous generation's genes (namely, the evaluation value of the first-generation genes) with the maximum of the evaluation values of the second-generation genes. When the maximum of the evaluation values of the second-generation genes is greater than the evaluation value of the first-generation genes, the evaluating section 142 updates the evaluation value of the current genes with the maximum of the evaluation values of the second-generation genes. When the maximum of the evaluation values of the second-generation genes is equal to or less than the evaluation value of the first-generation genes, the evaluating section 142 does not update the evaluation value of the current genes and keeps the current evaluation value.
Steps S102 to S113 are repeated until it is determined at step S109 that the evaluation value has not been updated for a predetermined number of generations. In other words, genes of a new generation are created and evaluated, and the evaluation value of the previous generation's genes is compared with the maximum of the evaluation values of the new generation's genes; when the maximum of the evaluation values of the new generation's genes is greater than the evaluation value of the previous generation's genes, the evaluation value of the current genes is updated, and this is repeated until the evaluation value of the genes has not been updated for the predetermined number of generations.
When the determination result at step S109 shows that the evaluation value of the genes has not been updated for the predetermined number of generations, namely when the evaluation value of the genes has become stable and the evolution of the genes has converged, the flow advances to step S114.
Alternatively, at step S109, it may be determined whether the maximum of the evaluation values of the current generation's genes is equal to or greater than a predetermined threshold value. In this case, when the determination result of step S109 shows that the maximum of the evaluation values of the current generation's genes is less than the predetermined threshold value, namely when the accuracy with which the characteristic quantity can be estimated with the filter combinations represented by the current generation's genes does not satisfy the expected value, the flow advances to step S110. Conversely, when the determination result of step S109 shows that the maximum of the evaluation values of the current generation's genes is equal to or greater than the predetermined threshold value, namely when the accuracy with which the characteristic quantity can be estimated with the filter combinations represented by the current generation's genes satisfies the expected value, the flow advances to step S114.
In step S114, the selecting section 151 selects the gene to be used as the continuous characteristic quantity extraction algorithm, after which the algorithm creation processing ends. Specifically, the evaluating section 142 supplies all the genes of the current generation and their evaluation values to the selecting section 151. The selecting section 151 selects, from all the genes of the current generation, a predetermined number (at least one) of genes with the maximum evaluation value, and outputs the filter combination represented by the selected gene as the continuous characteristic quantity extraction algorithm.
Alternatively, at step S114, all the genes whose evaluation values are higher than a predetermined threshold value may be selected from all the genes of the current generation, and the filter combinations represented by the selected genes may be output as the continuous characteristic quantity extraction algorithm.
In this manner, the continuous characteristic quantity extraction algorithm that extracts a continuous characteristic quantity from continuous data and that is used in the information processing apparatus 11 shown in Fig. 2 or the information processing apparatus 51 shown in Fig. 9 is created.
Since the continuous characteristic quantity extraction algorithm is created automatically according to the GA or GP, a filter combination that extracts a continuous characteristic quantity more suitable for estimating the target characteristic quantity can be obtained from a far larger number of filter combinations than with a manually created algorithm. Therefore, an improvement in the estimation accuracy of the target characteristic quantity can be expected.
In the information processing apparatus 11 shown in Fig. 2 or the information processing apparatus 51 shown in Fig. 9, the continuous characteristic quantity extraction algorithm that extracts the continuous characteristic quantity may be created only by the algorithm creating section 101. Alternatively, the continuous characteristic quantity extraction algorithm may be created manually. Alternatively, a continuous characteristic quantity extraction algorithm created by the algorithm creating section 101 and a manually created continuous characteristic quantity extraction algorithm may be used together.
In the foregoing description, an information processing apparatus that handles continuous data such as sound data or moving image data has been exemplified. However, as embodiments, the present invention can be applied to a recording/reproducing apparatus that records and reproduces sound data or moving image data, a recording apparatus that records sound data or moving image data, a reproducing apparatus that reproduces sound data or moving image data, and so forth. More particularly, as embodiments, the present invention can be applied to a recorder/player with a built-in optical disc drive or hard disk, a portable recorder or player with a built-in semiconductor memory, a digital video camera, a mobile phone, and so forth.
In the foregoing description, the target characteristic quantity represents a characteristic that is finally obtained, for example music or talk. Alternatively, the target characteristic quantity may be a value representing the probability of the characteristic that is finally obtained, such as the probability of music or talk.
When a target characteristic quantity extraction formula is created by learning and arithmetic operations are executed with the extraction formula, the characteristic of the data can be extracted according to the target characteristic quantity. When chronologically continuous sound data is chronologically continuously analyzed in each of predetermined frequency bands, a continuous characteristic quantity, which is a chronologically continuous characteristic quantity, is extracted from the analysis result, the continuous characteristic quantity is cut into regions each of which has a predetermined length, a regional characteristic quantity, which is a characteristic quantity represented by one scalar or vector, is extracted from each of the regions, and a target characteristic quantity, which is a characteristic quantity representing one characteristic of the sound data, is estimated from the regional characteristic quantities, the characteristic of the sound data can be extracted easily and quickly.
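To make the flow summarized above concrete, here is a minimal end-to-end Python sketch, assuming NumPy is available; the band splitting, the feature definitions, the region length, and the final estimator are placeholder assumptions rather than the embodiment's actual analysis, filters, or learned formula:

```python
import numpy as np

def analyze_bands(sound, n_bands=4):
    """Split the time series into crude frequency bands (placeholder analysis)."""
    spectrum = np.fft.rfft(sound)
    bands = np.array_split(spectrum, n_bands)
    return [np.abs(np.fft.irfft(b, n=len(sound))) for b in bands]

def continuous_feature(bands):
    """A chronologically continuous characteristic quantity: summed band energy."""
    return np.sum(bands, axis=0)

def regional_features(cont, region_len=100):
    """Cut the continuous quantity into regions and reduce each region to one scalar."""
    regions = [cont[i:i + region_len] for i in range(0, len(cont), region_len)]
    return [float(np.mean(r)) for r in regions]

def target_feature(regional):
    """Placeholder estimator: probability-like score that the data is music."""
    return float(np.clip(np.mean(regional), 0.0, 1.0))

sound = np.sin(np.linspace(0, 200 * np.pi, 1000))   # toy sound data
print(target_feature(regional_features(continuous_feature(analyze_bands(sound)))))
```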
The foregoing processing sequence can be executed by hardware or software. When the processing sequence is executed by software, the program constituting the software is installed from a program record medium into a computer built into dedicated hardware or, for example, a general-purpose personal computer that can execute various types of functions according to the various programs installed therein.
Fig. 26 is a block diagram showing an exemplary structure of a personal computer that executes the foregoing processing sequence according to a program. A CPU (central processing unit) 201 executes various types of processing according to programs stored in a ROM (read-only memory) 202 or a storage section 208. A RAM (random access memory) 203 stores programs that the CPU 201 executes, data, and so forth as necessary. The CPU 201, the ROM 202, and the RAM 203 are interconnected by a bus 204.
An input/output interface 205 is also connected to the CPU 201 through the bus 204. An input section 206 composed of a keyboard, a mouse, a microphone, and so forth and an output section 207 composed of a display, a speaker, and so forth are connected to the input/output interface 205. The CPU 201 executes various types of processing according to commands input from the input section 206. The CPU 201 outputs the processing results to the output section 207.
The storage section 208 connected to the input/output interface 205 is composed of, for example, a hard disk. The storage section 208 stores programs that the CPU 201 executes and various types of data. A communication section 209 communicates with external devices through a network such as the Internet or a local area network.
Alternatively, a program may be obtained through the communication section 209 and stored in the storage section 208.
When a removable medium 211 such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory is attached to a drive 210 connected to the input/output interface 205, the drive 210 drives the removable medium 211 and obtains the programs, data, and so forth recorded thereon. The obtained programs and data are transferred to the storage section 208 and stored therein as necessary.
As shown in Fig. 26, the program record medium that stores the program to be installed into the computer and executed by the computer is composed of the removable medium 211, which is a package medium such as a magnetic disk (including a flexible disk), an optical disc (including a CD-ROM (compact disc read-only memory), a DVD (digital versatile disc), and a magneto-optical disc), or a semiconductor memory, the ROM 202 that stores the program temporarily or permanently, or the hard disk that constitutes the storage section 208. As necessary, the program is stored to the program record medium through the communication section 209, which is an interface such as a router or a modem, or through a wired or wireless communication medium such as a local area network, the Internet, or digital satellite broadcasting.
In this specification, the steps describing the program stored in the program record medium are processed chronologically in the order in which they are described. Alternatively, the steps may be executed in parallel or discretely.
It should be understood by those skilled in the art that various modifications, combinations, sub-combinations, and alterations may occur depending on design requirements and other factors insofar as they are within the scope of the appended claims or the equivalents thereof.

Claims (11)

1. An information processing apparatus, comprising:
analyzing means for chronologically continuously analyzing sound data which chronologically continue in each of predetermined frequency bands;
continuous characteristic quantity extracting means for extracting a continuous characteristic quantity, which is a characteristic quantity which chronologically continues, from an analysis result of said analyzing means;
cutting means for cutting the continuous characteristic quantity into a plurality of regions, each of which has a predetermined length;
regional characteristic quantity extracting means for extracting a regional characteristic quantity, which is a characteristic quantity represented by one scalar or vector, from each of the regions into which the continuous characteristic quantity has been cut; and
target characteristic quantity estimating means for estimating a target characteristic quantity, which is a characteristic quantity which represents one characteristic of the sound data, from each of the regional characteristic quantities.
2. The information processing apparatus as set forth in claim 1,
wherein said target characteristic quantity estimating means is created in advance by learning with teacher data composed of chronologically continuous sound data and characteristic quantities which correctly represent one characteristic of the sound data in each of the regions into which the continuous characteristic quantity is cut.
3. The information processing apparatus as set forth in claim 1,
wherein said analyzing means chronologically continuously analyzes the chronologically continuous sound data as sounds of the 12 equal-temperament pitches in each octave, and
wherein said continuous characteristic quantity extracting means extracts the continuous characteristic quantity from data which represent the energies of the 12 equal-temperament pitches in each octave and which are obtained as the analysis result of said analyzing means.
4. The information processing apparatus as set forth in claim 1,
wherein said target characteristic quantity estimating means estimates a target characteristic quantity which identifies the characteristic of the sound data as music or talk.
5. The information processing apparatus as set forth in claim 1, further comprising:
smoothing means for smoothing the target characteristic quantity by obtaining a moving average of the target characteristic quantity.
6. The information processing apparatus as set forth in claim 1, further comprising:
storing means for adding, to the sound data, a mark which identifies the characteristic represented by the estimated target characteristic quantity, and storing the sound data to which the mark has been added.
7. The information processing apparatus as set forth in claim 1, further comprising:
algorithm creating means for creating, according to a GA (genetic algorithm) or GP (genetic programming), an algorithm which extracts the continuous characteristic quantity from the chronologically continuous sound data.
8. An information processing method, comprising the steps of:
chronologically continuously analyzing sound data which chronologically continue in each of predetermined frequency bands;
extracting a continuous characteristic quantity, which is a characteristic quantity which chronologically continues, from an analysis result of the analyzing step;
cutting the continuous characteristic quantity into a plurality of regions, each of which has a predetermined length;
extracting a regional characteristic quantity, which is a characteristic quantity represented by one scalar or vector, from each of the regions into which the continuous characteristic quantity has been cut; and
estimating a target characteristic quantity, which is a characteristic quantity which represents one characteristic of the sound data, from each of the regional characteristic quantities.
9. A program to be executed by a computer, the program comprising the steps of:
chronologically continuously analyzing sound data which chronologically continue in each of predetermined frequency bands;
extracting a continuous characteristic quantity, which is a characteristic quantity which chronologically continues, from an analysis result of the analyzing step;
cutting the continuous characteristic quantity into a plurality of regions, each of which has a predetermined length;
extracting a regional characteristic quantity, which is a characteristic quantity represented by one scalar or vector, from each of the regions into which the continuous characteristic quantity has been cut; and
estimating a target characteristic quantity, which is a characteristic quantity which represents one characteristic of the sound data, from each of the regional characteristic quantities.
10. A record medium on which a program to be executed by a computer is recorded, the program comprising the steps of:
chronologically continuously analyzing sound data which chronologically continue in each of predetermined frequency bands;
extracting a continuous characteristic quantity, which is a characteristic quantity which chronologically continues, from an analysis result of the analyzing step;
cutting the continuous characteristic quantity into a plurality of regions, each of which has a predetermined length;
extracting a regional characteristic quantity, which is a characteristic quantity represented by one scalar or vector, from each of the regions into which the continuous characteristic quantity has been cut; and
estimating a target characteristic quantity, which is a characteristic quantity which represents one characteristic of the sound data, from each of the regional characteristic quantities.
11. An information processing apparatus, comprising:
an analyzing section which chronologically continuously analyzes sound data which chronologically continue in each of predetermined frequency bands;
a continuous characteristic quantity extracting section which extracts a continuous characteristic quantity, which is a characteristic quantity which chronologically continues, from an analysis result of said analyzing section;
a cutting section which cuts the continuous characteristic quantity into a plurality of regions, each of which has a predetermined length;
a regional characteristic quantity extracting section which extracts a regional characteristic quantity, which is a characteristic quantity represented by one scalar or vector, from each of the regions into which the continuous characteristic quantity has been cut; and
a target characteristic quantity estimating section which estimates a target characteristic quantity, which is a characteristic quantity which represents one characteristic of the sound data, from each of the regional characteristic quantities.
CN200710162893XA 2006-10-20 2007-10-22 Information processing apparatus and method, program, and record medium Expired - Fee Related CN101165779B (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
JP2006286261 2006-10-20
JP2006-286261 2006-10-20
JP2006286261 2006-10-20
JP2006296143A JP4239109B2 (en) 2006-10-20 2006-10-31 Information processing apparatus and method, program, and recording medium
JP2006-296143 2006-10-31
JP2006296143 2006-10-31

Publications (2)

Publication Number Publication Date
CN101165779A true CN101165779A (en) 2008-04-23
CN101165779B CN101165779B (en) 2010-06-02

Family

ID=39334444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200710162893XA Expired - Fee Related CN101165779B (en) 2006-10-20 2007-10-22 Information processing apparatus and method, program, and record medium

Country Status (2)

Country Link
JP (1) JP5007714B2 (en)
CN (1) CN101165779B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104464702A (en) * 2014-10-27 2015-03-25 叶煦舟 Chord accompany generation method based on genetic algorithm
CN105161094A (en) * 2015-06-26 2015-12-16 徐信 System and method for manually adjusting cutting point in audio cutting of voice
CN106448701A (en) * 2016-08-30 2017-02-22 苏娜 Vocal integrated training system
CN107305773A (en) * 2016-04-15 2017-10-31 美特科技(苏州)有限公司 Voice mood discrimination method
CN113362864A (en) * 2021-06-16 2021-09-07 北京字节跳动网络技术有限公司 Audio signal processing method, device, storage medium and electronic equipment
US11534430B2 (en) 2006-11-20 2022-12-27 Lutonix, Inc. Treatment of asthma and chronic obstructive pulmonary disease with anti-proliferate and anti-inflammatory drugs

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5771582B2 (en) * 2012-08-27 2015-09-02 日本電信電話株式会社 Acoustic signal analyzing apparatus, method, and program
JP6672478B2 (en) * 2016-12-20 2020-03-25 パイオニア株式会社 Body sound analysis method, program, storage medium, and body sound analysis device

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2871204B2 (en) * 1991-08-21 1999-03-17 日本電気株式会社 Music transcription device
JPH06332492A (en) * 1993-05-19 1994-12-02 Matsushita Electric Ind Co Ltd Method and device for voice detection
JP2000066691A (en) * 1998-08-21 2000-03-03 Kdd Corp Audio information sorter
US6182036B1 (en) * 1999-02-23 2001-01-30 Motorola, Inc. Method of extracting features in a voice recognition system
GB2358253B8 (en) * 1999-05-12 2011-08-03 Kyushu Kyohan Company Ltd Signal identification device using genetic algorithm and on-line identification system
CN1452159A (en) * 2002-04-18 2003-10-29 赵荣椿 Speech controlling device and method
US20040006470A1 (en) * 2002-07-03 2004-01-08 Pioneer Corporation Word-spotting apparatus, word-spotting method, and word-spotting program
FR2842014B1 (en) * 2002-07-08 2006-05-05 Lyon Ecole Centrale METHOD AND APPARATUS FOR AFFECTING A SOUND CLASS TO A SOUND SIGNAL
EP1403783A3 (en) * 2002-09-24 2005-01-19 Matsushita Electric Industrial Co., Ltd. Audio signal feature extraction
CN1190773C (en) * 2002-09-30 2005-02-23 中国科学院声学研究所 Voice identifying system and compression method of characteristic vector set for voice identifying system
JP4099576B2 (en) * 2002-09-30 2008-06-11 ソニー株式会社 Information identification apparatus and method, program, and recording medium
EP1531478A1 (en) * 2003-11-12 2005-05-18 Sony International (Europe) GmbH Apparatus and method for classifying an audio signal

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11534430B2 (en) 2006-11-20 2022-12-27 Lutonix, Inc. Treatment of asthma and chronic obstructive pulmonary disease with anti-proliferate and anti-inflammatory drugs
CN104464702A (en) * 2014-10-27 2015-03-25 叶煦舟 Chord accompany generation method based on genetic algorithm
CN104464702B (en) * 2014-10-27 2017-07-21 叶煦舟 Harmony accompaniment generation method based on genetic algorithm
CN105161094A (en) * 2015-06-26 2015-12-16 徐信 System and method for manually adjusting cutting point in audio cutting of voice
CN107305773A (en) * 2016-04-15 2017-10-31 美特科技(苏州)有限公司 Voice mood discrimination method
CN106448701A (en) * 2016-08-30 2017-02-22 苏娜 Vocal integrated training system
CN106448701B (en) * 2016-08-30 2019-10-25 河北师范大学 A kind of vocal music comprehensive training system
CN113362864A (en) * 2021-06-16 2021-09-07 北京字节跳动网络技术有限公司 Audio signal processing method, device, storage medium and electronic equipment
CN113362864B (en) * 2021-06-16 2022-08-02 北京字节跳动网络技术有限公司 Audio signal processing method, device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN101165779B (en) 2010-06-02
JP5007714B2 (en) 2012-08-22
JP2009058970A (en) 2009-03-19

Similar Documents

Publication Publication Date Title
CN101165779B (en) Information processing apparatus and method, program, and record medium
JP5115966B2 (en) Music retrieval system and method and program thereof
CN101038739B (en) Method and apparatus for attaching metadata
CN101916564B (en) Information processing apparatus, melody line extraction method, bass line extraction method
JP4775379B2 (en) Apparatus and method for specifying various segment classes
CN101751912B (en) Information processing apparatus, sound material capturing method
JP4825800B2 (en) Music classification method
CN101452696B (en) Signal processing device, signal processing method and program
Poliner et al. A classification approach to melody transcription
CN1998044B (en) Method of and system for classification of an audio signal
CN102547521B (en) Content reproducing device and method
JP2008515011A (en) Apparatus and method for changing segmentation of audio works
CN104050974A (en) Sound signal analysis apparatus, sound signal analysis method and sound signal analysis program
EP1898320A1 (en) Musical composition searching device, musical composition searching method, and musical composition searching program
JP2008515012A (en) Apparatus and method for grouping time segments of music
CN102782750A (en) Region of interest extraction device, region of interest extraction method
KR101675957B1 (en) System and Method for Predicting Music Popularity using the Signal Component Analysis
US20130311410A1 (en) Information Processing Apparatus, Information Processing Method, and Program
EP1914720B1 (en) Information processing apparatus and method, program, and record medium
Schwarz et al. Methods and datasets for DJ-mix reverse engineering
JP2009110212A (en) Information processor, information processing method, and program
CN107025902A (en) Data processing method and device
Greer et al. Using shared vector representations of words and chords in music for genre classification
JP5035598B2 (en) Information processing apparatus and method, and program
JP2013164863A (en) Information processing device, information processing method, and program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100602

Termination date: 20181022