WO2001061683A1

WO2001061683A1 - Identification of structure in time series data

Info

Publication number: WO2001061683A1
Application number: PCT/GB2001/000635
Authority: WO
Inventors: Michael Turner; Simon Moss; Paul Zanelli
Original assignee: Pc Multimedia Limited
Priority date: 2000-02-16
Filing date: 2001-02-16
Publication date: 2001-08-23
Also published as: AU2001233861A1

Abstract

A method and system for identifying structure in time series data. The data is represented by a plurality of elements from a set of possible elements. The method includes the steps of: (i) determining an upper bound on the probability that the structure to be identified includes a one of the possible elements; (ii) calculating a threshold probability; and (iii) eliminating the element from the set of possible elements if the upper probability bound is less than the threshold probability. A system is provided including a memory for storing data and processing means operating on data to carry out the method.

Description

Identification of Structure in Time Series Data

The present invention relates to pattern recognition, and in particular to the identification of structure or objects in time ordered sequences of data.

The identification of objects or structure in time-series data is a longstanding problem in the field of pattern recognition. Given a 1D time-varying signal, the aim is to identify and extract salient and meaningful structural descriptions of the data, possibly at multiple levels of description.

For example, and with reference to Figure 1, in the field of speech recognition, the aim is to extract structural components from sound. At the lowest level of description these structural components or objects may be phonemes, i.e. the basic sound units of speech, which have corresponding morphemes, i.e. speech elements having a meaning or grammatical function that cannot be subdivided into further such elements. At an intermediate level of description the goal is to extract syllables or words, in which case a successful identification provides a speech-to-text converter. At the highest level of description, semantic and syntactic information is present in the time-series data, and can be used, so as to extract meaning from the data, on which to base future actions.

For instance, in Figure 1, line 110 represents the output of a microphone with time as the words 'the cat sat on the mat' are spoken. The time series data comprises a sequence of phonemes 120, with their respective morphemes. The phonemes group together to form syllables 130, which independently, or in combination, form words 140. The words themselves have syntactic and semantic concepts 150 associated with them, from which further information can be extracted, such as the subject of the phrase, from the rules of grammar.

These different levels of description of the time series data are not independent. The identification of an object at one level of representation depends on the identification of other objects throughout the system at the same, or different, level of description. For example, the identification of a word is dependent not only upon the existence of a series of phonemes, but also upon its context at the word level, and top-down conceptual information.

Most prior art methods for time-series analysis follow a sequential approach in which low-level features are first extracted from the signal, and these are used to build up more complex descriptions. For example, and with reference to Figure 2, there are standard methods for identifying best-guess phonemes 220 from speech data 210. These best-guess phonemes are used to identify best-guess syllable-based descriptions 230, which in turn are used to determine best word descriptions 240, and so on, up the processing hierarchy, all the way up to levels at which syntactic and semantic constraints can be imposed.

The methods employed at each level of speech processing vary with the level of description. At low levels, frequency and template matching techniques are often used to find the best match between stored descriptions of phonemes and the data at hand. At intermediate levels statistical methods such as hidden Markov models and neural networks are commonplace. These may make use of not only the current data, but also information about the interpretation of the signal so far, such as contextual information. At higher levels methods such as Artificial Intelligence or syntactic grammars are employed to impose syntactic and semantic constraints on descriptions of the data.

The major reason for the problems with conventional approaches is the limited pattern recognition they employ. With reference to Figure 2, the identification of structure in time-series data is realised through a chain of operations, in which only the best-guess descriptions of the data at lower levels are passed on.

That is, best-guess information is passed up the processing hierarchy. This approach depends critically on obtaining good initial solutions, but this is not possible in general, as illustrated by the phrase "thick hat saturn dim at" being identified instead of "the cat sat on the mat" for the identical time-series data.

The result is that errors introduced early on necessarily pass on to subsequent stages, causing mistakes there and thereby leading to a non-optimal solution. Attempts to improve on the recovered solution may be subsequently made, but in essence, these simply make minor refinements to the current best-guess solution (i.e. perform a local gradient-based search) and are incapable of recovering from non-trivial errors.

The present invention relates to an approach to recognising objects and structural information in time-series data which is fast and gives good solutions under a wide range of realistic conditions.

According to a first aspect of the present invention, there is provided a method of identifying structure in time series data, in which the data corresponds to a plurality of elements from a set of possible elements, the method including the steps of:

(i) determining an upper bound on the probability that the structure to be identified includes a one of the possible elements;

(ii) calculating a threshold probability; and

(iii) eliminating the element from the set of possible elements if the upper probability bound is less than the threshold probability.

The method considers all possible elements that can correspond to the time series data, and rather than selecting the most likely elements, eliminates the least likely elements. In this way the method retains all possible solutions to the identification problem at early stages of processing in a manner which is computationally tractable.
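Steps (i) to (iii) can be sketched for a single pass as follows. This is a minimal illustration only: the function name, the candidate words, the bound values and the threshold are all hypothetical and are not taken from the description.

```python
def eliminate_implausible(upper_bounds, threshold):
    """Step (iii): keep every element whose upper bound is not below the threshold."""
    return {elem: ub for elem, ub in upper_bounds.items() if ub >= threshold}

# Step (i): hypothetical upper bounds on the probability that the structure
# contains each candidate word at one position in the signal.
upper_bounds = {"cat": 0.62, "hat": 0.35, "sat": 0.58, "saturn": 0.04}

# Step (ii): an illustrative threshold; the description does not give a value.
threshold = 0.10

survivors = eliminate_implausible(upper_bounds, threshold)
print(sorted(survivors))
```

Note that nothing is selected as the answer here: the less likely "hat" survives alongside "cat", and only the element whose bound falls below the threshold is discarded.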

Preferably steps (i) and (iii) are iterated so as to eliminate all non-plausible elements from the set of possible elements so as to identify the structure. In this way, the set of possible elements is caused to decrease until the structure emerges. Alternatively, once a sufficiently small number of possible elements have been identified by the method, an exhaustive search technique can be used.

Preferably, the method includes the step of determining the one of the possible elements by comparing time series data with known data. In order to determine the element for which the upper bound is to be calculated, a part of the time-series data can be matched with or compared to data relating to a known element. This allows the time-series data to be converted into a representation in terms of various possible elements.

The set of possible elements can include a plurality of subsets. Each of the plurality of subsets can have a different type of element in it and the elements in each subset can be of the same type.

A one of the plurality of subsets can comprise simplest elements. This allows the time-series data to be broken down into its most basic elements.

A one of the plurality of subsets can comprise compound elements in which each compound element comprises elements of a different subset. Different subsets can comprise different types of compound element. A compound element of one subset can comprise at least two compound elements from a different subset or subsets. In this way some of the elements can be made up of combinations of compound elements of the same or different types.

The different subset can be the subset of simplest elements. In this way some of the elements can be made up of combinations of the most basic elements.

Preferably each subset has a different meaning. The elements of a subset can have a common type. The subsets can each have a different level of information associated with them so that there is a hierarchy of meaning that the subsets provide. The hierarchy can correspond to the complexity of the elements of the subset.

A one of the plurality of subsets can comprise relational elements which indicate relationships between the elements of another of the plurality of subsets. The relational elements can include the proximity, time order or sequence of elements in the data.
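One way to picture the subsets described above is as a mapping from element type to a set of elements, where compound elements list the lower-level elements they are built from. The sketch below is an assumption about representation only; the class name `Element`, its fields and the sample entries are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Element:
    label: str         # e.g. "s", "sat", "saturn"
    kind: str          # name of the subset the element belongs to
    parts: tuple = ()  # labels of the constituent elements; empty for simplest elements

    @property
    def is_compound(self):
        return len(self.parts) > 0

# Each key is one subset; all elements in a subset share the same type.
possible_elements = {
    "morpheme": {Element("s", "morpheme"), Element("a", "morpheme"), Element("t", "morpheme")},
    "syllable": {Element("sat", "syllable", ("s", "a", "t"))},
    "word": {Element("sat", "word", ("sat",)), Element("saturn", "word", ("sat", "urn"))},
    # Relational elements describe relationships between elements of another subset.
    "relation": {Element("noun verb", "relation", ("noun", "verb"))},
}

def subsets_containing(label):
    """Names of the subsets that hold an element with this label."""
    return sorted(kind for kind, elems in possible_elements.items()
                  if any(e.label == label for e in elems))

print(subsets_containing("sat"))
```

The same label can appear in more than one subset, here "sat" as both a syllable and a word, which is why each element records its own type.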

The time series data can represent speech. In this way the method provides a speech recognition method. Preferably, the structure identified is words.

The time series data can represent electronic nose data. In this way the invention can be used to recognise objects or provide a medical diagnosis by recognising structure in the time varying smell data that is provided by an electronic nose device.

The time series data can represent neural signals. In this way significant brain events can be identified by recognising objects and structure in the signal generated by a transducer being used to detect brain activity. The time series data can represent other bio-signals such as heart, pulse, and breathing signals generated by a transducer being used to detect those activities.

The time series data can represent levels of utility supply. For instance, the invention can be used to analyse and predict the electricity, gas and water demand from the time variation of current usage data.

According to a further aspect of the invention, there is provided a system for identifying structure in time series data, in which the data corresponds to a plurality of elements from a set of possible elements, the system including a memory storing time series data and data processing means, in which the data processing means operates on data to:

(i) determine an upper bound on the probability that the structure to be identified includes a one of the possible elements;

(ii) calculate a threshold probability; and

(iii) eliminate the element from the set of possible elements if the upper probability bound is less than the threshold probability.

The system can include a computer. The system can include a transducer which generates a time series signal or data. The transducer can be a microphone, or any other device which can convert speech into an electrical signal. Preferably, the time series data is digital. Preferably the system stores time data. Preferably, the system includes a database of representations of known data with which the time series data is compared.

According to a further aspect of the invention, there is provided computer program code executable on a computer to carry out a method according to a first aspect of the invention.

According to a yet further aspect of the invention, there is provided a computer readable medium bearing a computer program which when running on a computer carries out the method according to the first aspect of the invention.

An embodiment of the invention will now be described in detail, by way of example only, and with reference to the accompanying drawings, in which:

Figure 1 shows an analysis of the elements of speech time series data;

Figure 2 shows a prior art method applied to speech time series data;

Figure 3 shows a system for identifying structure in time series data according to the present invention; and

Figure 4 illustrates the identification of structure in time series data by a method according to the present invention.

In the Figures, the same items share common reference numerals unless indicated otherwise. The following discussion focuses on the application of the invention in the field of speech recognition. However, the invention is not intended to be limited to that field of application, but rather can be applied in any field where it is desired to identify structure in a time dependent signal.

It is considered that it would be within the ability of a man of ordinary skill in the art to write software suitable to implement the method of the invention in light of this description and so suitable software has not been described in any detail.

Figure 3 shows a system 300 including a microphone 310 connected to a computer 320, which includes a processor and memory. A person speaks and generates sound waves 325 which are detected by the microphone and converted into an electrical signal. The signal is digitised by the computer and stored as a digital signal together with time base data. The digitised speech signal 410 provides time series data which is to be processed so as to extract meaning by identifying structure in the data, including objects and the relative position, proximity and sequence of objects. In this case the objects to be identified are the words in the spoken phrase "the cat sat on the mat".

As discussed in the introductory portion, speech is made up from a number of phonemes 420 (simplest speech noises) which each have a corresponding morpheme (simplest speech element). The computer has a database of data representing the signal associated with each of all the possible phonemes. The computer scans through the digitised signal 410 to identify parts of the signal, e.g. 412, 414, 416, which relate to plausible phonemes 422, 424, 426, by comparing the signal regions with the database data using a frequency analysis and template matching process.
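The scanning step can be illustrated with a toy time-domain template match. This is a sketch under stated assumptions: the description does not specify the matching procedure, the frequency analysis is omitted, and the templates, signal samples and `min_score` value below are made up.

```python
import math

def normalised_correlation(a, b):
    """Cosine similarity between two equal-length sample sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def plausible_matches(signal, templates, min_score):
    """Return (start, label, score) for every window that resembles a stored template."""
    matches = []
    for label, template in templates.items():
        width = len(template)
        for start in range(len(signal) - width + 1):
            score = normalised_correlation(signal[start:start + width], template)
            if score >= min_score:
                matches.append((start, label, score))
    return matches

# Hypothetical three-sample "phoneme" templates and a short digitised signal.
templates = {"k": [1.0, -1.0, 1.0], "a": [1.0, 1.0, 1.0]}
signal = [0.9, -1.1, 1.0, 1.0, 0.9, 1.1]

matches = plausible_matches(signal, templates, min_score=0.9)
for start, label, score in sorted(matches):
    print(start, label, round(score, 3))
```

Every window that clears the score is kept, so overlapping and competing candidates (here two placements of "a") all enter the set of plausible elements rather than a single best guess.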

The set of all possible phonemes, and their respective morphemes, constitute sets of the simplest elements that can be used to represent the time series speech signal, either as the simplest noises or as the simplest indicia, e.g. th, i and k. All possible phonemes are considered.

Next, the computer locates candidate positions in the speech signal for syllables 430, by using a template matching process to compare the candidate morphemes with a database of all possible syllables, and so generates a set of plausible syllables, e.g. 432, 434. Most of the possible syllables are compound elements which comprise at least two of the simplest morpheme elements. The database provides a set of all possible syllables.

Then, the computer locates candidate word 440 positions in the signal by using a template matching process to compare the candidate syllables 430 with a database of possible words. The set of possible words in the database provides a further set of compound elements, which are made up from syllables, which themselves are compound elements, or equivalently from morphemes which are simple elements.

The computer then identifies candidate concepts 450, by comparing the candidate word elements identified with a database of syntactic and semantic rules in order to identify concepts that can be associated with particular parts of the speech signal. For instance, the candidate word element sat can be identified as a verb and the candidate word element saturn can be identified as a noun. The set of concept elements provides a set of relational elements, as they relate to the possible relationships between word elements. For instance, unless they are proper nouns, or separated by punctuation, the sequence noun noun is an unlikely relationship in the English language. Further rules of grammar can be applied to the concepts, such as a sentence requiring a subject and a verb, so as to determine whether the candidate concepts are plausible or not.
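The two grammatical checks mentioned above can be sketched as simple lookups. The tag table and rule set are hypothetical stand-ins for the database of syntactic and semantic rules; "saturn" is tagged as a plain noun, as in the text, and the presence of any noun is used as a crude proxy for a subject.

```python
# Hypothetical part-of-speech table for the candidate words in the example.
PART_OF_SPEECH = {
    "the": "determiner", "cat": "noun", "sat": "verb", "on": "preposition",
    "mat": "noun", "thick": "adjective", "hat": "noun", "saturn": "noun",
    "dim": "adjective", "at": "preposition",
}

# Relational elements regarded as unlikely; only the rule named in the text is listed.
UNLIKELY_SEQUENCES = {("noun", "noun")}

def unlikely_pairs(words):
    """Adjacent word pairs whose concept sequence is an unlikely relationship."""
    tags = [PART_OF_SPEECH[w] for w in words]
    return [(a, b) for a, b, ta, tb in zip(words, words[1:], tags, tags[1:])
            if (ta, tb) in UNLIKELY_SEQUENCES]

def has_subject_and_verb(words):
    """Crude check: at least one noun (possible subject) and at least one verb."""
    tags = {PART_OF_SPEECH[w] for w in words}
    return "noun" in tags and "verb" in tags

for phrase in ("the cat sat on the mat", "thick hat saturn dim at"):
    words = phrase.split()
    print(phrase, unlikely_pairs(words), has_subject_and_verb(words))
```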

The temporal separation of the phonemes in the signal can also be used to determine the likely presence of punctuation which can also provide a set of elements used in the method. No sound can be represented by a gap phoneme and non-words, such as 'erm', 'ah' and 'um', will have their own phonemes.

Once this analysis has been completed, the computer has a set of candidate or plausible elements which has been derived from the set of all possible elements. The set of all possible elements includes the different subsets of morpheme, syllable, word, concept and relational elements. From this set of possible elements, the computer selects a first element, e.g. morpheme k 426, and uses Bayesian probability theory to compute an upper bound on the probability that the global solution contains that element: i.e. that the structure which is to be identified includes the phoneme k at that temporal position. This is repeated for each of the plausible elements that have been identified, including an upper probability bound for the word saturn occurring at its temporal position.

A threshold probability is then calculated and the upper bound for each of the plausible elements compared with that threshold probability. If the upper bound for an element falls below the threshold, then that element can be eliminated from the set of plausible elements. Further, the elimination of an element has an effect on the other possible elements. For instance, if the noun noun sequence is an unlikely relational element, which can therefore be eliminated, then the word elements hat and saturn may also be eliminated.

Once a plausible element has been eliminated, as discussed above, this inherently affects the likelihood of the other plausible elements being correct, and so the upper bound on the probability of the remaining elements being contained in the global solution is recalculated and the procedure repeated. This part of the procedure is iterated until no further elements are eliminated. The computer can then save the resulting structures, at all levels of meaning, which can be subject to further analysis as required. One of these results will be the words "the cat sat on the mat" once the incorrect identification of the words "thick hat saturn dim at" has been eliminated. If several phrases have survived the elimination process, then rules of grammar can be applied to determine the correct phrase.

The method stems from a method of pattern recognition based upon three conditions:

1. Calculations are underpinned by Bayesian probability theory.

2. The method requires that all solutions (i.e., all possible structural descriptions of the data) be assessed.

3. Processing is resource-driven such that the calculations that can be performed are constrained by the memory available and the speed of operations required, as defined by the operator.

In brief, the method uses the key conditions as follows.

Given the available computing resources, a suitable means of computing an upper bound probability for regions of the global solution space is defined. Through an iterative process, regions with low upper bounds are eliminated, and then effort is re-applied to those regions that remain. As more and more of the solution space is eliminated, so the size of the regions covering the remaining space can be reduced without compromising resources, and more accurate upper bounds can be evaluated. In this way, good solutions are identified through a process of exclusion.

All plausible descriptions of the time series data are examined, processing being a task of eliminating implausible descriptions so as to home in on the best solution through a process of exclusion.

At the onset of processing all possible elements (i.e., in the example system, all possible phonemes, syllables, words, and concepts) are available to the system as possibilities, bar those eliminated due to prior knowledge. Processing is then a task of eliminating implausible solutions and seeing how this affects the system. This is an iterative process. For example, elimination of an unlikely phoneme in itself leads to the elimination of implausible words, which itself may feed back down resulting in the elimination of other phonemes.
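The feedback between levels can be sketched with hard constraints in place of probability bounds. This is a deliberate simplification: an element is dropped only when it has lost all support, whereas the method described eliminates on an upper probability bound. The word list and its phoneme spellings are hypothetical.

```python
def cascade(phonemes, words, spelling):
    """Repeat until stable: drop words that need a missing phoneme,
    then drop phonemes that no surviving word uses."""
    phonemes, words = set(phonemes), set(words)
    changed = True
    while changed:
        dead_words = {w for w in words if not set(spelling[w]) <= phonemes}
        words -= dead_words
        used = {p for w in words for p in spelling[w]}
        unused = phonemes - used
        phonemes -= unused
        changed = bool(dead_words or unused)
    return phonemes, words

# Hypothetical candidate words and the phonemes each one requires.
spelling = {"cat": ("k", "a", "t"), "hat": ("h", "a", "t"), "hid": ("h", "i", "d")}

# Suppose the phoneme "h" has already been eliminated as unlikely.
remaining_phonemes = {"k", "a", "t", "i", "d"}

phonemes, words = cascade(remaining_phonemes, spelling.keys(), spelling)
print(sorted(phonemes), sorted(words))
```

Removing "h" eliminates "hat" and "hid", which in turn leaves "i" and "d" unsupported, so they are eliminated as well: a change at the phoneme level propagates up to the words and back down again.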

Through this iterative process of eliminating implausible structures a good global solution is identified by exclusion. This is in contrast to all existing methodologies that attempt to identify a global solution directly through the propagation of best-guess solutions.

A mathematical discussion of the method of the invention will now be given. Consider data $x = x(t)$ which varies as a function of time. The goal is to derive the best global description, $s = s(t)$, of the data, where the description consists of a number of levels, $1, \ldots, N$, and at each level $n$ the description is denoted $s(n) = \{ s(n,t),\ t = 1, \ldots, T \}$, where $s(n,t)$ identifies the object assigned to the data at time $t$.

From conditions 2 and 3 an holistic, probability theory approach is used, requiring:

(1) $s = \arg\max_{s \in \Phi} P(s^* = s \mid x)$

where Φ is the space of possible global solutions for s.

This aim is not realised directly, i.e., by actively searching for and refining solutions within the global solution space, as this is the approach of existing gradient-based techniques. Rather, the solution is arrived at indirectly, by eliminating bad solutions from Φ. In doing so all of the solution space is implicitly examined, in line with condition 2, as follows.

In order to satisfy condition 3, the honing process begins by grouping solutions together. Examining each individual solution in isolation would be computationally intractable in general.

Grouping is achieved as follows. Consider all solutions that contain the individual description s(n,t)=α, say. That is, the object or element at the nth level description at time t is α.

The maximum probability of any one of these solutions is:

(2) $U(s(n,t)=\alpha) = \max_{s' \in \Phi'} P(s(n,t)=\alpha, s' \mid x)$

where s' denotes the set of labels at all times and levels bar the particular time and level under consideration, and Φ' is the space of possible solutions for this set.
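For a solution space small enough to enumerate, the quantity in equation (2) can be computed directly. The sketch below assumes a single level of description, so the level index n is dropped, and uses a made-up posterior table over two time steps.

```python
# Hypothetical posterior P(s | x) over every global solution: two time steps,
# two candidate labels at each step. The values are illustrative only.
posterior = {
    ("k", "a"): 0.5,
    ("k", "o"): 0.1,
    ("h", "a"): 0.3,
    ("h", "o"): 0.1,
}

def upper_bound(posterior, t, label):
    """U(s(t)=label): the highest probability of any solution with `label` at time t."""
    return max((p for s, p in posterior.items() if s[t] == label), default=0.0)

bounds = {
    (t, label): upper_bound(posterior, t, label)
    for t in range(2)
    for label in sorted({s[t] for s in posterior})
}
for match, u in sorted(bounds.items()):
    print(match, u)
```

Enumerating every solution like this is exactly what becomes intractable for realistic data, which is why the description goes on to replace U with a cheaper bound.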

Now any group of solutions whose lowest upper bound probability is below some known lower bound value, L, cannot contain the optimum solution. Therefore, these groups can be eliminated from consideration. The rule for s^k(n,t)=α at some iteration time k is:

(3) if $U(s^k(n,t)=\alpha) < L^{(k)}$, eliminate any solution containing the match $s^k(n,t)=\alpha$

By eliminating sets of poor solutions the size of the space which needs to be considered at the next time step is effectively reduced. That is, the new search space at time $k+1$, $\Phi^{(k+1)}$, will not contain these solutions, which will affect future processing. In relation to the description, if the possibility $s^k(n,t)=\alpha$ is excluded, then this will affect the upper bound on other matches at the next iteration.

As processing progresses so fewer solutions remain. This means that the size of the groups reduces. Ultimately, only one solution will remain (unless processing is terminated prematurely).

The computation of the upper bound has not yet been defined, and in general may be computationally expensive, thereby breaking condition 3. The solution is to identify quantities of the form $Y^{(k)}$ such that $Y^{(k)} \geq U^{(k)}$ which can be computed in a given time and using a given amount of memory. The elimination rules then become:

eliminate any solution containing the match $s^k(n,t)=\alpha$ if

(4) $Y^{(k)}(s^k(n,t)=\alpha) < L^{(k)}$
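As an illustration of a quantity satisfying the required inequality, and not a form stated in the source, the marginal posterior of a match is one admissible choice of $Y$. Because probabilities are non-negative, the largest term of a sum cannot exceed the sum itself:

```latex
\[
U\bigl(s(n,t)=\alpha\bigr)
  = \max_{s' \in \Phi'} P\bigl(s(n,t)=\alpha,\, s' \mid x\bigr)
  \;\le\; \sum_{s' \in \Phi'} P\bigl(s(n,t)=\alpha,\, s' \mid x\bigr)
  = P\bigl(s(n,t)=\alpha \mid x\bigr)
\]
```

Taking $Y(s(n,t)=\alpha) = P(s(n,t)=\alpha \mid x)$ therefore guarantees $Y \geq U$, so rule (4) applied with this $Y$ can only discard matches that rule (3) would also discard.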

$Y^{(k)}$ is evaluated by combining Bayesian probability theory with rules of inequality. Its form may change over the iterative cycles in order to accommodate condition 3. For example, at the onset of processing $Y^{(k)}$ may be coarsely and quickly evaluated, but provided it obeys $Y^{(k)} \geq U^{(k)}$ then only bad solutions will be eliminated. Towards the end of processing, when only a few solutions remain, a more sophisticated and computationally intensive means of computing $Y$ may be employed, such that $Y^{(k)}$ approximates $U^{(k)}$, provided condition 3 is not violated. Processing will continue until no solutions fall below the relevant threshold. At any time processing may be re-started by heuristically increasing the threshold, or alternatively, the remaining solutions may be recorded and processed in some manner.
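The iteration can be sketched end to end on a toy model. Everything here is an assumption made for illustration: a solution's score is taken to be a product of per-position and neighbour-to-neighbour factors, each in [0, 1]; the coarse bound Y keeps only the factors touching one match; and the threshold L is taken to be the score of an already-known solution, which cannot exceed the score of the best one.

```python
from itertools import product

# Hypothetical factors in [0, 1]; none of these numbers come from the source.
emission = [
    {"th": 0.9, "k": 0.1},
    {"e": 0.8, "i": 0.7},
    {"k": 0.9, "h": 0.6},
]
transition = {
    ("th", "e"): 0.9, ("th", "i"): 0.2, ("k", "e"): 0.1, ("k", "i"): 0.9,
    ("e", "k"): 0.9, ("e", "h"): 0.4, ("i", "k"): 0.3, ("i", "h"): 0.8,
}

def score(solution):
    """Score of one global solution: the product of all of its factors."""
    p = 1.0
    for t, label in enumerate(solution):
        p *= emission[t][label]
        if t > 0:
            p *= transition[(solution[t - 1], label)]
    return p

def coarse_bound(t, label, remaining):
    """Y >= U: every omitted factor is at most 1, so dropping it cannot lower the value."""
    y = emission[t][label]
    if t > 0:
        y *= max(transition[(prev, label)] for prev in remaining[t - 1])
    if t + 1 < len(remaining):
        y *= max(transition[(label, nxt)] for nxt in remaining[t + 1])
    return y

def eliminate(threshold):
    """Apply rule (4) until an iteration removes nothing."""
    remaining = [set(candidates) for candidates in emission]
    passes = 0
    while True:
        doomed = [(t, label) for t, labels in enumerate(remaining) for label in labels
                  if coarse_bound(t, label, remaining) < threshold]
        if not doomed:
            return remaining, passes
        for t, label in doomed:
            remaining[t].discard(label)
        passes += 1

threshold = score(("th", "e", "h"))           # score of a known, non-optimal solution
remaining, passes = eliminate(threshold)
best = max(product(*remaining), key=score)    # exhaustive search over what is left
print(remaining, passes, best)
```

In this toy run the first iteration removes "k" at the first position; that tightens the bound on "i" at the second position, which is removed in the second iteration. Two solutions survive, and the small remainder is resolved by exhaustive search.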

In summary, the global solution space Φ is iteratively reduced by identifying and eliminating implausible elements. Elimination is achieved by comparing an upper bound on the probability of any global solution containing an element against a threshold. Computational overheads are addressed using a coarseness function Y that, whilst not necessarily delivering the lowest upper bound, is sufficient for identifying inappropriate regions of the solution space.

Claims

1. A method of identifying structure in time series data, in which the data corresponds to a plurality of elements from a set of possible elements, the method including the steps of:

(i) determining an upper bound on the probability that the structure to be identified includes a one of the possible elements;

(ii) calculating a threshold probability; and

(iii) eliminating the element from the set of possible elements if the upper probability bound is less than the threshold probability.

2. A method as claimed in claim 1, in which steps (i) and (iii) are iterated so as to eliminate all non-plausible elements from the set of possible elements so as to identify the structure.

3. A method as claimed in claim 1, and including the step of determining the one of the possible elements by comparing time series data with known data.

4. A method as claimed in claim 1, in which the set of possible elements includes a plurality of subsets.

5. A method as claimed in claim 4, in which a one of the plurality of subsets comprises simplest elements.

6. A method as claimed in claim 5, in which a one of the plurality of subsets comprises compound elements in which each compound element comprises elements of a different subset.

7. A method as claimed in claim 6, in which the different subset is the subset of simplest elements.

8. A method as claimed in claim 4, in which each subset has a different meaning.

9. A method as claimed in claim 4, in which a one of the plurality of subsets comprises relational elements which indicate relationships between the elements of another of the plurality of subsets.

10. A method as claimed in claim 1, in which the time series data represents speech.

11. A method as claimed in claim 10, in which the structure identified is words.

12. A system for identifying structure in time series data, in which the data corresponds to a plurality of elements from a set of possible elements, the system including a memory storing time series data and data processing means, in which the data processing means operates on data to:

(i) determine an upper bound on the probability that the structure to be identified includes a one of the possible elements;

(ii) calculate a threshold probability; and

(iii) eliminate the element from the set of possible elements if the upper probability bound is less than the threshold probability.

13. Computer program code executable on a computer to carry out a method as claimed in claim 1.

14. A computer readable medium bearing a computer program which when running on a computer carries out the method of claim 1.

15. A method substantially as hereinbefore described with reference to Figures 3 to 4.

16. A system substantially as hereinbefore described with reference to the accompanying Figures.