GB2391648A - Method of and Apparatus for Retrieving an Illustration of Text - Google Patents

Method of and Apparatus for Retrieving an Illustration of Text

Info

Publication number
GB2391648A
GB2391648A GB0218297A
Authority
GB
United Kingdom
Prior art keywords
text
terms
illustration
query
subset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB0218297A
Other versions
GB0218297D0 (en)
Inventor
Philip Glenny Edmonds
Victor Poznanski
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sharp Corp
Original Assignee
Sharp Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sharp Corp filed Critical Sharp Corp
Priority to GB0218297A priority Critical patent/GB2391648A/en
Publication of GB0218297D0 publication Critical patent/GB0218297D0/en
Priority to AU2003250534A priority patent/AU2003250534A1/en
Priority to PCT/JP2003/009416 priority patent/WO2004015592A1/en
Publication of GB2391648A publication Critical patent/GB2391648A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/907Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3334Selection or weighting of terms from queries, including natural language queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method is provided for retrieving an illustration of a segment of text from a database of illustrations. One or more keywords are identified in the segment and, from the keywords, a set of related terms is derived. The related terms are clustered into subsets such that the terms in each subset have a similar meaning and terms in different subsets have different meanings. A query is formed from each subset and comprises at least some of the terms of the subset optionally together with one or more of the associated keywords. The queries are issued to a search engine for the database of illustrations and illustrations retrieved from the database are presented in any appropriate way to a user for selection so as to illustrate the segment of text. The illustration may comprise an image, an icon, a graphic, an animation, a text, a video clip or a sound clip.

Description

METHOD OF AND APPARATUS FOR RETRIEVING AN ILLUSTRATION OF TEXT

The present invention relates to a method of and an apparatus for retrieving an illustration of a segment of text from a database of illustrations. The invention also relates to a computer program for programming a computer to perform such a method, a storage medium containing such a program, and a computer programmed by such a program. Such techniques may be used, for example, to provide assistance in the authoring of multimedia documents. For example, this technique may be used for automatically finding a set of multimedia items which may be suggested to a user in order to illustrate a short segment of text, such as an SMS text message on a mobile terminal or handset, based on the meaning of the text.
Multimedia documents consist of content from more than one medium, for example text and graphics or text and sound. Multimedia documents can be a much more powerful (and fun) way to communicate than plain text documents.
Consequently, multimedia authoring is becoming increasingly popular, especially given the ever-improving capabilities of computing systems for generating, playing, and transmitting multimedia data. Multimedia messaging services (MMS) are being touted as one of the most important upcoming services on next-generation mobile phone networks. However, authoring multimedia documents remains difficult. It can take time and skill in browsing/searching a large database of multimedia content (images, icons, graphics, animations, video clips, sound clips, etc.) to find the right multimedia items that will succinctly convey a message.
On a mobile terminal, composing multimedia messages is particularly difficult because of the small screen (which hampers browsing) and the possibly high bandwidth-costs associated with downloading many possibly irrelevant multimedia items. There is a need to help the user by analysing input text in order to find quickly a small set of multimedia items related to the input text that can then be suggested as possible illustrations of the text. However, there is no known method that can always interpret exactly what the user intended by a text or message.
US 6 021 412 discloses a method of illustrating a concept expressed by a portion of text in a document that has the steps of (a) identifying a "concept" in the document that corresponds to at least one graphic image in a database, (b) showing the corresponding graphic images to the user, and (c) allowing the user to select an image to be inserted into the document. The step (a) identifies the concept by using a static (i.e., preset) list of all words used as annotations of images. The list is consulted for each word in the document. If a word is in the static list, it is considered to be a concept, and all images annotated by the word are suggested to the user. This method also includes an extra step of mapping from the word to its synonyms, which are then looked up in the static list. One problem is that, if there is more than one matching image, then there is no way to rank the images in terms of how related they are to a concept. This would become a serious problem for a very large multimedia database where hundreds of multimedia items might be annotated with the same concept word. A second problem is that concept words can be ambiguous. For example, the word "party" can refer to a political party or to a festive occasion. There is no provision to identify the different interpretations of a word. Finally, there is also no provision for handling multiple input words simultaneously. This would help both in the disambiguation problem above (for example, "birthday party" clearly disambiguates "party") and in finding images that relate to more than one word, and thus in ranking the images. This technique can handle only a single word/concept at a time.
US 5 873 107 discloses a text authoring system comprising, among other components, a keyword extractor, an information source (e.g., a collection of documents), and a search engine for querying the information source. The method is to extract a single keyword from the text being authored (based on words of the text and/or based on the information source). Once a keyword is found, it is issued to the search engine as a query. The results of the query are displayed in a separate pane of the user interface.
The system is dynamic in that it runs continuously while one is authoring text. However, it cannot handle ambiguous words and can use only a single keyword at a time. US Patent Application 20010049596 is FunMail's core patent for their "Relevancy Engine" and "Rendering Engine". It is a method of turning a plain text (a short message) into an animated sequence. The method is to 1) generate a set of concepts from a text string; 2) select various animation components associated with the concepts (including stories, backgrounds, characters, props, music, and so on); and 3) compose
an animation out of the individual components. The method for stage 1 is to filter out unimportant words (using a list of unimportant words) and to map the remaining words and/or strings of words (called phrases) to concepts using a table (or "library") that maps words/phrases to concepts. Plural words are first converted to their singular forms. The resulting concepts are prioritised using an undisclosed method. The set of concepts is a separate domain from the set of words (i.e., the phrase "hot dog" might map to the concept FOOD). Stage 2 is accomplished by finding animation components labelled with the top priority concept using a simple table lookup procedure.
This technique provides no method for ranking animation components should more than one be associated with a concept. Similarly, a method is provided for mapping from just a single concept to animation components. A method for using multiple concepts at once is not provided. This technique also suffers from the same problem of misinterpretation when a word/phrase can have several different concepts as interpretations. US Patent 5 926 811 discloses a method of query expansion that uses a purpose-built "statistical thesaurus". Each term has a record in the thesaurus consisting of terms related by co-occurrence in a particular document collection. Pre-existing methods are used for determining strength of co-occurrence. Each document collection has a separate thesaurus. To do query expansion, the system indexes the thesaurus as if it were a standard document collection and then issues queries to the thesaurus consisting
of terms in the user's query. The resulting terms are used to expand the query, which is then submitted to a search engine indexed on the collection itself.
According to a first aspect of the invention, there is provided a method of retrieving an illustration of a segment of text from a database of illustrations, comprising: (a) identifying at least one keyword from the segment; (b) deriving from the at least one keyword a set of related terms; (c) clustering the set of related terms into at least one subset, the or each of which comprises a plurality of the related terms of similar meaning; (d) forming from the or each subset a query comprising at least one of the terms of the subset; (e) issuing the or each query to a search engine for the database; and (f) presenting at least one retrieved illustration.
The or each illustration may comprise at least one of an image, an icon, a graphic, an animation, a text, a video clip and a sound clip.
Each related term may comprise at least one of a word, a plurality of words, a phrase and an image. The or each word may be in its base form. The or each word may comprise a noun, a verb or an adjective.
The or each query may comprise at least one of the at least one keyword.
The segment of text may comprise at least one text abbreviation and the step (a) may comprise mapping the at least one text abbreviation to a word.
The step (a) may comprise reducing the at least one keyword to its base form.
The step (a) may comprise labelling the at least one keyword with a part of speech tag.
The step (a) may comprise identifying any phrase.
The step (c) may comprise clustering the set into a plurality of subsets with the terms in different subsets having different meanings.
The related terms may be related by at least one of co-occurrence, similarity of meaning and word association.
Each related term may be associated with a score representing its relatedness to the at least one keyword. The step (d) may comprise forming the query from the related terms of highest scores of the or each subset. The step (c) may comprise ranking the or each subset in accordance with the scores of the terms thereof.
The step (d) may comprise forming a plurality of queries and the step (f) may comprise merging lists of retrieved illustrations corresponding to the queries.
The step (f) may comprise presenting the at least one retrieved illustration for user selection. The method may comprise the further steps of selecting at least one retrieved illustration and forming a message comprising the segment of text and the at least one retrieved illustration. According to a second aspect of the invention, there is provided a computer program for programming a computer to perform a method according to the first aspect of the invention. According to a third aspect of the invention, there is provided a storage medium containing a program according to the second aspect of the invention.
According to a fourth aspect of the invention, there is provided a computer programmed by a program according to the second aspect of the invention.
According to a fifth aspect of the invention, there is provided an apparatus for performing a method according to the first aspect of the invention.
The apparatus may comprise a mobile telephone.
The apparatus may comprise a personal digital assistant.
The apparatus may comprise a multimedia search engine.
It is thus possible to provide a technique which automatically links text to suitable illustrations. This technique can provide a wider and more structured selection of multimedia content than known techniques by identifying various possible interpretations of an input message. Account can be taken of several keywords in the input message and these may then be expanded and clustered to identify different possible interpretations of the message. The result of submitting queries to an information retrieval system can be ranked according to relevancy to the different interpretations of the input message. Analysis of the words of the input message to identify their parts of speech may also be performed so as to reduce the possibility of misinterpretation. The invention will be further described, by way of example, with reference to the accompanying drawing, which illustrates a technique for retrieving an illustration of a segment of text constituting an embodiment of the invention.
One embodiment of the invention takes a short segment of text (e.g., 1-30 words) as input. The input could be a message intended for a recipient in a multimedia messaging service on mobile terminals/phones, or it could be a sentence in a multimedia presentation that an author is writing, among others.
The method involves the following steps: 1. Receive an input message entered by a user.
2. Identify one or more most important keywords in the message. Optionally pre-analyse the input message by mapping from "texting" abbreviations to real words, by reducing words to their base forms, by labelling the words with part-of-speech tags, and/or by identifying common phrases. 3. For the combined set of keywords, find a scored and ranked set of N terms related to any of the keywords or any combination of the keywords.
4. Cluster the set of N related terms into M subsets such that the subsets contain terms similar (in meaning) to each other, but different in meaning from the terms in the other subsets. This gives several clusters that represent different 'interpretations' of the keywords. Optionally rank the clusters according to the scores of terms in the clusters.
5. From each cluster, form a query from (at least) its K top terms and optionally the keywords. 6. Issue each query to a search engine that searches various databases of multimedia content.
7. Optionally merge the result lists of the individual searches into one results list.
8. Suggest one or more items of multimedia content from each results list to the user. (The user selects one or more of them for inclusion in the multimedia message or presentation.) For example, if the message were "It's really cold today", the keywords "cold" and "today" might be identified. The following related clustered terms might be identified: "{ice, snow}", "{cold, water, pipe}", "{winter, weather, day}", "{shower, morning}".
A query would be formed from each of these clusters and from the keywords "cold" and "today" and issued to the search engine. Each cluster represents a different interpretation of the message and would lead to different illustrations.
Such an embodiment is described in more detail hereinafter. In a step 1, a user enters a short text part of a multimedia message in the form of a text string. The text string can be any relatively short (1-30+ words) string of free-form text, grammatical or not. It can even include 'texting'-style abbreviations. The text string may be entered on a mobile terminal such as a mobile phone or PDA (personal digital assistant). As an alternative,
it may be entered into an application on a standard computer.
A step 2 identifies one or more keywords, which are the most important words or terms in the input string. Various techniques for this are well known in the literature, and any suitable technique may be used. For example, one technique is to choose the terms with the greatest frequency in a large corpus of text (after ignoring very common and "meaningless" words such as "the"). Techniques developed for information retrieval for term weighting may also be used. Term weighting techniques, such as inverse document frequency (IDF), compute a weight based on the distribution of a term within a corpus of documents. For example, the importance of a term t could be calculated as log(N/df), where N is the number of documents in the collection and df is the number of documents containing the term t. However, in the preferred embodiment, RIDF (residual inverse document frequency) is used. Here the importance of t is log(N/df) - log(1/(1 - e^(-cf/N))), where cf is the number of times t occurs in the whole collection, as disclosed in Manning & Schutze, "Foundations of Statistical Natural Language Processing", MIT Press (1999; Chapter 15).
After assigning an importance score to each term in the input message, one or more of the top scoring terms are selected as keywords.
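By way of illustration only, the following Python sketch shows one way the RIDF scoring and keyword selection of step 2 might be realised. The stop-word list, the toy document and collection frequencies, and the function names are illustrative assumptions rather than part of the described method.

    import math

    STOP_WORDS = {"the", "a", "an", "is", "it", "to", "of", "and"}  # illustrative only

    def ridf(df, cf, n_docs):
        # Residual inverse document frequency (after Manning & Schutze, 1999):
        # IDF minus the IDF expected under a Poisson model of term occurrence.
        idf = math.log(n_docs / df)
        expected_idf = -math.log(1.0 - math.exp(-cf / n_docs))
        return idf - expected_idf

    def select_keywords(tokens, df_table, cf_table, n_docs, top_k=2):
        # Score each non-stop-word token and keep the top_k highest-scoring ones.
        scored = []
        for t in set(tokens):
            if t in STOP_WORDS or t not in df_table:
                continue
            scored.append((ridf(df_table[t], cf_table[t], n_docs), t))
        scored.sort(reverse=True)
        return [t for _, t in scored[:top_k]]

    # Example with invented corpus statistics for "it's really cold today".
    df = {"cold": 120, "today": 900, "really": 1500}
    cf = {"cold": 150, "today": 2500, "really": 2000}
    print(select_keywords(["cold", "today", "really"], df, cf, n_docs=10000))
    # ['today', 'cold']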
Optionally, the input string can be pre-analysed before computing importance scores.
The pre-analysis can include normalisation or analysis or both using techniques such as (but not limited to) the following: Normalisation: Map words to normal forms, including converting 'texting'-style abbreviations (e.g., "c u l8r" to "see you later") or converting words to their base forms (e.g., "running" to "run", "apples" to "apple", "quickly" to "quick").
Analysis: Label the words with part-of-speech tags using morphological analysis, a part-of-speech tagger, or other methods. Any set of labels/tags can be used.
In a preferred embodiment, all of the above techniques are used and morphological analysis is used to identify part-of-speech tags of verbs, nouns, and adjectives.
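A minimal sketch of this optional pre-analysis is given below, using tiny hand-built tables in place of the texting-abbreviation dictionary, morphological analyser and part-of-speech tagger that the description leaves to any suitable existing technique; the tables and tags shown are illustrative only.

    # Illustrative stand-ins for a texting-abbreviation dictionary, a morphological
    # analyser and a part-of-speech tagger.
    TEXTING = {"c": "see", "u": "you", "l8r": "later", "2day": "today"}
    BASE_FORMS = {"running": "run", "apples": "apple", "quickly": "quick"}
    POS = {"cold": "ADJ", "today": "NOUN", "see": "VERB", "you": "PRON", "later": "ADV"}

    def pre_analyse(message):
        tokens = message.lower().split()
        analysed = []
        for tok in tokens:
            tok = TEXTING.get(tok, tok)                   # normalise texting abbreviations
            tok = BASE_FORMS.get(tok, tok)                # reduce to base form
            analysed.append((tok, POS.get(tok, "UNK")))   # attach a part-of-speech tag
        return analysed

    print(pre_analyse("c u l8r 2day"))
    # [('see', 'VERB'), ('you', 'PRON'), ('later', 'ADV'), ('today', 'NOUN')]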
In a step 3, terms related to the keywords are found and ranked. This step builds a list of N terms that are most related to the keywords. Terms can be words, phrases, images, or any other type of term used in annotating the multimedia collection. In a preferred embodiment, related terms are base-form nouns, verbs, and adjectives.
First, for each keyword, the method generates a ranked and scored list of terms related to it. Various techniques for finding related terms are also well known in the research literature. For example, it is possible to use words that are related by co-occurrence (e.g., words that co-occur often in sentences, such as "dog" and "walk") in a large corpus using any of a number of scoring measures. Such measures include mutual information scores and t-scores as described in Manning & Schutze (1999; Chapter 5). It is also possible to use a 'statistical thesaurus', which links words by similarity of usage in a large corpus, for example as disclosed in Lin, Dekang. 1998. Automatic Retrieval and Clustering of Similar Words. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics (COLING-ACL '98), pp. 768-773. Hand-built lists of words such as synonym dictionaries and word association lists may also be used. In a preferred embodiment, a technique which merges lists generated from any number of other techniques into a single list is used. Such a technique is disclosed in our copending patent application (SLE01022).
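As an illustration of the co-occurrence measures mentioned above, the following sketch computes a pointwise mutual information score from raw co-occurrence counts; the counts are invented for the example, and any of the other cited measures (such as t-scores) could equally be used.

    import math

    def pmi(count_xy, count_x, count_y, total):
        # Pointwise mutual information between terms x and y from co-occurrence
        # counts over `total` observed sentences (cf. Manning & Schutze, 1999, ch. 5).
        p_xy = count_xy / total
        p_x = count_x / total
        p_y = count_y / total
        return math.log(p_xy / (p_x * p_y))

    # Toy counts: "dog" and "walk" co-occur often relative to their own frequencies.
    print(round(pmi(count_xy=40, count_x=200, count_y=300, total=100000), 2))  # 4.2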
Second, the ranked and scored related-terms lists for each of the keywords are joined into a single list by summing the scores for terms that occur in more than one list. Thus, the more keywords a term is related to, the higher it will be ranked in the combined list. Finally, the top N words by score are extracted from the list and used in the next step.
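This joining of the per-keyword lists might be sketched as follows; the scores shown for the keywords "cold" and "today" are hypothetical and would in practice come from whichever related-term technique is used.

    from collections import defaultdict

    def merge_related_terms(per_keyword_lists, top_n=10):
        # per_keyword_lists: one {term: score} dict per keyword, e.g. from a
        # co-occurrence measure or a statistical thesaurus (not shown here).
        combined = defaultdict(float)
        for scored_terms in per_keyword_lists:
            for term, score in scored_terms.items():
                combined[term] += score   # terms related to several keywords rank higher
        ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:top_n]

    # Hypothetical related-term scores for the keywords "cold" and "today".
    cold_terms = {"ice": 0.9, "snow": 0.8, "winter": 0.7, "water": 0.5, "shower": 0.3}
    today_terms = {"day": 0.9, "weather": 0.6, "winter": 0.4, "morning": 0.4}
    print(merge_related_terms([cold_terms, today_terms], top_n=5))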
In a step 4, the related words are clustered by similarity to each other and to the keywords. The purpose of this step is to generate one or more 'interpretations' of the input message, each of which is represented by a cluster of related terms. A clustering technique is used to generate M clusters out of the N related words such that each cluster contains terms similar to each other and dissimilar to terms in other clusters. In order for the clusters to represent coherent 'interpretations' of the input string, the measure of similarity must be similarity of meaning. In a preferred embodiment, similarity of meaning is computed as similarity of usage in a large corpus. For example, in a sufficiently large corpus of English, "cat" is more similar in usage to "dog" than it is to "appointment" by virtue of the other words that each of "dog", "cat", and "appointment" tend to co-occur with. Lin (1998) describes a method based on comparing tables of significantly co-occurring words. Other techniques for computing similarity of meaning also exist; for example, as disclosed in Resnik, Philip. 1999. Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language. In Journal of Artificial Intelligence Research, vol. 11, pp. 95-130.
Of course, many clustering algorithms are also known in the literature. In principle, any clustering algorithm may be used. In one embodiment, the following algorithm, based on an algorithm given by Lin (1998), is used: Initialize a similarity tree to consist of a root w0, where w0 is one (or more) of the keywords (of step 2).
For i = 1, 2, ..., N, insert wi as a child of wj such that wj is the most similar term to wi in {w0, w1, ..., wi-1}, where {w1, ..., wN} is the set of related terms.
This process will have formed M subtrees of the root w0; the sets of terms in the M subtrees each form a cluster.
A final optional step is to rank the clusters. Each cluster can be scored by using the scores of the related terms in the clusters (as computed in step 3). In a preferred embodiment, the score of a cluster is the average score of the terms in the cluster, but other methods may be considered, such as using the maximum score over terms in the cluster.
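The similarity-tree clustering and the optional ranking of step 4 can be sketched as below. The similarity function here is a toy lookup table assumed only for the example; in the preferred embodiment it would be replaced by corpus-based similarity of usage (e.g. Lin, 1998).

    def cluster_by_similarity(keyword, related_terms, similarity):
        # Build a similarity tree rooted at the keyword (w0): each related term is
        # attached as a child of the most similar term already in the tree, and the
        # subtrees of the root then form the clusters ('interpretations').
        parent = {}
        in_tree = [keyword]
        for term in related_terms:                     # assumed already ranked by score
            best = max(in_tree, key=lambda t: similarity(term, t))
            parent[term] = best
            in_tree.append(term)
        clusters = {}
        for term in related_terms:
            node = term
            while parent[node] != keyword:             # walk up to the child of the root
                node = parent[node]
            clusters.setdefault(node, []).append(term)
        return list(clusters.values())

    def rank_clusters(clusters, scores):
        # Preferred embodiment: rank clusters by the average score of their terms.
        return sorted(clusters,
                      key=lambda c: sum(scores[t] for t in c) / len(c),
                      reverse=True)

    # Toy similarity: 1.0 for listed pairs, 0.0 otherwise (illustrative only).
    PAIRS = {("ice", "snow"), ("water", "pipe"), ("winter", "weather"),
             ("weather", "day"), ("shower", "morning"), ("cold", "water")}
    sim = lambda a, b: 1.0 if (a, b) in PAIRS or (b, a) in PAIRS else 0.0
    terms = ["ice", "snow", "winter", "water", "day", "weather", "pipe", "shower", "morning"]
    print(cluster_by_similarity("cold", terms, sim))
    # [['ice', 'snow'], ['winter', 'weather'], ['water', 'pipe'], ['day'], ['shower', 'morning']]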
In order to find multimedia content relevant to each interpretation, a step 5 formulates a query from both the terms in each cluster and the keywords of the message. The form of the query, in terms of its structure, its operators, and its query terms, is determined by the information retrieval (IR) system. In a preferred embodiment, the IR system uses a probabilistic matcher and a query is formulated as a list of the top K terms in the cluster, by score, and all of the keywords. Optionally, term weights are assigned. Related terms are assigned a weight twice that of the keywords. The terms themselves might have to be converted to a form that the IR engine is expecting. For example, if the terms have an associated part-of-speech tag, the tag is removed if the IR engine does not match based on tags. In other embodiments, a Boolean or ranked Boolean matcher may be used. Examples of other techniques include information retrieval models and query formulation techniques given by Baeza-Yates, Ricardo and Berthier Ribeiro-Neto. 1999. Modern Information Retrieval. Addison Wesley (Chapters 2 and 4).

In a step 6, each query is issued to a multimedia IR search engine. The IR engine takes each query and returns a ranked list of multimedia content based on matching the query against annotations of multimedia content. The annotations can take any form that is indexable by the IR engine. In a preferred embodiment, the annotations are free-form text strings and the IR engine uses stemming. For each query, the IR engine returns a scored list of results, where each result is an item of multimedia content and the score is based on how well the annotation of the item matches the query (based on the IR engine matching formula).
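The query formulation of step 5 might be sketched as follows. The plain dictionary of term weights is only a stand-in for whatever query syntax the IR engine of step 6 actually accepts, and the value K = 3 is illustrative; the weighting of related terms at twice the keyword weight follows the preferred embodiment described above.

    def form_query(cluster_terms, keywords, scores, top_k=3):
        # Take the top K cluster terms by score plus all keywords, with related
        # terms weighted twice as heavily as the keywords.
        top_terms = sorted(cluster_terms,
                           key=lambda t: scores.get(t, 0.0), reverse=True)[:top_k]
        query = {kw: 1.0 for kw in keywords}
        for term in top_terms:
            query[term] = 2.0
        return query

    scores = {"ice": 0.9, "snow": 0.8}
    print(form_query(["ice", "snow"], ["cold", "today"], scores))
    # {'cold': 1.0, 'today': 1.0, 'ice': 2.0, 'snow': 2.0}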
In an optional step 7, all of the results lists from the separate searches of the step 6 are merged into a single results list. The results are merged by taking into account the scores assigned by the IR engine in the step 6. Any technique for merging results can be used, and a survey of such techniques is given in Callan, Jamie. 2000. Distributed Information Retrieval. In Advances in Information Retrieval: Recent Research from the Center for Intelligent Information Retrieval, pp. 127-150. In a preferred embodiment, all of the results are merged and low-scoring results and duplicates are removed.
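The optional merging of step 7 might be sketched as follows; the score threshold, the item names and the example result lists are illustrative only.

    def merge_results(result_lists, min_score=0.2):
        # result_lists: one list of (item_id, score) pairs per query, as returned
        # by the IR engine. Keep the best score seen for each item, drop
        # low-scoring results, and return a single ranked list.
        best = {}
        for results in result_lists:
            for item, score in results:
                if score >= min_score and score > best.get(item, 0.0):
                    best[item] = score
        return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

    winter_hits = [("snowman.gif", 0.9), ("scarf.jpg", 0.4), ("beach.jpg", 0.1)]
    shower_hits = [("umbrella.gif", 0.7), ("snowman.gif", 0.5)]
    print(merge_results([winter_hits, shower_hits]))
    # [('snowman.gif', 0.9), ('umbrella.gif', 0.7), ('scarf.jpg', 0.4)]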
Finally, one or more items of multimedia content from each results list are suggested to the user in a step 8. The number of items suggested from each list and the number of lists presented depend on the output characteristics of the system the user is using. For mobile terminals with very small screens or with high bandwidth costs, such as mobile phones or PDAs, just one or two items from each list may be presented. If the interpretations are ranked, then items from only the first few results lists may be presented. The lists may even be presented one at a time, allowing the user to move to the next list.
For larger screens and/or lower bandwidth costs, a more sophisticated user interface may be provided. For example, a user interface on a mobile phone may first show the interpretations as text options. After the user chooses an option, a first multimedia item for the interpretation is downloaded to the phone and shown to the user. The user then chooses the item to include in the message, moves to a second item or to a different interpretation. On a larger screen, such as a PDA, the interpretations could be shown as text with the top interpretation 'expanded' to show one or more multimedia items. The user can easily select a different interpretation, which would 'close' the first interpretation and 'expand' the newly chosen one.
This system has applications in at least multimedia messaging services (MMS), multimedia authoring, and multimedia search engines. A preferred application is as a component of a multimedia-message composition tool for a mobile terminal (mobile phone, PDA, etc.) for creating multimedia messages to be sent by a multimedia messaging service (MMS). In current short messaging services (SMS), the user can easily and quickly create and send text-only messages. This system can extend this service to MMS by allowing the user to easily, quickly, and cheaply create and send multimedia messages. This system can help because it automatically analyses a short text message and suggests several multimedia items that could appropriately illustrate the message. Also, since related words are used, the clips will not necessarily be pictures that are explicitly described by the message (as in regular image search engines), but the pictures could be metaphorically related or form puns on the input message. This leads to a fun service.
Other fun messaging services involving games are possible. For instance, given a message such as "hot dog", two multimedia items can be suggested, one for each word, rather than suggesting a single multimedia item for both words (in this case a picture of the sun followed by a picture of a poodle could be suggested, rather than a picture of a hot dog). Automatic illustration can be used in a more general game of automatically generating rebuses from a text message.
Another game involves communities of message senders. Automatic illustration can be used by such communities to generate run-on stories. The first sender creates the first sentence of the story and chooses one of the suggested illustrations. The first sender sends the message to a second person. That person writes the second sentence of the story, illustrates it, and sends it on to a third person, and so on. The texts of the message may be shown or not. At the end, the users may browse the whole story.
In multimedia search engines, the text annotations on images are often short and basic.
It can be difficult to know which keywords ought to be used in a query to find the desired image. This system can help the user, because it expands the query with a wide selection of related words, which in turn finds more related images. The system also helps the user by structuring the resulting images into clusters associated with different possible interpretations of the user's initial query. Thus it should be easier and quicker to browse the results. This facility can be used in plain multimedia search engines, or within multimedia composition applications (such as Microsoft Powerpoint).

Claims (25)

CLAIMS:
1. A method of retrieving an illustration of a segment of text from a database of illustrations, comprising: (a) identifying at least one keyword from the segment; (b) deriving from the at least one keyword a set of related terms; (c) clustering the set of related terms into at least one subset, the or each of which comprises a plurality of the related terms of similar meaning; (d) forming from the or each subset a query comprising at least one of the terms of the subset; (e) issuing the or each query to a search engine for the database; and (f) presenting at least one retrieved illustration.
2. A method as claimed in claim 1, in which the or each illustration comprises at least one of an image, an icon, a graphic, an animation, a text, a video clip and a sound clip.
3. A method as claimed in claim 1 or 2, in which each related term comprises at least one of a word, a plurality of words, a phrase and an image.
4. A method as claimed in claim 3, in which the or each word is in its base form.
5. A method as claimed in claim 3 or 4, in which the or each word comprises a noun, a verb or an adjective.
6. A method as claimed in any one of the preceding claims, in which the or each query comprises at least one of the at least one keyword.
7. A method as claimed in any one of the preceding claims, in which the segment of text comprises at least one text abbreviation and the step (a) comprises mapping the at least one text abbreviation to a word.
8. A method as claimed in any one of the preceding claims, in which the step (a) comprises reducing the at least one keyword to its base form.
9. A method as claimed in any one of the preceding claims, in which the step (a) comprises labelling the at least one keyword with a part of speech tag.
10. A method as claimed in any one of the preceding claims, in which the step (a) comprises identifying any phrase.
11. A method as claimed in any one of the preceding claims, in which the step (c) comprises clustering the set into a plurality of subsets with the terms in different subsets having different meanings.
12. A method as claimed in any one of the preceding claims, in which the related terms are related by at least one of co-occurrence, similarity of meaning and word association.
13. A method as claimed in any one of the preceding claims, in which each related term is associated with a score representing its relatedness to the at least one keyword.
14. A method as claimed in claim 13, in which the step (d) comprises forming the query from the related terms of highest scores of the or each subset.
15. A method as claimed in 13 or 14, in which the step (c) comprises ranking the or each subset in accordance with the scores of the terms thereof.
16. A method as claimed in any one of the preceding claims, in which the step (d) comprises forming a plurality of queries and the step (f) comprises merging lists of retrieved illustrations corresponding to the queries.
17. A method as claimed in any one of the preceding claims, in which the step (f) comprises presenting the at least one retrieved illustration for user selection.
18. A method as claimed in any one of the preceding claims, comprising the further steps of selecting at least one retrieved illustration and forming a message comprising the segment of text and the at least one retrieved illustration.
19. A computer program for programming a computer to perform a method as claimed in any one of the preceding claims.
20. A storage medium containing a program as claimed in claim 19.
21. A computer programmed by a program as claimed in claim 19.
22. An apparatus for performing a method as claimed in any one of claims 1 to 18.
23. An apparatus as claimed in claim 22, comprising a mobile telephone.
24. An apparatus as claimed in claim 22, comprising a personal digital assistant.
25. An apparatus as claimed in claim 22, comprising a multimedia search engine.
GB0218297A 2002-08-07 2002-08-07 Method of and Apparatus for Retrieving an Illustration of Text Withdrawn GB2391648A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
GB0218297A GB2391648A (en) 2002-08-07 2002-08-07 Method of and Apparatus for Retrieving an Illustration of Text
AU2003250534A AU2003250534A1 (en) 2002-08-07 2003-07-24 Method of and apparatus for receiving an illustration of text
PCT/JP2003/009416 WO2004015592A1 (en) 2002-08-07 2003-07-24 Method of and apparatus for receiving an illustration of text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0218297A GB2391648A (en) 2002-08-07 2002-08-07 Method of and Apparatus for Retrieving an Illustration of Text

Publications (2)

Publication Number Publication Date
GB0218297D0 GB0218297D0 (en) 2002-09-11
GB2391648A true GB2391648A (en) 2004-02-11

Family

ID=9941867

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0218297A Withdrawn GB2391648A (en) 2002-08-07 2002-08-07 Method of and Apparatus for Retrieving an Illustration of Text

Country Status (3)

Country Link
AU (1) AU2003250534A1 (en)
GB (1) GB2391648A (en)
WO (1) WO2004015592A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2833271A1 (en) * 2012-05-14 2015-02-04 Huawei Technologies Co., Ltd Multimedia question and answer system and method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5469355A (en) * 1992-11-24 1995-11-21 Fujitsu Limited Near-synonym generating method
WO1997034242A1 (en) * 1996-03-15 1997-09-18 Lexis-Nexis, A Division Of Reed Elsevier Inc. Statistical thesaurus, method of forming same, and use thereof in query expansion in automated text searching
US5873107A (en) * 1996-03-29 1999-02-16 Apple Computer, Inc. System for automatically retrieving information relevant to text being authored
GB2330930A (en) * 1997-09-24 1999-05-05 Ricoh Kk Navigation system for document database
US6021412A (en) * 1996-04-02 2000-02-01 Microsoft Corporation Method and system for automatically adding graphics to a document to illustrate concepts referred to therein
US20010049596A1 (en) * 2000-05-30 2001-12-06 Adam Lavine Text to animation process

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU1051101A (en) * 1999-10-27 2001-05-08 Zapper Technologies Inc. Context-driven information retrieval

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5469355A (en) * 1992-11-24 1995-11-21 Fujitsu Limited Near-synonym generating method
WO1997034242A1 (en) * 1996-03-15 1997-09-18 Lexis-Nexis, A Division Of Reed Elsevier Inc. Statistical thesaurus, method of forming same, and use thereof in query expansion in automated text searching
US5873107A (en) * 1996-03-29 1999-02-16 Apple Computer, Inc. System for automatically retrieving information relevant to text being authored
US6021412A (en) * 1996-04-02 2000-02-01 Microsoft Corporation Method and system for automatically adding graphics to a document to illustrate concepts referred to therein
GB2330930A (en) * 1997-09-24 1999-05-05 Ricoh Kk Navigation system for document database
US20010049596A1 (en) * 2000-05-30 2001-12-06 Adam Lavine Text to animation process

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2833271A1 (en) * 2012-05-14 2015-02-04 Huawei Technologies Co., Ltd Multimedia question and answer system and method
EP2833271A4 (en) * 2012-05-14 2015-07-01 Huawei Tech Co Ltd Multimedia question and answer system and method

Also Published As

Publication number Publication date
WO2004015592A1 (en) 2004-02-19
AU2003250534A1 (en) 2004-02-25
GB0218297D0 (en) 2002-09-11

Similar Documents

Publication Publication Date Title
Wu et al. Automatic generation of personalized annotation tags for twitter users
US6571240B1 (en) Information processing for searching categorizing information in a document based on a categorization hierarchy and extracted phrases
US7890500B2 (en) Systems and methods for using and constructing user-interest sensitive indicators of search results
Kowalski Information retrieval architecture and algorithms
US8719353B2 (en) Systems and methods for visual messaging
Jaimes et al. Semi-automatic, data-driven construction of multimedia ontologies
IL107482A (en) Method for resolution of natural-language queries against full-text databases
Kennedy et al. Query-adaptive fusion for multimodal search
Schutz Keyphrase extraction from single documents in the open domain exploiting linguistic and statistical methods
Poomagal et al. K-means for search results clustering using url and tag contents
Subarani Concept based information retrieval from text documents
WO2009090498A2 (en) Key semantic relations for text processing
Jayashree et al. Text Document Summarization Using POS tagging for Kannada Text Documents
GB2391648A (en) Method of and Apparatus for Retrieving an Illustration of Text
Ramezani et al. Automated text summarization: An overview
Amrane et al. Semantic indexing of multimedia content using textual and visual information
Kian et al. An efficient approach for keyword selection; improving accessibility of web contents by general search engines
Shamsfard et al. Persian document summarization by parsumist
Mata et al. Semantic disambiguation of thesaurus as a mechanism to facilitate multilingual and thematic interoperability of geographical information catalogues
Albanese et al. A semantic search engine for web information retrieval: an approach based on dynamic semantic networks
Meiyappan et al. Interactive query expansion using concept-based directions finder based on Wikipedia
Birhane et al. Design and Implementation of IR System for Tigrigna Textual Documents
Stratogiannis et al. Semantic question answering using wikipedia categories clustering
Moscato et al. Mowis: A system for building multimedia ontologies from web information sources
Smeaton et al. Low Level Language Processing for Large Scale Information Retrieval: What Techniques Actually Work

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)