CN108780462B - System and method for clustering multimedia content elements - Google Patents

System and method for clustering multimedia content elements

Info

Publication number
CN108780462B
Authority
CN
China
Prior art keywords
multimedia content
content element
cluster
signature
generated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201780016956.2A
Other languages
Chinese (zh)
Other versions
CN108780462A (en)
Inventor
I. Raichelgauz
K. Odinaev
Y. Y. Zeevi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cortica Ltd
Original Assignee
Cortica Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cortica Ltd filed Critical Cortica Ltd
Publication of CN108780462A
Application granted
Publication of CN108780462B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/43 Querying

Abstract

A system and method for clustering multimedia content. The method comprises the following steps: detecting at least one clustering trigger event related to at least one multimedia content element to be clustered; generating at least one signature for the at least one multimedia content element, each signature representing at least a portion of the at least one multimedia content element; determining at least one multimedia content element cluster based on the generated at least one signature, wherein each multimedia content element cluster comprises a plurality of clustered multimedia content elements sharing at least one common concept with the at least one multimedia content element; and adding the at least one multimedia content element to each determined cluster.

Description

System and method for clustering multimedia content elements
Cross Reference to Related Applications
This application claims the benefit of pending U.S. provisional application No. 62/307,515, filed on March 13, 2016. The contents of the above-referenced application are incorporated herein by reference.
Technical Field
The present disclosure relates generally to organizing multimedia content, and more particularly to clustering based on analyzing multimedia content elements.
Background
As the size and content of the internet continues to grow exponentially, the task of finding relevant and appropriate information becomes more and more complex. Organized information can be browsed or searched more quickly than unorganized information. Therefore, efficient content organization allowing subsequent retrieval becomes increasingly important.
Search engines are often used to search information locally or through the world wide web. Many search engines receive queries from users and use such queries to find and return relevant content. The search query may be in the form of, for example, a text query, an image, an audio query, and so forth.
Search engines often face challenges when searching for multimedia content (e.g., images, audio, video, etc.). In particular, existing solutions for searching multimedia content are typically based on metadata of multimedia content elements. Such metadata may be associated with a multimedia content element and may include parameters such as, for example, size, type, name, a short description, tags describing the subject or topic of the multimedia content element, and so on. A tag is a non-hierarchical keyword or term assigned to data (e.g., a multimedia content element). The name, tags, and short description are typically provided manually by, for example, the creator of the multimedia content element (e.g., the user who captured an image using his smartphone), a person storing the multimedia content element in storage, and so on.
Tags have gained widespread popularity due in part to the growth of social networks, photo sharing, and web site bookmarks. Some websites allow users to create and manage tags that classify content using simple keywords. Users of such websites manually add and define descriptions for tags. Some of these websites only allow for tagging of specific portions of multimedia content elements (e.g., the portion of the image that shows a person). Thus, the tags assigned to the multimedia content may not fully capture the content shown therein.
Furthermore, because at least some metadata for a multimedia content element is typically provided manually by a user, such metadata may not accurately describe the multimedia content element or aspects thereof. As examples, the metadata may be misspelled, provided for a different image than intended, vague, or otherwise fail to identify one or more aspects of the multimedia content, and so forth. As an example, a user may provide a file name of "weekend fun" for an image of a cat, which does not accurately indicate what is shown in the image (i.e., the cat). Thus, a query for the term "cat" will not return the "weekend fun" image.
In addition, different users may refer to the same subject or topic with different tags, resulting in some multimedia content elements related to a particular topic being tagged with one term and other multimedia content elements related to the same topic being tagged with a different term. For example, one user may tag an image of a tree with the term "plant" while another user tags an image of a tree with the term "tree". Thus, a query based on either the tag "plant" or the tag "tree" will return results that include only one of the images, although both images are relevant to the query.
It would therefore be advantageous to provide a solution that would overcome the drawbacks of the prior art.
Disclosure of Invention
The following is a summary of several exemplary embodiments of the present disclosure. This summary is provided to facilitate the reader's basic understanding of the embodiments and does not fully define the scope of the disclosure. This summary is not an extensive overview of all contemplated embodiments, and is intended to neither identify key or critical elements of all embodiments nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more embodiments in a simplified form as a prelude to the more detailed description that is presented later. For convenience, the term "some embodiments" may be used herein to refer to a single embodiment or to multiple embodiments of the disclosure.
Some embodiments disclosed herein include a method for clustering multimedia content. The method comprises the following steps: detecting at least one clustering trigger event related to at least one multimedia content element to be clustered; generating at least one signature for the at least one multimedia content element, each signature representing at least a portion of the at least one multimedia content element; determining at least one multimedia content element cluster based on the generated at least one signature, wherein each multimedia content element cluster comprises a plurality of clustered multimedia content elements sharing at least one common concept with the at least one multimedia content element; and adding the at least one multimedia content element to each determined cluster.
Some embodiments disclosed herein also include a non-transitory computer-readable medium having instructions stored thereon for causing processing circuitry to perform a process comprising: detecting at least one clustering trigger event related to at least one multimedia content element to be clustered; generating at least one signature for the at least one multimedia content element, each signature representing at least a portion of the at least one multimedia content element; determining at least one multimedia content element cluster based on the generated at least one signature, wherein each multimedia content element cluster comprises a plurality of clustered multimedia content elements sharing at least one common concept with the at least one multimedia content element; and adding the at least one multimedia content element to each determined cluster.
Some embodiments disclosed herein also include a system for clustering multimedia content. The system includes processing circuitry; and a memory containing instructions that, when executed by the processing circuit, configure the system to: detecting at least one clustering trigger event related to at least one multimedia content element to be clustered; generating at least one signature for the at least one multimedia content element, each signature representing at least a portion of the at least one multimedia content element; determining at least one multimedia content element cluster based on the generated at least one signature, wherein each multimedia content element cluster comprises a plurality of clustered multimedia content elements sharing at least one common concept with the at least one multimedia content element; and adding the at least one multimedia content element to each determined cluster.
Drawings
The subject matter disclosed herein is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The above and other objects, features, and advantages of the disclosed embodiments will become apparent from the following detailed description taken in conjunction with the accompanying drawings.
FIG. 1 is a network diagram used to describe various disclosed embodiments.
Fig. 2 is a flow chart illustrating a method for clustering multimedia content elements according to one embodiment.
Fig. 3 is a block diagram depicting the basic information flow in a signature generator system.
FIG. 4 is a diagram illustrating the flow of patch generation, response vector generation, and signature generation in a large-scale speech-to-text system.
FIG. 5 is a block diagram illustrating a clustering system according to one embodiment.
Detailed Description
It is important to note that the embodiments disclosed herein are merely examples of the many advantageous uses of the innovative teachings herein. In general, statements made in the specification of the present application do not necessarily limit any of the various claimed embodiments. Moreover, some statements may apply to some inventive features but not to others. In general, unless otherwise indicated, singular elements may be in the plural and vice versa with no loss of generality. In the drawings, like reference numerals designate like parts throughout the several views.
Various disclosed embodiments include methods and systems for clustering multimedia content elements (MMCEs). Clustering allows for organizing and searching multimedia content elements based on common concepts. In an exemplary embodiment, multimedia content elements to be clustered are obtained. For each multimedia content element, at least one signature is generated. Based on the signature generated for each multimedia content element, a search tag may be generated. In one embodiment, multiple search tags may be generated for each multimedia content element. Each multimedia content element is added to a multimedia content element cluster based on the generated at least one signature, the generated tag, or both. Each multimedia content element cluster includes a plurality of multimedia content elements having at least one common concept.
In an exemplary embodiment, a common concept among the multimedia content elements of a multimedia content element cluster may be a collection of signatures representing elements of unstructured data, together with metadata describing the concept. A common concept may represent an item or aspect of the multimedia content elements such as, but not limited to, an object, a person, an animal, a pattern, a color, a background, a character, a sub-textual aspect (e.g., an aspect indicating sub-textual information such as an activity or action being performed, or a relationship between illustrated individuals such as membership on a team or in an organization), a meta-aspect indicating information about the multimedia content element itself (e.g., an aspect indicating that an image is a "selfie" taken by the person appearing in the image), text, sounds, voices, actions, combinations thereof, and the like. Multimedia content elements may share a common concept when each multimedia content element is associated with at least one signature, at least a portion of a signature, at least one tag, or a combination thereof, that is common to all multimedia content elements sharing the common concept.
In one embodiment, the at least one multimedia content element may be further clustered based on metadata associated with the user. The user may be, but is not limited to, a user of a user device having at least one multimedia content element stored therein. In another embodiment, clustering may include searching clusters that include multimedia content elements that share a common concept based on the generated at least one signature. The search may further include comparing the generated at least one signature to signatures of a plurality of clusters of multimedia content elements to determine a matching signature, wherein the at least one multimedia content element may be added to the cluster associated with the matching signature.
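As a purely illustrative sketch of this flow (and of the threshold-based matching and cluster creation described below with reference to Fig. 1), the following Python fragment shows one way such clustering could be organized in memory. All names, data structures, the overlap measure, and the threshold value are assumptions made for illustration and are not part of the disclosure.
```python
# Purely illustrative, in-memory sketch of the clustering flow described above.
# The data layout, the overlap measure, and the threshold are assumptions,
# not part of the disclosure.
from typing import Dict, List, Set

clusters: List[Dict] = []   # each cluster: {"signatures": set, "tags": set, "elements": list}


def cluster_element(element_id: str, signatures: Set[str], tags: Set[str],
                    threshold: float = 0.5) -> List[Dict]:
    """Add the element to every cluster sharing a common concept with it;
    create a new cluster if no existing cluster matches."""
    matched = []
    for cluster in clusters:
        overlap = len(signatures & cluster["signatures"]) / max(len(signatures), 1)
        if overlap > threshold or (tags & cluster["tags"]):
            cluster["elements"].append(element_id)
            matched.append(cluster)
    if not matched:                      # no cluster shares a common concept: create one
        new_cluster = {"signatures": set(signatures), "tags": set(tags),
                       "elements": [element_id]}
        clusters.append(new_cluster)
        matched.append(new_cluster)
    return matched


# Example: two beach selfies end up in the same cluster.
cluster_element("img_001", {"sig_face", "sig_beach"}, {"selfie", "beach"})
cluster_element("img_002", {"sig_face", "sig_beach", "sig_sunset"}, {"selfie"})
print(len(clusters))   # 1
```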
Fig. 1 illustrates an exemplary network diagram 100 for describing various embodiments disclosed herein. The exemplary network diagram includes a user device 110, a clustering system 130, a database 150, and a Deep Content Classification (DCC) system 160 communicatively connected via a network 120.
The network 120 is used to communicate between the different components of the network diagram 100. Network 120 may be the internet, the World Wide Web (WWW), a Local Area Network (LAN), a Wide Area Network (WAN), a Metropolitan Area Network (MAN), and other networks that enable communication between the components of network diagram 100.
User device 110 may be, but is not limited to, a Personal Computer (PC), a Personal Digital Assistant (PDA), a mobile phone, a smart phone, a tablet computer, a wearable computing device, a smart television, and other devices configured to store, view, and transmit multimedia content elements.
The user device 110 may have an application program (app) 115 installed thereon. The application 115 may be downloaded from an application repository such as, but not limited to, Google Play® or any other repository storing applications. Alternatively, the application 115 may be pre-installed in the user device 110. The application 115 may be, but is not limited to, a mobile application, a virtual application, a web application, a native application, and the like. In an exemplary implementation, the application 115 may be a web browser.
In one embodiment, the clustering system 130 is configured to cluster multimedia content elements. The clustering system 130 generally includes, but is not limited to, processing circuitry connected to a memory containing instructions that, when executed by the processing circuitry, configure the clustering system 130 to perform at least clustering of multimedia content elements as described herein. In one embodiment, the processing circuitry may be implemented as an array of at least partially statistically independent computing cores, with the characteristics of each core set independently of the characteristics of each other core. An exemplary block diagram of the clustering system 130 is further described herein below with reference to fig. 5.
In one embodiment, the clustering system 130 is configured to initiate clustering of multimedia content elements upon detection of at least one cluster triggering event. The at least one clustering trigger event may include, but is not limited to, receiving a request to cluster a multimedia content element or a plurality of multimedia content elements.
To this end, in one embodiment, the clustering system 130 is configured to receive a request from the user device 110 to cluster a multimedia content element or a plurality of multimedia content elements. Clustering each multimedia content element may include generating a cluster based on two or more multimedia content elements, or adding a multimedia content element to an existing cluster. The request may include, but is not limited to, the multimedia content element or elements, an identifier of one or more multimedia content elements, an indicator of a location of one or more multimedia content elements (e.g., an indicator of a location where one or more multimedia content elements are stored in the database 150), a combination thereof, and so on. As a non-limiting example, the request may include an image, an identifier for finding the image, a location of the image in a storage device (e.g., the database 150), or a combination thereof.
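The following sketch illustrates, under assumed field names ("items", "identifier", "location") and toy lookup tables standing in for the database 150 or another data source, how such a request might be resolved into concrete multimedia content elements; it is not the claimed implementation.
```python
# Hypothetical sketch: resolving a clustering request into concrete elements.
# The request may carry the element itself, an identifier, or a location indicator.
from typing import Any, Dict, List

toy_database: Dict[str, bytes] = {"id_7": b"<image bytes>"}            # lookup by identifier
toy_storage: Dict[str, bytes] = {"/photos/cat.jpg": b"<image bytes>"}  # lookup by location


def resolve_request(request: Dict[str, Any]) -> List[bytes]:
    """Return the multimedia content elements referenced by a clustering request."""
    elements: List[bytes] = []
    for item in request.get("items", []):
        if isinstance(item, bytes):                 # the element itself was included
            elements.append(item)
        elif "identifier" in item:                  # an identifier of the element
            elements.append(toy_database[item["identifier"]])
        elif "location" in item:                    # an indicator of the element's location
            elements.append(toy_storage[item["location"]])
    return elements


print(len(resolve_request({"items": [b"raw", {"identifier": "id_7"},
                                     {"location": "/photos/cat.jpg"}]})))   # 3
```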
Each multimedia content element may include, but is not limited to, an image, a graphic, a video stream, a video clip, an audio stream, an audio clip, a video frame, a photograph, a signal image (e.g., a spectrogram, a phase map, a scale map, etc.), combinations thereof, portions thereof, and so forth. The multimedia content element may be captured, for example, via user device 110.
In an alternative embodiment, the clustering system 130 is further communicatively coupled to a Signature Generator System (SGS) 140. In another embodiment, the clustering system 130 may be configured to send the multimedia content elements to be clustered to the signature generator system 140. The signature generator system 140 is configured to generate signatures based on multimedia content elements and to send the generated signatures to the clustering system 130. In another embodiment, the clustering system 130 may be configured to generate signatures. The generation of the multimedia content element based signature is further described herein below with reference to fig. 3 and 4. In another embodiment, signatures generated for more than one multimedia content element may be clustered.
In an alternative embodiment, the clustering system 130 is further communicatively coupled to a Deep Content Classification (DCC) system 160. The DCC system 160 may be configured to continuously create a knowledge database for multimedia data. To this end, the DCC system 160 may be configured to initially receive a large number of multimedia content elements to create a knowledge database that is condensed for efficient storage, retrieval, and examination of matching concept structures. As new multimedia content elements are collected by the DCC system 160, they are efficiently added to the knowledge base and concept structures such that the resource requirements grow generally sub-linearly rather than linearly or exponentially. The DCC system 160 is configured to extract patterns from each multimedia content element and to select the important/salient patterns in order to create a signature thereof. After a process of mutual matching among the patterns, performed before clustering, the number of signatures in a cluster is reduced to a minimum that still maintains matching and enables generalization to new multimedia content elements. Metadata is collected for each multimedia content element and, together with the reduced cluster, forms a concept structure.
In another embodiment, the clustering system 130 may be configured to obtain at least one concept structure from the DCC system 160 that matches each multimedia content element to be clustered. In yet another embodiment, the clustering system 130 may be configured to query the DCC system 160 for at least one matching concept structure. A query may be made regarding the signatures of the multimedia content elements to be clustered. In one embodiment, the multimedia content elements associated with the obtained matching concept structure may be utilized to determine the cluster to which the multimedia content elements to be clustered are added.
In an alternative embodiment, the clustering system 130 is configured to generate at least one tag for each multimedia content element based on the signatures of the multimedia content elements to be clustered. Each tag is a textual index term assigned to the content. The generated tags are searchable (e.g., by the user device 110 or other user devices) and may be included in the metadata of the multimedia content element. In one embodiment, a tag may be generated based on the metadata of the obtained at least one concept structure. As a non-limiting example, if the obtained metadata of a concept structure includes a particular text term, the generated tag may include that text term.
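A minimal sketch of this tag-generation step is shown below; the concept-structure fields ("metadata", "text_terms") and the helper name are assumptions made only for illustration.
```python
# Hypothetical sketch: deriving searchable tags from matched concept structures.
# The field names ("metadata", "text_terms") are assumptions for illustration.
from typing import Iterable, Set


def generate_tags_from_concepts(concept_structures: Iterable[dict]) -> Set[str]:
    """Each matched concept structure carries metadata (e.g., text terms);
    those terms become textual index tags for the element."""
    tags: Set[str] = set()
    for concept in concept_structures:
        for term in concept.get("metadata", {}).get("text_terms", []):
            tags.add(term.lower())
    return tags


# Example: a concept whose metadata contains the term "Dog" yields the tag "dog".
print(generate_tags_from_concepts([{"metadata": {"text_terms": ["Dog"]}}]))   # {'dog'}
```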
In one embodiment, the clustering system 130 is configured to determine at least one multimedia content element cluster for each multimedia content element to be clustered based on the generated signature, the generated tag, or both. Each determined multimedia content element cluster comprises a plurality of multimedia content elements sharing at least one common concept with each other and with the multimedia content element or elements to be clustered. The common concepts of multimedia content elements may be a set of signatures representing elements of unstructured data and metadata describing the concepts. The common concepts may represent items or aspects of multimedia content elements such as, but not limited to, objects, people, animals, patterns, colors, backgrounds, characters, sub-textual aspects, meta-aspects, words, sounds, voices, actions, combinations thereof, and the like. In another embodiment, the multimedia content elements may share a common concept when each multimedia content element is associated with at least one signature, at least a portion of a signature, at least one tag, or a combination thereof, that is common to multimedia content elements that share the common concept.
It should be noted that multiple clusters of multimedia content elements may be determined for each multimedia content element. As a non-limiting example, for an image showing a "selfie" of a person taken on a beach (i.e., an image of a person taken by that person), multimedia content element clusters including multimedia content elements showing the person, showing selfies of that person or other people, and showing beach scenery may be determined, and the selfie image may be added to each determined multimedia content element cluster.
In another embodiment, determining the multimedia content element clusters may include comparing the generated signatures or generated tags to signatures or tags of the plurality of multimedia content element clusters, respectively. Each determined multimedia content element cluster may be, for example, a cluster having signatures or tags that match the generated signatures or tags above a predetermined threshold. As a non-limiting example, a signature is generated based on a video showing a stand-up comedy performance by the comedian Jerry Seinfeld, and tags including "Jerry Seinfeld" and "stand-up comedy" are generated based on the generated signature. In yet another embodiment, the determined clusters of multimedia content elements may include one cluster per tag.
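The "one cluster per tag" variant can be pictured with the following sketch; the tag values and the dictionary-based cluster index are illustrative assumptions, not the claimed implementation.
```python
# Hypothetical sketch of the "one cluster per tag" variant: each generated tag
# indexes its own cluster of multimedia content elements.
from collections import defaultdict
from typing import Dict, Iterable, List

tag_clusters: Dict[str, List[str]] = defaultdict(list)   # tag -> element identifiers


def add_to_tag_clusters(element_id: str, tags: Iterable[str]) -> None:
    """Add the element to one cluster per generated tag."""
    for tag in tags:
        tag_clusters[tag].append(element_id)


# Example using the stand-up comedy video discussed above (tag values are illustrative).
add_to_tag_clusters("video_042", ["Jerry Seinfeld", "stand-up comedy"])
print(sorted(tag_clusters))   # ['Jerry Seinfeld', 'stand-up comedy']
```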
In yet another embodiment, one or more of the multimedia content element clusters may be included in or associated with the concept structure such that the comparing may include comparing the generated signature or generated tag to a reduced set of signatures or tags, respectively, of the concept structure. In another embodiment, the multimedia content elements to be clustered may be added to the conceptual structure having the matching multimedia content element clusters.
In another embodiment, if an existing multimedia content element cluster having a common concept with the multimedia content element cannot be found (e.g., if no signature or tag matches the generated signature or tag by more than a predetermined threshold), the clustering system 130 may be configured to generate a multimedia content element cluster that includes the multimedia content elements to be clustered. Generating the multimedia content element cluster may include, but is not limited to, searching among one or more data sources (e.g., the user device 110, the database 150, or other data sources, not shown, that may be accessible, for example, over the Internet) to identify multimedia content elements that share a common concept with the multimedia content elements to be clustered. The search may be based on the generated signature, the generated tag, or both. The identified multimedia content elements are clustered with the multimedia content elements to be clustered, and the resulting cluster may be stored, for example, in the database 150. In another embodiment, the generated cluster may also include the generated tags.
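A minimal sketch of this fallback, assuming a toy in-memory data source and a simple signature-overlap measure (neither of which is prescribed by the disclosure), might look as follows.
```python
# Hypothetical sketch: generating a new cluster when no existing cluster matches,
# by searching accessible data sources for elements sharing the common concept.
from typing import Dict, List, Set

# Toy "data source": element id -> its signatures (stands in for user device 110,
# database 150, or another accessible source).
data_source: Dict[str, Set[str]] = {
    "img_101": {"sig_dog", "sig_grass"},
    "img_102": {"sig_car"},
}


def generate_new_cluster(element_id: str, signatures: Set[str],
                         threshold: float = 0.4) -> Dict:
    """Search the data source for elements sharing a common concept, then cluster
    them together with the element to be clustered."""
    related = [other_id for other_id, other_sigs in data_source.items()
               if len(signatures & other_sigs) / max(len(signatures), 1) > threshold]
    return {"signatures": set(signatures), "elements": [element_id, *related]}


print(generate_new_cluster("img_200", {"sig_dog"}))
# {'signatures': {'sig_dog'}, 'elements': ['img_200', 'img_101']}
```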
It should be noted that using signatures for tagging multimedia content elements, clustering them, or both ensures more accurate clustering of multimedia content than, for example, when manually provided metadata (e.g., user-provided tags) is used. For example, in order to cluster an image of a sports car into an appropriate cluster, it may be desirable to locate a car of a particular model. However, in most cases, the model of the car will not be part of the metadata associated with the multimedia content (image). Further, the car may be shown in the image at angles different from the angles of a particular photograph of that car available as a search item. According to the disclosed embodiments, the signature generated for the image is able to accurately identify the model of the car, because signatures generated for multimedia content elements allow for identification and classification of multimedia content in applications such as content tracking, video filtering, multimedia taxonomy generation, video fingerprinting, speech-to-text conversion, audio classification, element identification, video/image search, and any other application that requires content-based signature generation and matching for large content volumes such as the web and other large databases.
The database 150 stores multimedia content elements, clusters of multimedia content elements, or both. In the exemplary network diagram 100 shown in FIG. 1, the clustering system 130 communicates with a database 150 via a network 120. In other non-limiting configurations, the clustering system 130 may be directly connected to the database 150. The database 150 may be accessible by, for example, the user device 110, other user devices (not shown), or both, allowing the user devices to retrieve clusters from the database 150.
It should also be noted that the signature generator system 140 and DCC system 160 are shown in fig. 1 as being directly connected to the clustering system 130 for simplicity only and are not limiting on the disclosed embodiments. The signature generator system 140, the DCC system 160, or both may be included in the clustering system 130 or communicatively coupled to the clustering system 130 via, for example, the network 120, without departing from the scope of this disclosure.
It should also be noted that clustering is described as being performed by the clustering system 130 for purposes of simplicity only and is not limiting of the disclosed embodiments. Clustering may be performed equally locally by, for example, user device 110, without departing from the scope of this disclosure. In this case, user device 110 may include clustering system 130, signature generator system 140, DCC system 160, or any combination thereof, or may be otherwise configured to perform any or all of the processes performed by such systems. Further, local clustering by user device 110 may be based on multimedia content clusters stored locally on user device 110.
As a non-limiting example of local clustering by the user device 110, the clustering may be based on clusters of images in a photo library stored on the user device 110, such that new images may be clustered in real time and thus subsequently searched by the user of the user device 110. Thus, when, for example, a user of the user device 110 takes an image of his dog named "Lucky", the user device 110 may cluster the image with other images of the dog Lucky stored in the user device 110, such that when the user searches for images in the user device 110 using the query "Lucky", the newly taken image will be returned along with the other clustered images of the dog Lucky.
Fig. 2 is an exemplary flowchart 200 illustrating a method for clustering multimedia content elements according to one embodiment. In an embodiment, the method may be performed by the clustering system 130. In another embodiment, the method may be performed in response to a request to cluster one or more multimedia content elements.
At S205, a cluster trigger event is detected. The clustering trigger event may be or may include, but is not limited to, receiving a request to cluster at least one multimedia content element.
At S210, at least one multimedia content element to be clustered is obtained. In one embodiment, the at least one multimedia content element may be obtained based on a request to cluster the at least one multimedia content element. The request may include the at least one multimedia content element to be clustered, an identifier of one or more of the at least one multimedia content element, an indicator of a location of one or more of the at least one multimedia content element, or a combination thereof.
At S220, at least one signature is generated for each multimedia content element. Each generated signature may be robust to noise and distortion. In one embodiment, the signature is generated by a signature generator system, as described further herein below with reference to fig. 3 and 4. In another embodiment, S220 may include sending the multimedia content elements to a signature generator system (e.g., signature generator system 140, fig. 1) and receiving at least one signature generated for each multimedia content element from the signature generator system.
At optional S230, at least one tag is generated for the at least one multimedia content element based on the generated at least one signature. As described further above, each tag is a textual index term assigned to a multimedia content element. As non-limiting examples of tags, the tag "me" may be assigned to an image of the user's face, the tag "my dog" may be assigned to an image of a dog, and the tags "me" and "my dog" may be assigned to an image featuring both the user and the dog.
In one embodiment, S230 may include comparing the generated at least one signature with signatures of a plurality of multimedia content elements that have been assigned predetermined tags. In another embodiment, a tag of a multimedia content element whose signature matches one or more of the generated at least one signature may be used as a tag for the multimedia content element to be clustered.
In another embodiment, at least one tag may be generated based on metadata of the concept structure that matches at least one multimedia content element to be clustered. To this end, in another embodiment, S230 may further include obtaining at least one conceptual structure matching each multimedia content element to be clustered from a DCC system (e.g., DCC system 160, fig. 1). In yet another embodiment, S230 may further include querying the DCC system for a signature for each multimedia content element to be clustered.
At S240, at least one multimedia content element cluster is determined. Each determined multimedia content element cluster includes a plurality of multimedia content elements that share a common concept. Each of the at least one multimedia content element also shares a common concept of a multimedia content element cluster. The common concepts of multimedia content elements may be a set of signatures representing elements of unstructured data and metadata describing the concepts. Common concepts may represent items or aspects in multimedia content elements such as, but not limited to, objects, people, animals, patterns, colors, backgrounds, characters, sub-textual aspects, meta-aspects, words, sounds, voices, actions, combinations thereof, and the like. When each multimedia content element is associated with at least one signature, at least a portion of a signature, at least one tag, or a combination thereof that is common to all multimedia content elements that share a common concept, the multimedia content elements may share the common concept.
As non-limiting examples, the common concept may represent: a Labrador retriever shown in an image or video, the voice of the actor Daniel Radcliffe that can be heard in audio or video, an action shown in a video such as the swinging of a baseball bat, a sub-textual aspect of playing chess, an indication that an image is a "selfie," and so on.
The common concepts may be further based on a level of granularity. For example, the common concepts may relate to cats in general, such that any cat displayed or heard in the multimedia content element is considered a common concept, or may relate to a particular cat, such that only a visual or audio representation of that cat is considered a common concept. Such granularity may depend on, for example, the threshold used to match the signature, the tag, or both, such that a higher threshold produces a finer granularity result.
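The effect of the matching threshold on granularity can be illustrated numerically; the overlap values below are invented solely for the example and do not come from the disclosure.
```python
# Illustrative only: a higher matching threshold yields finer-grained clusters.
# The overlap values below are invented for the example.
concept_overlaps = {
    "cats in general": 0.60,        # overlap with the generic "cat" concept
    "this particular cat": 0.92,    # overlap with one specific cat
}

for threshold in (0.5, 0.9):
    matches = [name for name, overlap in concept_overlaps.items() if overlap > threshold]
    print(threshold, matches)
# 0.5 ['cats in general', 'this particular cat']
# 0.9 ['this particular cat']
```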
In another embodiment, the determined at least one multimedia content element cluster may include only multimedia content elements of the same type as the obtained multimedia content element. For example, if the obtained multimedia content element is an image, only other images having the common concept may be determined. In yet another embodiment, different types of multimedia content elements may be determined. The determination of which types of multimedia content elements to include may be based on, for example, one or more rules.
As a non-limiting example of a common concept, for an image showing a person wearing a parachute with the sky in the background, the tag of the image may be "parachuting". The common concept may be the sub-textual aspect "parachuting," which indicates the activity being performed by the person shown in the image. Other multimedia content elements that show or otherwise demonstrate a person parachuting may also be associated with the tag "parachuting," and thus the sub-textual aspect "parachuting" is a common concept of these multimedia content elements.
As another non-limiting example of a common concept, for an audio clip in which the user narrates information that the user wishes to refer to later, a portion of the signature generated for the audio clip may relate to the meta-aspect "self-annotation". In particular, that portion of the signature may be generated based on the words "self-annotation" spoken at the beginning of the audio clip. Other multimedia content elements may also have signature portions related to the concept "self-annotation" (e.g., other content exhibiting the words "self-annotation" or a similar phrase), and thus the meta-aspect "self-annotation" is a common concept of those multimedia content elements. In another example, only multimedia content elements related to the particular user heard in the obtained multimedia content element (i.e., multimedia content elements featuring the voice of the user who recorded the obtained multimedia content element) may be determined to share a common concept with the obtained multimedia content element, such that the cluster includes only self-annotations made by the same user.
In one embodiment, if an existing multimedia content element cluster having a common concept with the multimedia content element cannot be found (e.g., if no multimedia content element cluster is associated with a signature or tag that matches the generated at least one signature or the generated at least one tag by more than a predetermined threshold), S240 may include generating a new multimedia content element cluster. In another embodiment, generating a new cluster of multimedia content elements may include searching in one or more data sources to identify multimedia content elements that share a common concept with the obtained multimedia content elements. The identified multimedia content elements may be clustered with the obtained multimedia content elements.
At S250, the at least one multimedia content element is added to each determined or newly generated multimedia content element cluster. In one embodiment, S250 may further include storing the at least one multimedia content element cluster, with the added at least one multimedia content element, in a storage (e.g., the database 150 of Fig. 1, a data source such as a web server, etc.). As a non-limiting example, a cluster may be stored in a server of a social media platform, enabling other users to find the cluster during a search. Each cluster may be stored separately such that different groupings of multimedia content elements are stored in different locations. For example, different clusters of multimedia content elements may be stored in different folders.
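One way to picture storing each cluster in a separate location is the following sketch, which assumes a folder-per-cluster layout that the disclosure does not mandate.
```python
# Hypothetical sketch: persisting each cluster to its own folder
# (a folder-per-cluster layout assumed for illustration).
import json
import tempfile
from pathlib import Path


def store_cluster(cluster: dict, root: Path) -> Path:
    """Write one cluster into a dedicated folder; different clusters land in different folders."""
    name = "_".join(sorted(cluster["tags"])) or "untagged"
    folder = root / name
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "cluster.json").write_text(json.dumps({
        "tags": sorted(cluster["tags"]),
        "elements": cluster["elements"],
    }))
    return folder


root = Path(tempfile.mkdtemp())
print(store_cluster({"tags": {"my dog"}, "elements": ["img_101", "img_200"]}, root))
```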
At S260, it is determined whether further multimedia content elements are to be clustered, and if so, execution continues to S205; otherwise, execution terminates.
Clustering of multimedia content elements allows for organizing multimedia content elements based on topics represented by various concepts. Such organization may be used, for example, to organize photographs taken by a user of a smartphone based on a common theme. As a non-limiting example, images showing dogs, football games and food may be organized into different sets and stored, for example, in different folders on a smartphone. Such organization may be particularly useful for social media or other content sharing applications, as the shared multimedia content may be organized and shared with respect to the content. Further, such organization may be useful for subsequent retrieval, particularly when the organization is tag-based. As described above, using signatures to classify multimedia content elements generally results in more accurate identification of multimedia content elements that share similar content.
It should be noted that the embodiment described above with respect to fig. 2 is discussed as including clustered multimedia content elements for purposes of simplicity only, and is not limiting of the present disclosure. Multiple multimedia content elements may be clustered in parallel without departing from the scope of the present disclosure. Further, the clustering methods discussed above may be performed by the clustering system 130 or locally by a user device (e.g., user device 110, fig. 1).
Fig. 3 and 4 illustrate the generation of a signature of a multimedia content element by the signature generator system 140 according to one embodiment. An exemplary high-level description of a process for large-scale matching is depicted in fig. 3. In this example, the matching is for video content.
The video content segments 2 from the master Database (DB) 6 and the target DB 1 are processed in parallel by a number of independent computational cores 3, the computational cores 3 constituting an architecture for generating signatures (hereinafter referred to as "architecture"). Additional details regarding computing core generation are provided below. The independent core 3 generates a database of robust signatures and signatures 4 for the target content segments 5 and a database of robust signatures and signatures 7 for the main content segments 8. An exemplary and non-limiting process for signature generation by an audio component is shown in detail in FIG. 4. Finally, the target robust signature and/or signatures are effectively matched to the master robust signature and/or signature database by the matching algorithm 9 to find all matches between the two databases.
To illustrate an example of a signature generation process, for simplicity only and not to limit the generality of the disclosed embodiments, assume that the signature is based on a single frame, resulting in some simplification of the computation core generation. The matching system is extensible for signature generation, capturing dynamics between frames.
The generation process of the signatures is now described with reference to Fig. 4. The first step in the process of generating signatures from a given speech segment is to decompose the speech segment into K patches 14 of random length P and random position within the speech segment 12. The decomposition is performed by the patch generator component 21. The values of the number of patches K, the random length P, and the random position parameters are determined based on optimization, taking into account a tradeoff between accuracy and the number of fast matches required in the flow of the clustering system 130 and the SGS 140. Thereafter, all K patches are injected in parallel into all the computational cores 3 to generate K response vectors 22, which are fed into the signature generator system 23 to produce the database of robust signatures and signatures 4.
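The patch-decomposition step can be sketched as follows; the parameter values (the number of patches K and the maximum patch length) are arbitrary examples, since the disclosure states that they are determined by optimization.
```python
# Hypothetical sketch of the patch-decomposition step: cut K patches of random
# length and random position out of a segment. K and the maximum patch length
# are arbitrary example values (the disclosure sets them by optimization).
from typing import List, Optional

import numpy as np


def generate_patches(segment: np.ndarray, k: int = 16, max_len: int = 256,
                     seed: Optional[int] = None) -> List[np.ndarray]:
    """Return K patches of random length P and random position within the segment."""
    rng = np.random.default_rng(seed)
    patches = []
    for _ in range(k):
        length = int(rng.integers(1, max_len + 1))
        start = int(rng.integers(0, max(len(segment) - length, 1)))
        patches.append(segment[start:start + length])
    return patches


patches = generate_patches(np.random.default_rng(0).standard_normal(10_000))
print(len(patches))   # 16
```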
In order to generate robust signatures, i.e., signatures that are robust to additive noise L (where L is an integer equal to or greater than 1), by the computational cores 3, a frame 'i' is injected into all the cores 3. Subsequently, the cores 3 generate two binary response vectors: a signature vector S and a robust signature vector RS.
To generate a signature that is robust to additive noise such as white Gaussian noise, scratches, etc., but not robust to distortions such as cropping, shifting, and rotation, the core Ci = {ni} (1 ≦ i ≦ L) may include a single leaky integrate-to-threshold unit (LTU) node or more such nodes. The equations for node ni are:
V_i = Σ_j w_ij · k_j
n_i = θ(V_i − Th_x)
where θ is the Heaviside step function; w_ij is a coupling node unit (CNU) between node i and image component j; k_j is image component 'j' (e.g., the grayscale value of a certain pixel j); Th_x is a constant threshold value, where 'x' is 'S' for a signature and 'RS' for a robust signature; and V_i is the coupling node value.
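A numerical sketch of a single LTU node under the equations above is given below; the weights, image components, and threshold values are arbitrary example values, not values from the disclosure.
```python
# Numerical sketch of one LTU node under the equations above:
#   V_i = sum_j w_ij * k_j   and   n_i = theta(V_i - Th_x),
# where theta is the Heaviside step function. Weights, image components, and
# thresholds are arbitrary example values.
import numpy as np


def ltu_responses(w: np.ndarray, k: np.ndarray, th_s: float, th_rs: float):
    """Return the signature bit and robust-signature bit of a single node."""
    v = float(w @ k)                          # coupling node value V_i
    signature_bit = 1 if v > th_s else 0      # theta(V_i - Th_S)
    robust_bit = 1 if v > th_rs else 0        # theta(V_i - Th_RS)
    return signature_bit, robust_bit


k = np.array([0.2, 0.8, 0.5])   # image components k_j (e.g., grayscale pixel values)
w = np.array([0.1, 0.9, 0.3])   # coupling weights w_ij
print(ltu_responses(w, k, th_s=0.5, th_rs=0.9))   # (1, 0) for these example values
```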
The threshold values Th_x are set differently for signature generation and for robust signature generation. For example, for a certain distribution of V_i values (over the set of nodes), after optimization, the threshold for signatures (Th_S) and the threshold for robust signatures (Th_RS) are set apart according to at least one or more of the following criteria:
1: For V_i > Th_RS: 1 − p(V > Th_S) − 1 − (1 − ε)^l ≪ 1
That is, given that l nodes (cores) constitute a robust signature of a certain image I, the probability that not all of these l nodes will also belong to the signature of the same, but noisy, image Ĩ is sufficiently low (according to the accuracy specified for the system).
2: p(V_i > Th_RS) ≈ l/L
That is, approximately l out of the total L nodes can be found to generate a robust signature according to the above definition.
3: Both a signature and a robust signature are generated for the particular frame i.
It should be appreciated that the generation of the signature is unidirectional and typically results in lossless compression, where the properties of the compressed data are maintained but the uncompressed data cannot be reconstructed. Thus, a signature may be used for comparison purposes with another signature without comparison with the original data. A detailed description of signature generation may be found in U.S. patent nos. 8,326,775 and 8,312,031, assigned to common assignee, which are incorporated herein by reference for all of the useful information contained therein.
Computational core generation is the process of definition, selection, and adjustment of parameters of a core for a particular implementation in a particular system and application. The process is based on several design considerations, such as:
(a) The kernels should be designed to obtain the maximum independence, i.e. the projections from the signal space should yield the maximum pair-wise distance between the projections of any two kernels into the high-dimensional space.
(b) The kernel should be optimally designed for the signal type, i.e., the kernel should be most sensitive to the spatio-temporal structure of the injected signal, for example, and in particular, to local correlations in time and space. Thus, in some cases a kernel represents a dynamic system, such as in state space, phase space, edge of chaos, etc., which is uniquely used herein to exploit its maximal computational power.
(c) The kernel should be optimally designed with respect to invariance to a set of signal distortions of interest in the relevant application.
A detailed description of computational core generation and processes for configuring such cores is discussed in more detail in the above-referenced U.S. patent No. 8,655,801.
FIG. 5 is an exemplary block diagram illustrating a clustering system 130 implemented according to one embodiment. The clustering system 130 includes a processing circuit 510 coupled to a memory 520, a storage 530, and a network interface 540. In one embodiment, the components of the clustering system 130 may be communicatively connected via a bus 550.
The processing circuit 510 may be implemented as one or more hardware logic components and circuits. By way of example, and not limitation, illustrative types of hardware logic components that may be used include Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems-on-a-Chip (SOCs), general-purpose microprocessors, microcontrollers, Digital Signal Processors (DSPs), and the like, or any other hardware logic components that can perform calculations or other manipulations of information. In one embodiment, the processing circuit 510 may be implemented as an array of at least partially statistically independent computing cores. As described further above, the characteristics of each computing core are set independently of those of each other core.
Memory 520 may be volatile (e.g., RAM, etc.), non-volatile (e.g., ROM, flash memory, etc.), or some combination thereof. In one configuration, computer readable instructions to implement one or more embodiments disclosed herein may be stored in storage 530.
In another embodiment, the memory 520 is configured to store software. Software should be construed broadly to mean any type of instructions, whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise. The instructions may include code (e.g., in source code format, binary code format, executable code format, or any other suitable code format). The instructions, when executed by the processing circuit 510, cause the processing circuit 510 to perform the various processes described herein. In particular, the instructions, when executed, cause the processing circuit 510 to perform clustering of multimedia content elements as described herein.
Storage 530 may be magnetic storage, optical storage, etc., and may be implemented, for example, as flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVDs), or any other medium that may be used to store the desired information.
The network interface 540 allows the clustering system 130 to communicate with the signature generator system 140 for purposes such as sending multimedia content elements, receiving signatures, and the like. Further, the network interface 540 allows the clustering system 130 to communicate with the user device 110 in order to obtain multimedia content elements to be clustered.
It should be understood that the embodiments described herein are not limited to the particular architecture shown in fig. 5, and that other architectures may be equally employed without departing from the scope of the disclosed embodiments. In particular, the clustering system 130 may also include a signature generator system configured to generate signatures as described herein without departing from the scope of the disclosed embodiments.
It should be understood that any reference to an element herein using a name such as "first," "second," etc., does not generally limit the number or order of such elements. Rather, these names are used herein generally as a convenient way to distinguish two or more elements or instances of an element. Thus, reference to a first element and a second element does not imply that only two elements may be employed therein or that the first element must somehow precede the second element. In addition, unless otherwise specified, a set of elements includes one or more elements.
As used herein, the phrase "at least one" followed by a list of items means that any listed item can be used alone or any combination of two or more of the listed items can be utilized. For example, if a step in a method is described as including "at least one of a, B, and C," then the step may include only a; only B is included; only C is included; a combination of A and B; a combination of B and C; a combination of A and C; or a combination of A, B and C.
The various embodiments disclosed herein may be implemented as hardware, firmware, software, or any combination thereof. Further, the software is preferably implemented as an application program tangibly embodied on a program storage unit or computer-readable medium consisting of portions of certain devices and/or combinations of devices. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units ("CPU"), memory, and input/output interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU, whether or not such computer or processor is explicitly shown. In addition, various other peripheral units may be connected to the computer platform such as an additional data storage unit and a printing unit. Furthermore, a non-transitory computer-readable medium is any computer-readable medium except a transitory propagating signal.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the principles of the disclosed embodiments and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosed embodiments, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

Claims (19)

1. A method for clustering multimedia content, comprising:
detecting at least one clustering trigger event related to at least one multimedia content element to be clustered;
generating at least one signature for the at least one multimedia content element, each signature representing at least a portion of the at least one multimedia content element;
determining at least one multimedia content element cluster based on the generated at least one signature, wherein each multimedia content element cluster comprises a plurality of clustered multimedia content elements sharing at least one common concept with the at least one multimedia content element; wherein the at least one common concept refers to at least one of: (a) An aspect indicating sub-textual information, and (b) a meta-aspect indicating information about the multimedia content elements of the cluster themselves; and
adding the at least one multimedia content element to each determined cluster.
2. The method of claim 1, further comprising:
generating at least one tag for the multimedia content element based on the generated at least one signature, wherein the at least one multimedia content element cluster is determined further based on the generated at least one tag.
3. The method of claim 2, wherein each multimedia content element cluster comprises a plurality of multimedia content elements associated with the generated at least one tag.
4. The method of claim 2, wherein generating the at least one label further comprises:
querying a deep content classification system with respect to the generated at least one signature to obtain at least one concept structure matching the at least one multimedia content element, each concept structure comprising signature reduced clusters and metadata, wherein the at least one tag is generated based on the metadata of the obtained at least one concept structure.
5. The method of claim 1, wherein each determined cluster is associated with at least a portion of a signature that is common to multimedia content elements of the cluster and the at least one multimedia content element.
6. The method of claim 1, further comprising:
determining whether an existing multimedia content element cluster can be found that shares a common concept with the multimedia content element based on the generated at least one signature; and
generating multimedia content element clusters when it is determined that an existing multimedia content element cluster sharing a common concept with the multimedia content elements cannot be found, wherein the determined at least one multimedia content element cluster is the generated multimedia content element cluster.
7. The method of claim 1, wherein the at least one signature is generated via a signature generator system, wherein the signature generator system comprises a plurality of at least partially statistically independent computing cores, wherein a characteristic of each computing core is set independently of a characteristic of each other computing core.
8. The method of claim 1, wherein the detected at least one cluster trigger event comprises: receiving a request to cluster the at least one multimedia content element, wherein the request comprises at least one of: the at least one multimedia content element, at least one identifier of the at least one multimedia content element, and at least one location of the at least one multimedia content element.
9. The method of claim 1, further comprising:
storing the at least one cluster including the added at least one multimedia content element in a data storage, wherein each cluster is stored in a separate location of the data storage.
10. A non-transitory computer-readable medium having instructions stored thereon for causing processing circuitry to perform a process, the process comprising:
detecting at least one clustering trigger event related to at least one multimedia content element to be clustered;
generating at least one signature for the at least one multimedia content element, each signature representing at least a portion of the at least one multimedia content element;
determining at least one multimedia content element cluster based on the generated at least one signature, wherein each multimedia content element cluster comprises a plurality of clustered multimedia content elements sharing at least one common concept with the at least one multimedia content element; wherein the at least one common concept refers to at least one of: (a) An aspect indicating sub-textual information, and (b) a meta-aspect indicating information about the multimedia content elements of the cluster themselves; and
adding the at least one multimedia content element to each determined cluster.
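Claim 10 strings the claimed steps together: detect a trigger, generate signatures, determine clusters that share a common concept, and add the element to each. The toy walk-through below mirrors that sequence end to end; signature generation is faked with substring hashing and concept matching is plain set overlap, so none of it is the patented implementation.

```python
# End-to-end toy walk-through of the claimed process (illustration only).
def fake_signature(element: str) -> frozenset:
    """Stand-in signature: hashed 3-character shingles of a text description."""
    return frozenset(hash(element[i:i + 3]) % 100 for i in range(len(element) - 2))

def cluster_element(element: str, clusters: dict) -> list:
    signature = fake_signature(element)                    # generate signature
    matched = [name for name, sigs in clusters.items()     # determine clusters
               if any(signature & s for s in sigs)]
    for name in matched:                                   # add to each determined cluster
        clusters[name].append(signature)
    return matched

clusters = {"beach_photos": [fake_signature("beach sunset photo")]}
print(cluster_element("another beach photo", clusters))    # typically ['beach_photos']
```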
11. A system for clustering multimedia content, comprising:
a processing circuit; and
a memory containing instructions that, when executed by the processing circuit, configure the system to:
detect at least one clustering trigger event related to at least one multimedia content element to be clustered;
generate at least one signature for the at least one multimedia content element, each signature representing at least a portion of the at least one multimedia content element;
determine at least one multimedia content element cluster based on the generated at least one signature, wherein each multimedia content element cluster comprises a plurality of clustered multimedia content elements sharing at least one common concept with the at least one multimedia content element; and
add the at least one multimedia content element to each determined cluster,
wherein the system is further configured to:
generate at least one tag for the multimedia content element based on the generated at least one signature, wherein the at least one multimedia content element cluster is determined further based on the generated at least one tag;
query a deep content classification system with respect to the generated at least one signature to obtain at least one concept structure matching the at least one multimedia content element, each concept structure comprising signature-reduced clusters and metadata, wherein the at least one tag is generated based on the metadata of the obtained at least one concept structure.
12. The system of claim 11, wherein each multimedia content element cluster comprises a plurality of multimedia content elements associated with the generated at least one tag.
13. The system of claim 11, wherein each determined cluster is associated with at least a portion of a signature that is common to multimedia content elements of the cluster and the at least one multimedia content element.
14. The system of claim 11, wherein the system is further configured to:
determine, based on the generated at least one signature, whether an existing multimedia content element cluster sharing a common concept with the multimedia content element can be found; and
generate a multimedia content element cluster when it is determined that an existing multimedia content element cluster sharing a common concept with the multimedia content element cannot be found, wherein the determined at least one multimedia content element cluster is the generated multimedia content element cluster.
15. The system of claim 11, wherein the at least one signature is generated via a signature generator system, wherein the signature generator system comprises a plurality of at least partially statistically independent computing cores, wherein a characteristic of each computing core is set independently of a characteristic of each other computing core.
16. The system of claim 11, wherein the detected at least one clustering trigger event comprises: receiving a request to cluster the at least one multimedia content element, wherein the request comprises at least one of: the at least one multimedia content element, at least one identifier of the at least one multimedia content element, and at least one location of the at least one multimedia content element.
17. The system of claim 11, wherein the system is further configured to:
store the at least one cluster including the added at least one multimedia content element in a data storage, wherein each cluster is stored in a separate location of the data storage.
18. The system of claim 11, wherein the at least one common concept is an aspect that indicates sub-textual information.
19. The system of claim 11, wherein the at least one common concept is different from a text label.
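Claims 18 and 19 stress that a common concept can be a sub-textual aspect rather than a textual label, and claim 10 additionally allows a meta-aspect describing the clustered elements themselves. The small data-model sketch below illustrates that distinction; the class and field names, and the example values, are assumptions made for this sketch and are not claim language.

```python
# Illustrative data model for the two kinds of common concept named in the claims.
from dataclasses import dataclass

@dataclass
class Aspect:                  # sub-textual information, not a text label
    signature_fragment: frozenset

@dataclass
class MetaAspect:              # information about the clustered elements themselves
    description: str

vacation_concepts = [
    Aspect(signature_fragment=frozenset({12, 57, 91})),
    MetaAspect(description="all elements captured within the same week"),
]
print(vacation_concepts)
```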
CN201780016956.2A 2016-03-13 2017-01-31 System and method for clustering multimedia content elements Active CN108780462B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201662307515P 2016-03-13 2016-03-13
US62/307,515 2016-03-13
PCT/US2017/015831 WO2017160413A1 (en) 2016-03-13 2017-01-31 System and method for clustering multimedia content elements

Publications (2)

Publication Number Publication Date
CN108780462A (en) 2018-11-09
CN108780462B (en) 2022-11-22

Family

ID=59852056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780016956.2A Active CN108780462B (en) 2016-03-13 2017-01-31 System and method for clustering multimedia content elements

Country Status (2)

Country Link
CN (1) CN108780462B (en)
WO (1) WO2017160413A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116484247B (en) * 2023-06-21 2023-09-05 北京点聚信息技术有限公司 Intelligent signed data processing system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101799814A (en) * 2009-12-31 2010-08-11 Maoming University Method for gathering free classification label into reticular classification structure
CN103473308A (en) * 2013-09-10 2013-12-25 Zhejiang University High-dimensional multimedia data classifying method based on maximum margin tensor study
CN103853724A (en) * 2012-11-29 2014-06-11 Samsung Electronics (China) R&D Center Multimedia data sorting method and device
CN104933135A (en) * 2015-06-12 2015-09-23 Hisense Group Co., Ltd. Method and device for clustering multimedia data

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6931351B2 (en) * 2001-04-20 2005-08-16 International Business Machines Corporation Decision making in classification problems
US7085771B2 (en) * 2002-05-17 2006-08-01 Verity, Inc System and method for automatically discovering a hierarchy of concepts from a corpus of documents
US20050226511A1 (en) * 2002-08-26 2005-10-13 Short Gordon K Apparatus and method for organizing and presenting content
US20040268098A1 (en) * 2003-06-30 2004-12-30 Yoav Almog Exploiting parallelism across VLIW traces
US7801893B2 (en) * 2005-09-30 2010-09-21 Iac Search & Media, Inc. Similarity detection and clustering of images
US8326775B2 (en) * 2005-10-26 2012-12-04 Cortica Ltd. Signature generation for multimedia deep-content-classification by a large-scale matching system and method thereof
US8266185B2 (en) * 2005-10-26 2012-09-11 Cortica Ltd. System and methods thereof for generation of searchable structures respective of multimedia data content
US9384196B2 (en) * 2005-10-26 2016-07-05 Cortica, Ltd. Signature generation for multimedia deep-content-classification by a large-scale matching system and method thereof
US20140093844A1 (en) * 2005-10-26 2014-04-03 Cortica, Ltd. Method for identification of food ingredients in multimedia content
US10380267B2 (en) * 2005-10-26 2019-08-13 Cortica, Ltd. System and method for tagging multimedia content elements
US9256668B2 (en) * 2005-10-26 2016-02-09 Cortica, Ltd. System and method of detecting common patterns within unstructured data elements retrieved from big data sources
US7774288B2 (en) * 2006-05-16 2010-08-10 Sony Corporation Clustering and classification of multimedia data
CN101374298A (en) * 2007-08-24 2009-02-25 Shenzhen Futaihong Precision Industry Co., Ltd. Automatic classification system and method for data
US8285718B1 (en) * 2007-12-21 2012-10-09 CastTV Inc. Clustering multimedia search
CN104268148B (en) * 2014-08-27 2018-02-06 Institute of Computing Technology, Chinese Academy of Sciences Forum page information automatic extraction method and system based on time strings
CN104317867B (en) * 2014-10-17 2018-02-09 Shanghai Jiao Tong University System for entity clustering of web page images returned by a search engine

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101799814A (en) * 2009-12-31 2010-08-11 Maoming University Method for gathering free classification label into reticular classification structure
CN103853724A (en) * 2012-11-29 2014-06-11 Samsung Electronics (China) R&D Center Multimedia data sorting method and device
CN103473308A (en) * 2013-09-10 2013-12-25 Zhejiang University High-dimensional multimedia data classifying method based on maximum margin tensor study
CN104933135A (en) * 2015-06-12 2015-09-23 Hisense Group Co., Ltd. Method and device for clustering multimedia data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on tag clustering combining tagging content and user attributes; Gu Xiaoxue et al.; New Technology of Library and Information Service; 2015-10-25 (No. 10); pp. 30-39 *

Also Published As

Publication number Publication date
CN108780462A (en) 2018-11-09
WO2017160413A1 (en) 2017-09-21

Similar Documents

Publication Publication Date Title
US20200233891A1 (en) System and method for clustering multimedia content elements
US10831814B2 (en) System and method for linking multimedia data elements to web pages
US9031999B2 (en) System and methods for generation of a concept based database
US10380267B2 (en) System and method for tagging multimedia content elements
US20100042646A1 (en) System and Methods Thereof for Generation of Searchable Structures Respective of Multimedia Data Content
CN109582813B (en) Retrieval method, device, equipment and storage medium for cultural relic exhibit
US20170185690A1 (en) System and method for providing content recommendations based on personalized multimedia content element clusters
CN113434716B (en) Cross-modal information retrieval method and device
CN108780462B (en) System and method for clustering multimedia content elements
US20150052155A1 (en) Method and system for ranking multimedia content elements
US11003706B2 (en) System and methods for determining access permissions on personalized clusters of multimedia content elements
WO2017143979A1 (en) Image search method and device
US9767143B2 (en) System and method for caching of concept structures
US10180942B2 (en) System and method for generation of concept structures based on sub-concepts
CN111797765A (en) Image processing method, image processing apparatus, server, and storage medium
US20170300498A1 (en) System and methods thereof for adding multimedia content elements to channels based on context
US20180157667A1 (en) System and method for generating a theme for multimedia content elements
US20170300486A1 (en) System and method for compatability-based clustering of multimedia content elements
US20180121463A1 (en) System and method for enriching a searchable concept database
US20180157666A1 (en) System and method for determining a social relativeness between entities depicted in multimedia content elements
US10691642B2 (en) System and method for enriching a concept database with homogenous concepts
US10585934B2 (en) Method and system for populating a concept database with respect to user identifiers
US10635640B2 (en) System and method for enriching a concept database
US20150139569A1 (en) Method and system for determining the dimensions of an object shown in a multimedia content item
US20170286434A1 (en) System and method for signature-based clustering of multimedia content elements

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant