KR101247401B1

KR101247401B1 - Method and apparatus for hierarchical organization of embryo data for supporting efficient search

Info

Publication number: KR101247401B1
Application number: KR1020110026445A
Authority: KR
Inventors: 김상욱; 원정임; 오현교; 장민희
Original assignee: 한양대학교 산학협력단
Priority date: 2011-03-24
Filing date: 2011-03-24
Publication date: 2013-03-25
Also published as: KR20120108504A

Abstract

Methods and apparatus are provided for organizing data. Similarities between all pairs of objects among one or more objects are measured, and a similarity graph is generated from the measured similarities. A hierarchical structure is formed by clustering the similarity graph while considering the size of each cluster and the number of objects it contains. The objects may be embryo image data representing embryos.

Description

METHOD AND APPARATUS FOR HIERARCHICAL ORGANIZATION OF EMBRYO DATA FOR SUPPORTING EFFICIENT SEARCH

The present invention relates to a method and apparatus for hierarchically organizing data.

A method and apparatus are disclosed for providing efficient retrieval by hierarchically organizing embryo data.

An embryo is the earliest stage in the development of a multicellular organism such as an animal or plant.

The basic system of the future body of an animal or plant is determined during the embryonic stage. Embryos are therefore an important subject in the study of developmental mechanisms. The field that studies embryos is called developmental biology.

In general, developmental biologists maintain large embryo image databases for their research and want to search such a database efficiently for the embryo images relevant to their work. However, efficiently finding a desired image in a large image database is not easy.

There are two general ways to find a desired image in a large image database. The first is query-based search, and the second is browsing-based search.

Query-based search can be used when the user specifically knows the metadata (such as a text tag) or the color characteristics of the image to be found. The user can retrieve similar images from the database by using various features of the target image as a query.

Browsing-based search, on the other hand, may be used when the user has only a vague idea of the image to be found. In this method, the user looks through the images in the image database to find the desired one.

When biologists want to find an embryo image to study, it is not easy to describe the text tag or the color characteristics of that image. In other words, it is very difficult for a user to find the embryo image by using query-based search.

Thus, the user must look directly at all the embryo image data stored in the database using a browsing scheme to retrieve the desired embryo image.

However, since a large number of embryo images are stored in the database, it is practically difficult for a user to view all embryo images in the database.

To solve this problem, it is very important to structure the database. With a structured database, the user can limit the scope of the search to a group of embryo images similar to the desired one and focus on the color features of the images in that group.

The primary method used for database structuring is hierarchical clustering.

Hierarchical clustering is a method of calculating the similarity between objects to classify the objects into clusters having similar characteristics, and expressing the divided clusters in a tree structure form.

When hierarchical clustering is used as a database structuring method, users can intuitively grasp the entire database structure. In addition, when a user traverses the tree to find a desired object by using the browsing function, the user can concentrate on searching only the desired objects.

Most existing hierarchical clustering methods use agglomerative hierarchical clustering.

When agglomerative hierarchical clustering is used, each object forms a separate cluster in the initial stage, and two clusters are merged according to a predetermined criterion in each successive stage. This merging process proceeds until all objects form a single cluster or some termination condition is satisfied.

Because this bottom-up strategy merges objects step by step according to some criterion, it can end up gradually growing only certain clusters.

Thus, the tree of clusters produced by agglomerative hierarchical clustering is very likely to be a skewed tree. If a skewed tree has been created, it may take a considerable amount of time for the user to traverse it in search of the desired image.

One embodiment of the present invention can provide an apparatus and method for organizing data hierarchically.

One embodiment of the present invention may provide an apparatus and method for supporting efficient retrieval by hierarchically organizing embryo data.

According to an aspect of the present invention, a data organization method is provided that includes measuring similarities between all pairs of objects among one or more objects, generating a similarity graph based on the measured similarities, and forming a hierarchical structure by performing clustering that repeatedly applies a graph partitioning algorithm to the similarity graph, wherein the clustering is performed in consideration of at least one of the size of a cluster and the number of objects included in the cluster, and the size of the cluster means the minimum value among the distances between all pairs of objects in the cluster.

The object may be data of an image.

The image may be an embryo image.

Measuring the similarities between all object pairs of the one or more objects may include extracting an RGB vector from each image and measuring the similarity between all image pairs of the one or more images using the RGB vectors.

Extracting an RGB vector from the image may include extracting a color histogram on an RGB space of the image and extracting an RGB vector of the image based on the color histogram.

The similarity graph may be a k-nearest neighbor (k-NN) graph in which each node is connected to its k nearest neighbors.

The k-NN graph may be constructed by performing k-nearest neighbor queries based on the similarity and connecting each node by edges only to the k objects retrieved for it.

The node set V of the k-NN graph may correspond to the n images, and the edge (i, j) between node i and node j may have as its weight w_ij the similarity between the image corresponding to node i and the image corresponding to node j.

The graph partitioning algorithm may bi-partition the similarity graph so as to minimize the edge cut.

Forming the hierarchical structure by repeatedly applying the graph partitioning algorithm to the similarity graph may include dividing the similarity graph into a first predetermined number of clusters by using the graph partitioning algorithm, and subdividing, among the first predetermined number of clusters, clusters having more objects than a threshold into a second predetermined number of clusters by using the graph partitioning algorithm.

Dividing the similarity graph into the first predetermined number of clusters by using the graph partitioning algorithm may include generating clusters by dividing the similarity graph, and generating further clusters by dividing the cluster having the smallest minimum similarity value among the clusters generated by the dividing.

The data organization method may further include selecting a representative object for each of the root node and the non-terminal nodes of the hierarchy.

The representative object may be an object having a largest average similarity among objects in a cluster corresponding to the root node or the non-terminal node.

The data organization method may further include providing a top-down search of the hierarchy by displaying representative objects of nodes in the hierarchy.

According to another aspect of the present invention, a method for forming a hierarchical structure of embryo images is provided that includes measuring similarities between all image pairs of one or more embryo images, generating a similarity graph based on the measured similarities, and forming a hierarchical structure by performing clustering that repeatedly applies a graph partitioning algorithm to the similarity graph, wherein the clustering is performed in consideration of at least one of the size of a cluster and the number of objects included in the cluster, and the size of the cluster means the minimum value among the distances between all pairs of objects in the cluster.

The method for forming a hierarchical structure of embryo images may further include selecting a representative object for each of the root node and the non-terminal nodes of the hierarchy, and providing a top-down search of the hierarchy by displaying the representative objects of the nodes in the hierarchy, wherein the representative object may be the embryo image having the largest average similarity among the embryo images in the cluster corresponding to the root node or the non-terminal node.

According to another aspect of the present invention, a data organization apparatus is provided that includes a similarity measurer for measuring similarities between all pairs of objects among one or more objects, a similarity graph generator for generating a similarity graph based on the measured similarities, and a hierarchical structure forming unit for forming a hierarchical structure by performing clustering that repeatedly applies a graph partitioning algorithm to the similarity graph, wherein the clustering is performed in consideration of at least one of the size of a cluster and the number of objects included in the cluster, and the size of the cluster means the minimum value among the distances between all pairs of objects in the cluster.

The object may be data of an image.

The similarity measurer may extract an RGB vector from the image, and measure the similarity between all image pairs of one or more images using the RGB vector.

The hierarchical structure forming unit may divide the similarity graph into a first predetermined number of clusters by using the graph partitioning algorithm and may subdivide, among the first predetermined number of clusters, clusters in which the number of objects is greater than a threshold value into a second predetermined number of clusters by using the graph partitioning algorithm.

The hierarchical structure forming unit may generate clusters by dividing the similarity graph, and may generate further clusters by dividing the cluster having the smallest similarity value among the clusters generated by the dividing.

The data organization apparatus may further include a representative object selecting unit for selecting a representative object for each of the root node and the non-terminal nodes of the hierarchy, wherein the representative object may be the object having the largest average similarity among the objects in the cluster corresponding to the root node or the non-terminal node.

The data organization apparatus may further include a visualization unit configured to provide a top-down search of the hierarchy by displaying a representative object of the node of the hierarchy.

Apparatus and methods for organizing data hierarchically are provided.

An apparatus and method are provided that support efficient retrieval by hierarchically organizing embryo data.

FIG. 1 illustrates a method for structuring a large amount of embryo image data in a tree form that is close to balanced rather than skewed, according to an embodiment of the present invention.
FIG. 2 illustrates the factors to consider in structuring a database using hierarchical clustering according to an embodiment of the present invention.
FIG. 3 illustrates a hierarchical structure generated by a hierarchical clustering scheme according to an embodiment of the present invention.
FIG. 4 is a flowchart of a data organization method according to an embodiment of the present invention.
FIG. 5 illustrates, as an algorithm, a data organization method according to an embodiment of the present invention.
FIG. 6 illustrates representative objects of clusters belonging to a non-terminal node as viewed through a visualization tool according to an embodiment of the present invention.
FIG. 7 illustrates one of the clusters in a terminal node as viewed through the visualization tool according to an embodiment of the present invention.
FIG. 8 is a structural diagram of a data organization apparatus according to an embodiment of the present invention.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the accompanying drawings. However, the present invention is not restricted or limited by the embodiments. Like reference symbols in the drawings denote like elements.

FIG. 1 illustrates a method for structuring a large amount of embryo image data in a tree form that is close to balanced rather than skewed, according to an embodiment of the present invention.

Graph-based hierarchical clustering for database structuring may consist of the following first to fourth steps.

First step 110: Generating a similarity graph.

Second step 120: Constructing clusters by using a graph partitioning algorithm.

Third step 130: For each of the constructed clusters, selecting a representative object capable of representing the cluster.

Fourth step 140: Forming a tree structure by repeating the second and third steps.

In the first step 110, a k-nearest neighbor similarity graph is generated by using the RGB vectors extracted from the embryo images stored in the database.

In the second step 120, clusters are created by using a graph partitioning algorithm.

At this time, by considering the size of each cluster and the number of objects it contains at the same time, it is possible to prevent a specific cluster from becoming too large in size or from containing too many objects.

In the third step 130, a representative object is selected for each cluster in order to minimize the number of embryo images that must be accessed to retrieve the desired embryo image.

The representative object refers to an embryo image that can best reflect the characteristics of the cluster among all the embryo images in the cluster.

In the fourth step 140, the database is organized into a tree structure by repeating the second and third steps.

FIG. 2 illustrates the factors to consider in structuring a database using hierarchical clustering according to an embodiment of the present invention.

Cluster size (210)

In general, the size of a cluster means the maximum value among the distances between all pairs of objects in the cluster.

One embodiment of the present invention may use similarity as a measure for clustering embryo image data. Similarity can be seen as inversely proportional to distance.

Therefore, in an embodiment of the present invention, the size of a cluster may mean a minimum value among similarities among all pairs of objects in the cluster.
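The cluster size defined above can be sketched as follows. This is a minimal illustration, assuming a precomputed symmetric similarity matrix; the function name `cluster_size`, the example matrix `W`, and the value returned for a single-object cluster are assumptions of the sketch, not part of the disclosure.

```python
from itertools import combinations
from typing import List, Sequence


def cluster_size(members: Sequence[int], W: List[List[float]]) -> float:
    """Cluster "size" as the minimum similarity over all member pairs.

    A lower value means a looser (effectively larger) cluster.
    Returning 1.0 for a single-object cluster is an assumption of this sketch.
    """
    pairs = list(combinations(members, 2))
    if not pairs:
        return 1.0
    return min(W[i][j] for i, j in pairs)


# Hypothetical symmetric similarity matrix for four images (values in [0, 1]).
W = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.2, 0.3],
    [0.8, 0.2, 1.0, 0.7],
    [0.1, 0.3, 0.7, 1.0],
]

loose = cluster_size([0, 1, 2], W)  # limited by the dissimilar pair (1, 2)
tight = cluster_size([2, 3], W)
```

Because the measure is a minimum similarity, a single dissimilar pair is enough to mark a cluster as large, which is what makes it usable as a splitting criterion.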

In the process of structuring the database, if the cluster is too large, dissimilar embryo image data may be collected within the cluster. Therefore, a problem may arise that even a candidate image far from an image desired by the user should be examined. In this case, it may be meaningless to select a representative object in the cluster.

On the other hand, if the cluster size is too small, similar embryo images may be distributed and stored in multiple clusters. Accordingly, a problem may arise in that a plurality of candidate clusters are accessed and examined to search for an embryo image desired by a user.

Number of data contained within the cluster (220)

Even if the size of the cluster is small, if the number of data (or objects) contained in the cluster is too large, the user must look at too many candidate image data in order to retrieve the desired embryo image.

On the other hand, if the number of data contained in the cluster is too small, there is a high possibility that the embryo image desired by the user does not exist in the cluster. Therefore, a problem may arise in that a plurality of candidate clusters need to be accessed and looked for in order to retrieve a desired embryo image.

Number of clusters accessed and number of data (230)

A user may want to access as little image data as possible within a large embryo image database to retrieve the desired image.

Therefore, in order to support such a user's needs, the number of clusters or image data to be accessed must be minimized until the user finally retrieves the desired embryo image.

When a hierarchical structure is used as the database structuring method, most of the candidate image sets that are far from the image desired by the user can be excluded from the search from the beginning. Thus, the number of clusters or image data accessed during the search can be reduced. However, ways to reduce this number significantly still need to be found.

Hierarchy Balance (240)

If the hierarchical structure formed by the hierarchical clustering method is skewed, consistent search performance cannot be guaranteed. As the hierarchy becomes deeper, the user may have to access many candidate clusters to find a desired image by browsing down from the root node, which degrades search performance. On the other hand, if the hierarchy is too shallow, a specific cluster may become too large in size or contain too many image data, forcing the user to look at a large number of candidate images.

In consideration of this, an embodiment of the present invention discloses a hierarchical clustering scheme that considers the size of the cluster and the number of data at the same time, so that the user can find images similar to the desired image by accessing a minimum of image data without visual burden.

FIG. 3 illustrates a hierarchical structure generated by a hierarchical clustering scheme according to an embodiment of the present invention.

The non-terminal nodes 310, 320, and 330 of each layer are each composed of n or fewer clusters [C_0, C_1, ..., C_k] (0 ≤ k < n).

For each cluster C_i, an entry stores the size of the cluster C_i^size, the number of data in the cluster C_i^num, and a representative object C_i^rep, together with pointer information to the lower node.

The clustering method according to an embodiment of the present invention continues dividing until each of the clusters [C_0, C_1, ..., C_k] in the non-terminal nodes 310, 320, or 330 contains fewer than a certain number of data.

For example, the first internal node 320 is obtained by dividing the cluster C_0 of the root node 310. The second internal node 330 is obtained by dividing the cluster C_n of the root node 310.

Through this partitioning process, a hierarchical structure up to the terminal node 330 is formed.

A cluster C_i in the terminal node 330 stores pointer information to a data page 350 as its entry. The data page 350 stores up to a predetermined number of similar objects included in the cluster C_i. The object may be image data. The image may be an embryo image.
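The node layout described above could be modeled roughly as follows. This is only a sketch: the class names `Entry` and `Node`, their field names, and the image identifiers are hypothetical, chosen to mirror C_i^size, C_i^num, C_i^rep, the lower-node pointer, and the data page.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    """One cluster C_i stored in a node (field names are illustrative)."""
    size: float                      # C_i^size
    num: int                         # C_i^num: number of objects in the cluster
    rep: str                         # C_i^rep: id of the representative object
    child: Optional["Node"] = None   # lower node, for non-terminal entries
    data_page: List[str] = field(default_factory=list)  # objects, for terminal entries


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)

    def is_terminal(self) -> bool:
        # A node is terminal when none of its entries points to a lower node.
        return all(e.child is None for e in self.entries)


leaf = Node([Entry(size=0.8, num=2, rep="img_03", data_page=["img_03", "img_07"])])
root = Node([Entry(size=0.4, num=2, rep="img_03", child=leaf)])
```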

Using this tree structure, the user can traverse the tree to find the desired embryo image even if the user does not know about the embryo image data to be searched for in detail. That is, the user can look at the representative objects of each cluster of nodes starting from the top node and going down to the bottom node.

As the user descends to a lower node, the user can view similar embryo image data in more detail. By repeating this process, the user can finally find a cluster including image data similar to the desired embryo image data.

If there is image data desired by the user in the searched cluster, the search is terminated. If there is no embryo image data desired by the user in the searched cluster, the user may move back to an upper node and search for desired image data.
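The top-down traversal described above can be sketched as a simple descent. In the disclosure the user picks a representative visually; here that choice is simulated by a hypothetical scoring function `similarity_to_target`, and the tree is a hypothetical nested structure of dictionaries with the keys `rep`, `objects`, and `child`. Moving back up to an upper node is not modeled.

```python
from typing import Callable, List, Tuple


def browse(node: list, similarity_to_target: Callable[[int], float]) -> Tuple[List[int], List[int]]:
    """Follow the most similar representative from the root down to a terminal entry."""
    path = []
    while True:
        best = max(node, key=lambda entry: similarity_to_target(entry["rep"]))
        path.append(best["rep"])
        if best["child"] is None:      # terminal entry: return its objects
            return path, best["objects"]
        node = best["child"]           # descend to the lower node


# Toy hierarchy: objects are plain numbers and similarity is negative distance.
tree = [
    {"rep": 10, "objects": [8, 10, 12, 20, 22], "child": [
        {"rep": 10, "objects": [8, 10, 12], "child": None},
        {"rep": 20, "objects": [20, 22], "child": None},
    ]},
    {"rep": 90, "objects": [90], "child": None},
]

path, found = browse(tree, lambda rep: -abs(rep - 21))
```

In this toy run only three representatives are compared before the cluster containing values near 21 is reached, rather than all six stored objects.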

FIG. 4 is a flowchart of a data organization method according to an embodiment of the present invention.

The data organization method according to this embodiment can be used to structure an image database. The image may be an embryo image.

The data organization method uses graph-based hierarchical clustering.

In operation 410, the similarities between all object pairs of one or more objects are measured.

The object may be data of an image. In addition, the image may be an embryo image.

If the object is an image, operation 410 may include 1) extracting a color histogram in the RGB space of the image, 2) extracting an RGB vector of the image based on the extracted color histogram, and 3) measuring the similarity between all image pairs of the one or more images using the RGB vectors. Here, steps 1) and 2) together can be regarded as the operation of extracting the RGB vector from the image.

Image data may be represented as a vector of a particular dimensionality. Each dimension of the vector may correspond to a specific region of the image, and the value of each dimension may be the median value of the RGB pixels in that region. In other words, the vector can be regarded as a concise representation of the corresponding image.

To represent image data (e.g., embryo image data) as feature vectors, the color histogram in the RGB space used in color-based image retrieval can be extracted and used.
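As a rough illustration of turning an image into a color-histogram feature vector, the sketch below quantizes each RGB channel into a few bins and counts pixels per bin. The function name `rgb_histogram`, the number of bins, and the pixel list are assumptions; the disclosure does not specify the quantization.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def rgb_histogram(pixels: List[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return a normalized color histogram over a quantized RGB space."""
    step = 256 / bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each channel to its bin, then flatten the 3-D bin index.
        rb, gb, bb = (int(c // step) for c in (r, g, b))
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = float(len(pixels)) or 1.0  # avoid division by zero for an empty image
    return [count / total for count in hist]


# Two reddish and two bluish pixels, quantized to 2 bins per channel (8 bins total).
pixels = [(255, 0, 0), (250, 5, 5), (0, 0, 255), (0, 0, 250)]
vector = rgb_histogram(pixels, bins_per_channel=2)
```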

Therefore, similarity measures commonly used with color histograms, such as Euclidean distance, histogram intersection, and cross-talk distance, can be used. The similarity measure may be applied to all image pairs of the one or more images.
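Two of the measures named above can be sketched as follows, together with an all-pairs similarity matrix. The function names are illustrative, and the conversion of Euclidean distance into a similarity via 1 / (1 + d) is an assumption of this sketch rather than something stated in the text.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def histogram_intersection(h1: Vector, h2: Vector) -> float:
    """Sum of bin-wise minima; equals 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def euclidean_similarity(h1: Vector, h2: Vector) -> float:
    """Map Euclidean distance d to a similarity 1 / (1 + d) (an assumed conversion)."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))
    return 1.0 / (1.0 + d)


def similarity_matrix(
    vectors: List[Vector],
    sim: Callable[[Vector, Vector], float] = histogram_intersection,
) -> List[List[float]]:
    """Measure the similarity between all pairs of vectors."""
    n = len(vectors)
    return [[sim(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


# Three hypothetical 3-bin normalized histograms; the first and third are identical.
histograms = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
W = similarity_matrix(histograms)
```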

However, when the color histogram is used as the feature vector, the image data is represented as high-dimensional data having d attributes. When clustering is performed on such high-dimensional data, the curse of dimensionality may arise and clustering performance may degrade.

Therefore, in the present embodiment, graph-based clustering is performed to solve the above problem.

In operation 420, a similarity graph is generated based on the measured similarities.

Image data is represented by nodes in the similarity graph rather than points in d-dimensional space. The edges between the nodes represent the similarity between the image data.

In consideration of the amount of computation, the number of nodes connected by edges may be limited. For example, the Fixed-Radius Method (FRM) connects by edges only the nodes of images within a given radius. Fixed-radius schemes are known to be efficient when a metric distance is used.

However, the similarity used in one embodiment of the present invention is a non-metric distance.

Thus, the similarity graph may be a k-nearest neighbor (k-NN) graph.

The k-NN graph is built by performing a similarity-based k-nearest neighbor query for each object (or image data such as an embryo image) and connecting the query node by edges only to the k objects retrieved.

As a configuration method of the similarity graph, a symmetric configuration method or an asymmetric configuration method may be used.

In the symmetric construction method, image A and image B are connected by an edge only when image B is retrieved as a similar image with image A as the query and image A is also retrieved as a similar image with image B as the query.

In the asymmetric construction method, image A and image B are connected by an edge when image B is retrieved as a similar image with image A as the query, even if image A is not retrieved as a similar image with image B as the query.
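The difference between the two construction methods can be sketched as follows. The function names `knn_lists` and `knn_graph` and the example matrix are illustrative assumptions; edges are stored once per unordered pair with the similarity as the weight.

```python
from typing import Dict, List, Set, Tuple


def knn_lists(W: List[List[float]], k: int) -> List[Set[int]]:
    """For each node, return the set of its k most similar other nodes."""
    n = len(W)
    result = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: W[i][j], reverse=True)
        result.append(set(others[:k]))
    return result


def knn_graph(W: List[List[float]], k: int, symmetric: bool) -> Dict[Tuple[int, int], float]:
    """Build a k-NN graph; edge (i, j) carries the similarity W[i][j] as its weight."""
    nn = knn_lists(W, k)
    edges = {}
    for i in range(len(W)):
        for j in range(i + 1, len(W)):
            i_finds_j, j_finds_i = j in nn[i], i in nn[j]
            # Symmetric: both queries must retrieve each other. Asymmetric: one suffices.
            keep = (i_finds_j and j_finds_i) if symmetric else (i_finds_j or j_finds_i)
            if keep:
                edges[(i, j)] = W[i][j]
    return edges


W = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.2, 0.3],
    [0.8, 0.2, 1.0, 0.7],
    [0.1, 0.3, 0.7, 1.0],
]
mutual = knn_graph(W, k=1, symmetric=True)
one_way = knn_graph(W, k=1, symmetric=False)
```

With k = 1, the symmetric graph keeps only the mutual pair (0, 1), while the asymmetric graph also keeps (0, 2) and (2, 3), so the asymmetric method yields a denser graph.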

In a similarity graph representing n images, the node set V may correspond to the n images {1, 2, ..., n}. The edge (i, j) between node i and node j may have as its weight the similarity w_ij between the image corresponding to node i and the image corresponding to node j.

The similarity graph G may be represented and stored as a matrix W having the similarity w_ij as its ij-th entry (i.e., row i and column j).

In operations 430 and 440, a hierarchical structure is formed by performing clustering that repeatedly applies a graph partitioning algorithm to the similarity graph.

The clustering may be performed in consideration of one or more of the size of the cluster and the number of objects included in the cluster. The size of the cluster may mean a minimum value among the distances between all pairs of objects in the cluster.

One of the purposes of embodiments of the present invention is to let the user quickly and easily find the ultimately desired object by examining only a few objects (e.g., embryo images).

If there are too many images in the cluster, the number of images to be looked at by the user may be large. If the cluster becomes too large, it is more likely that dissimilar images will be gathered within the cluster. As a result, a user may need to look at unnecessary images.

The conventional hierarchical clustering method does not simultaneously consider the size of the cluster and the number of data in the cluster. Therefore, when the final clusters are obtained by performing conventional hierarchical clustering, there is a high possibility of generating a hierarchical structure in which the size and the number of data of each of the final clusters show a very large difference.

In addition, the conventional hierarchical clustering method merges the two clusters if the two clusters are closest to each other compared with other clusters, even if the similarity between the data existing in the two specific clusters is low. Therefore, the conventional hierarchical clustering method is likely to generate clusters of large size or very large number of data.

Accordingly, one embodiment of the present invention discloses a hierarchical clustering method that takes into account the above-mentioned considerations when structuring a database, namely (1) the size of the cluster, (2) the number of data contained within the cluster, (3) the number of clusters accessed and the number of data, and (4) the balance of the hierarchical structure.

As the graph partitioning algorithm, spectral clustering or METIS may be used.

For spectral clustering, see R. Kannan, S. Vempala, and A. Vetta, "On clusterings good, bad, and spectral," in Proceedings of the 41st Annual Symposium on Foundations of Computer Science, 2000, and A. Y. Ng, M. Jordan, and Y. Weiss, "On spectral clustering: Analysis and an algorithm," in Proceedings of Neural Information Processing Systems, 2001.

For METIS, see G. Karypis and V. Kumar, "METIS 4.0: Unstructured graph partitioning and sparse matrix ordering system," Technical report, Department of Computer Science, University of Minnesota, 1998, http://www.cs.umn.edu/~metis, and G. Karypis and V. Kumar, "Multilevel algorithms for multi-constraint graph partitioning," Journal of Parallel and Distributed Computing, Vol. 48, No. 1, pp. 96-129, 1998.

In addition, an algorithm that bi-partitions the k-NN graph so as to minimize the edge cut (e.g., hMetis) may be used as the graph partitioning algorithm.

For hMetis, see G. Karypis and V. Kumar, "hMETIS 1.5: A Hypergraph Partitioning Package," Technical report, Department of Computer Science, University of Minnesota, 1998.

Since each edge of the constructed similarity graph represents the similarity between embryo image data, hMetis is used so that the similarity between image data placed in different clusters is minimized.

In operation 430, the similarity graph is divided into a first predetermined number of clusters by a graph partitioning algorithm.

By repeatedly applying the graph partitioning algorithm, n clusters can be generated, where n is the number of clusters predetermined by the user. The number of clusters to be created by operation 430 is referred to as the first predetermined number.

To generate n clusters, operation 430 may include 1) generating clusters by dividing the similarity graph and 2) generating further clusters by dividing the cluster having the smallest similarity value among the clusters generated so far. Operation 2) may be repeated until the first predetermined number of clusters has been generated.

That is, to take the size of the cluster into account when forming the hierarchical structure, the target cluster to be divided by the graph partitioning algorithm may be selected based on cluster size.

As described above, the minimum similarity may be used as the measure of cluster size. That is, the cluster with the smallest minimum similarity may be selected as the target cluster to be divided into two by the graph partitioning algorithm in operation 430.

If the minimum similarity is used as the size of the cluster, a specific cluster can be prevented from becoming too large.
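The division loop described above can be sketched as follows. Note the stand-in: the disclosure uses a graph partitioner such as hMetis for each bisection, whereas the hypothetical `bisect` below simply seeds two halves with the least similar pair and assigns every other object to the more similar seed. The function names, the full similarity matrix used in place of a k-NN graph, and the toy data are all assumptions of this sketch.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def min_similarity(cluster: Sequence[int], W: List[List[float]]) -> float:
    """Cluster size: the minimum similarity over all member pairs."""
    return min((W[i][j] for i, j in combinations(cluster, 2)), default=float("inf"))


def bisect(cluster: Sequence[int], W: List[List[float]]) -> Tuple[List[int], List[int]]:
    """Stand-in for an edge-cut-minimizing bisection (NOT hMetis)."""
    a, b = min(combinations(cluster, 2), key=lambda p: W[p[0]][p[1]])
    left, right = [a], [b]
    for v in cluster:
        if v not in (a, b):
            (left if W[v][a] >= W[v][b] else right).append(v)
    return left, right


def divide(nodes: Sequence[int], W: List[List[float]], n_clusters: int) -> List[List[int]]:
    """Repeatedly split the cluster with the smallest minimum similarity."""
    clusters = [list(nodes)]
    while len(clusters) < n_clusters:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break
        target = min(splittable, key=lambda c: min_similarity(c, W))
        clusters.remove(target)
        clusters.extend(bisect(target, W))
    return clusters


# Toy data: six objects on a line forming three tight groups.
positions = [0.0, 0.1, 1.0, 1.1, 3.0, 3.1]
W = [[1.0 / (1.0 + abs(p - q)) for q in positions] for p in positions]
clusters = divide(range(6), W, n_clusters=3)
```

In this toy run the first split separates the far group {4, 5}, and the second split is applied to the remaining four objects because that cluster has the smaller minimum similarity.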

In operation 440, among the first predetermined number of clusters, clusters in which the number of objects (e.g., embryo image data) is larger than a threshold value are subdivided into a second predetermined number of clusters by using the graph partitioning algorithm.

The second predetermined number and the first predetermined number may be the same value.

To take the number of data in a cluster into account, among the first predetermined number of clusters generated by operation 430, clusters whose number of data is greater than a given threshold O_max may be selected as the clusters to be divided in operation 440. Each selected cluster may again be subdivided into the second predetermined number of clusters based on cluster size.

Through operations 430 and 440, the size of each cluster and the number of data in it are continually checked while the hierarchical structure is being formed, and any cluster that is very large or contains a very large number of data is divided by the partitioning algorithm. In this way, the balance of the hierarchy is taken into account.
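Putting the two operations together, a recursive sketch might look like this. It assumes the first and second predetermined numbers are equal (`n`, which must be at least 2), uses a hypothetical threshold `o_max` for O_max, and again replaces hMetis with a simple seed-based bisection; the nested-dictionary tree layout is also an assumption.

```python
from itertools import combinations


def min_sim(c, W):
    """Cluster size: the minimum similarity over all member pairs."""
    return min((W[i][j] for i, j in combinations(c, 2)), default=float("inf"))


def bisect(c, W):
    """Stand-in for an hMetis bisection: seed with the least similar pair."""
    a, b = min(combinations(c, 2), key=lambda p: W[p[0]][p[1]])
    halves = ([a], [b])
    for v in c:
        if v not in (a, b):
            halves[0 if W[v][a] >= W[v][b] else 1].append(v)
    return halves


def divide(c, W, n):
    """Operation 430 (sketch): split the loosest cluster until n clusters exist."""
    clusters = [list(c)]
    while len(clusters) < n and any(len(x) > 1 for x in clusters):
        target = min((x for x in clusters if len(x) > 1), key=lambda x: min_sim(x, W))
        clusters.remove(target)
        clusters.extend(bisect(target, W))
    return clusters


def build(c, W, n, o_max):
    """Operation 440 (sketch): clusters holding more than o_max objects get a child node."""
    node = []
    for cluster in divide(c, W, n):
        child = build(cluster, W, n, o_max) if len(cluster) > o_max else None
        node.append({"objects": sorted(cluster), "child": child})
    return node


positions = [0.0, 0.1, 0.3, 1.0, 1.1, 3.0]
W = [[1.0 / (1.0 + abs(p - q)) for q in positions] for p in positions]
tree = build(range(6), W, n=2, o_max=2)
```

Every terminal cluster in the resulting tree holds at most `o_max` objects, which is the property the threshold is meant to enforce.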

In operation 450, a representative object is selected for each of the root node and non-terminal nodes of the hierarchy.

By using the hierarchical structure, many objects that are far from the user's desired object (e.g., an embryo image) can be excluded from the beginning of the search process. Thus, the number of objects that a user needs to access for image retrieval can be significantly reduced.

Representative object refers to an object that can represent a cluster.

In an embodiment of the present invention, representative objects may be selected and stored for the root node and the non-terminal nodes of the hierarchical structure. In addition, the actual objects (e.g., embryo image data) may be stored in the terminal nodes.

In order to solve the high-dimensionality problem, an embodiment of the present invention may model the objects (e.g., embryo image data) as a graph structure.

The simplest way to find the representative object in the graph structure is to select the object corresponding to the node with the highest degree in the graph. However, this selection ignores the similarity values carried by the edges connecting the nodes.

To solve this problem, the image most similar to all the other images in the cluster may be selected as the representative object. That is, the representative object may be the object having the highest average similarity among the objects in the cluster corresponding to the root node or the non-terminal node.

Equations 1 and 2 below hold for the clusters {C_1, C_2, ..., C_m} obtained by clustering the similarity graph G = (V, E) represented by the matrix W. The representative object node of the cluster C_i may be the node having the largest average similarity according to Equation 3 below.
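The selection of a representative object by largest average similarity can be sketched in Python as follows. This is an illustration only: it assumes that the average similarity of a node is the mean of the entries of W between that node and the other members of its cluster, which may differ in detail from Equation 3. The function name `representative` and the nested-list layout of `w` are hypothetical.

```python
from typing import List, Sequence


def representative(cluster: Sequence[int], w: List[List[float]]) -> int:
    """Return the node of the cluster with the largest average similarity
    to the other nodes of the same cluster (assumed reading of Equation 3)."""

    def average_similarity(v: int) -> float:
        others = [u for u in cluster if u != v]
        if not others:  # a single-object cluster represents itself
            return 0.0
        return sum(w[v][u] for u in others) / len(others)

    return max(cluster, key=average_similarity)


# Example similarity matrix W for three objects (values are made up).
W = [
    [0.0, 0.9, 0.8],
    [0.9, 0.0, 0.1],
    [0.8, 0.1, 0.0],
]
rep = representative([0, 1, 2], W)  # node 0: averages are 0.85, 0.5, 0.45
```

Unlike the highest-degree rule, this choice uses the edge weights themselves, so a node with many weak edges is not preferred over a node with a few strong ones.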

In operation 460, a top-down search of the hierarchy is provided by displaying the representative objects of the nodes.

A user searching the hierarchy may proceed by selecting, at the upper node, the representative object most similar to the desired object (eg, embryo image data) and gradually descending to lower nodes. When the user finally reaches a terminal node, the user can directly check all the objects in the cluster, in which objects similar to the desired object are gathered.
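The descent can be sketched in Python as follows. In the embodiment the user picks the representative object by eye; the sketch replaces that choice with a hypothetical `similarity` function so that it can run unattended. The `Node` class, its fields, and the use of plain numbers in place of images are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    representative: float  # representative object of this node's cluster
    children: List["Node"] = field(default_factory=list)
    objects: List[float] = field(default_factory=list)  # filled only in terminal nodes


def top_down_search(root: Node, query: float,
                    similarity: Callable[[float, float], float]
                    ) -> Tuple[List[float], List[float]]:
    """Descend from the root, at each level moving to the child whose
    representative is most similar to the query, until a terminal node
    is reached. Returns the objects stored there and the path taken."""
    node, path = root, []
    while node.children:
        node = max(node.children,
                   key=lambda child: similarity(query, child.representative))
        path.append(node.representative)
    return node.objects, path
```

Only one child per level is visited, so the number of representatives compared grows with the depth of the tree rather than with the total number of objects.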

If the object is embryo image data, each embryo image may have tag information.

Tag information may indicate a trait that is expressed as the embryo grows. That is, images of embryos with identical (or similar) expression traits can be considered similar because they have similar vector values. Thus, the accuracy of the hierarchical structure formed by using the tag information of the embryo images can be measured.

FIG. 5 illustrates, as an algorithm, a data organization method according to an embodiment of the present invention.

The seventh to ninth lines 510 of the algorithm correspond to the cluster division operation 430.

The tenth to fifteenth lines 520 of the algorithm correspond to the cluster repartitioning operation 440.

FIG. 6 illustrates representative objects of clusters belonging to a non-terminal node, as viewed through a visualization tool according to an embodiment of the present invention.

The user may check the representative objects 610 of each of the n clusters belonging to the root node 310 or the non-terminal node 320 or 330 through the visualization tool on the screen.

In FIG. 6, n = 7.

Each of the representative objects 610 may include an image (eg, an embryo image) and a representative number of the image.

The image 620 on the right is an enlarged view of one representative object selected by the user among the n representative objects 610 on the left side.

The cluster including the enlarged representative object may include sub-clusters. The sub-clusters are a predefined number of clusters created by applying the graph partitioning algorithm to the cluster containing the enlarged representative object.

The user may perform an operation (eg, pressing the DOWN button 624) to descend to the lower node and examine the lower node composed of the sub-clusters.

The user may perform an operation (eg, pressing the UP button 622) to go up to the upper node and examine the upper node that includes the cluster containing the enlarged representative object.

FIG. 7 illustrates one of the clusters existing in a terminal node, as viewed through the visualization tool according to an embodiment of the present invention.

In FIG. 7, the user can visually confirm that the images 710 present in the cluster are similar to each other. The representative object 720 of the images 710 may also be checked.

The user may select one of the images 710 existing in the cluster belonging to the terminal node. The selected image 730 and tag information 740 of the selected image may be displayed.

FIG. 8 is a structural diagram of a data organization apparatus according to an embodiment of the present invention.

The data organization apparatus 800 may include a similarity measurer 810, a similarity graph generator 820, a hierarchical structure forming unit 830, a representative object selector 840, and a visualization unit 850.

The similarity measurer 810 performs the above-described operation 410.

That is, the similarity measurer 810 may measure similarities between all object pairs of one or more objects.

The object may be data of an image. The image may be an embryo image.

The similarity measurer 810 may extract an RGB vector from each image, and measure the similarity between all image pairs of one or more images using the extracted RGB vectors.
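A minimal Python sketch of this measurement is given below. It assumes that the RGB vector is a normalized per-channel color histogram and that similarity is the cosine of the angle between two such vectors; the text does not fix either choice here, so both are illustrative assumptions. The function names, the `bins` parameter, and the representation of an image as a list of (r, g, b) tuples with values from 0 to 255 are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def rgb_vector(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Concatenate normalized histograms of the R, G and B channels."""
    vec = [0.0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            vec[channel * bins + min(value * bins // 256, bins - 1)] += 1.0
    n = len(pixels)
    return [count / n for count in vec] if n else vec


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pairwise_similarities(images: Sequence[Sequence[Pixel]]) -> List[List[float]]:
    """Similarity matrix over all image pairs."""
    vectors = [rgb_vector(image) for image in images]
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]
```

With this choice, two images whose pixels fall into the same histogram bins obtain a similarity close to 1 even if their exact colors differ slightly.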

The similarity graph generator 820 performs the above-described operation 420.

That is, the similarity graph generator 820 may generate a similarity graph based on the measured similarities.

The similarity graph may be a k-nearest neighbor (k-NN) graph in which each node is connected to its k nearest neighbors.
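Construction of such a graph from a similarity matrix can be sketched in Python as follows. The sketch assumes a symmetric matrix `w` of pairwise similarities and stores the graph as a dictionary of undirected weighted edges; the function name `knn_graph` and this edge representation are hypothetical.

```python
from typing import Dict, List, Tuple


def knn_graph(w: List[List[float]], k: int) -> Dict[Tuple[int, int], float]:
    """Connect every node to its k most similar other nodes.
    Edges are undirected and keyed as (smaller index, larger index);
    the edge weight is the similarity between the two objects."""
    n = len(w)
    edges: Dict[Tuple[int, int], float] = {}
    for i in range(n):
        candidates = [j for j in range(n) if j != i]
        neighbors = sorted(candidates, key=lambda j: w[i][j], reverse=True)[:k]
        for j in neighbors:
            edges[(min(i, j), max(i, j))] = w[i][j]
    return edges


# Example with four objects (values are made up).
W = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
graph = knn_graph(W, k=1)  # {(0, 1): 0.9, (2, 3): 0.8}
```

Keeping only k edges per node makes the graph sparse, which reduces the work of the later partitioning step compared with a complete graph over all object pairs.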

The hierarchical structure forming unit 830 performs the operations 430 and 440 described above.

That is, the hierarchical structure forming unit 830 forms a hierarchical structure by repeatedly applying a graph partitioning algorithm to the generated similarity graph.

The graph partitioning algorithm may divide the similarity graph so as to minimize the edge cut.

The hierarchical structure forming unit 830 may divide the similarity graph into a first predetermined number of clusters by the graph partitioning algorithm, and may subdivide those clusters, among the first predetermined number of clusters, whose number of objects is greater than a threshold value into a second predetermined number of clusters by using the graph partitioning algorithm.

For the above division, the hierarchical structure forming unit 830 may generate clusters by dividing the similarity graph, and may generate further clusters by dividing the cluster with the smallest similarity value among the clusters generated by that division.
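The division into a first predetermined number of clusters can be sketched in Python as follows. Two assumptions are made for illustration: the similarity value of a cluster is taken to be the average pairwise similarity of its members, and the bisection is found by brute force over all near-equal splits, which is feasible only for very small clusters and merely stands in for a real graph partitioner. All function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def cut_weight(part_a: Sequence[int], part_b: Sequence[int],
               w: List[List[float]]) -> float:
    """Total weight of the edges that a split would cut."""
    return sum(w[i][j] for i in part_a for j in part_b)


def bisect_min_cut(cluster: Sequence[int],
                   w: List[List[float]]) -> Tuple[List[int], List[int]]:
    """Brute-force near-equal bisection with the smallest edge cut."""
    best = None
    for part_a in combinations(cluster, len(cluster) // 2):
        part_b = [v for v in cluster if v not in part_a]
        cost = cut_weight(part_a, part_b, w)
        if best is None or cost < best[0]:
            best = (cost, list(part_a), part_b)
    return best[1], best[2]


def cohesion(cluster: Sequence[int], w: List[List[float]]) -> float:
    """Average pairwise similarity inside a cluster (assumed measure)."""
    pairs = list(combinations(cluster, 2))
    return sum(w[i][j] for i, j in pairs) / len(pairs)


def partition(nodes: Sequence[int], w: List[List[float]], k: int) -> List[List[int]]:
    """Bisect the least cohesive cluster repeatedly until k clusters exist."""
    clusters = [list(nodes)]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break
        target = min(splittable, key=lambda c: cohesion(c, w))
        clusters.remove(target)
        clusters.extend(bisect_min_cut(target, w))
    return clusters
```

Splitting the least cohesive cluster first means that tightly related objects stay together for as long as possible while the required number of clusters is reached.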

The representative object selector 840 performs the operation 450 described above.

That is, the representative object selector 840 may select a representative object for each of the root node and the non-terminal nodes of the formed hierarchical structure.

The visualization unit 850 performs the above-described operation 460.

That is, the visualization unit 850 may provide a top-down search of the hierarchical structure by displaying the representative objects of the formed nodes of the hierarchical structure.

The technical contents according to an embodiment of the present invention described above with reference to FIGS. 1 to 7 may be applied to the present embodiment as they are. Therefore, a more detailed description is omitted below.

Through hierarchical clustering according to an embodiment of the present invention, a large amount of embryo image data may be structured in a form close to a balanced tree rather than a skewed one. This structuring allows the user to search efficiently for the desired embryo image.

A visualization tool may be provided to help the user easily see the tree structure as a result of the structuring according to one embodiment of the present invention. Visualization tools help users quickly and easily find the objects they want while visually identifying the representative objects in the clusters.

The result of structuring data through hierarchical clustering according to an embodiment of the present invention can support a much more efficient search than the result of structuring data using a conventional hierarchical clustering method. The result of clustering according to an embodiment of the present invention is also superior in accuracy to the result of conventional hierarchical clustering.

The method according to an embodiment of the present invention may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be those specially designed and constructed for the present invention, or may be those known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code generated by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

As described above, the present invention has been described by way of limited embodiments and drawings, but the present invention is not limited to the above embodiments, and various modifications and variations are possible from these descriptions by those skilled in the art to which the present invention pertains.

Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined by the claims below and their equivalents.

800: data organization apparatus
810: similarity measurer
820: similarity graph generator
830: hierarchical structure forming unit
840: representative object selector
850: visualization unit

Claims

A data organization method comprising:
measuring similarities between all pairs of objects among one or more objects;
generating a similarity graph based on the measured similarities; and
forming a hierarchical structure by performing clustering that repeatedly applies a graph partitioning algorithm to the similarity graph,
wherein the clustering is performed in consideration of at least one of the size of the cluster and the number of objects included in the cluster,
the size of the cluster means a minimum value among the distances between all pairs of objects in the cluster, and
the graph partitioning algorithm bi-partitions the similarity graph so as to minimize the edge cut.

The method of claim 1,
wherein the object is data of an image.

The method of claim 2,
wherein the image is an embryo image.

The method of claim 2,
wherein measuring similarities between all pairs of objects among the one or more objects comprises:
extracting an RGB vector from the image; and
measuring the similarity between all image pairs of one or more images using the RGB vector.

5. The method of claim 4,
wherein extracting the RGB vector from the image comprises:
extracting a color histogram in the RGB space of the image; and
extracting the RGB vector of the image based on the color histogram.

The method of claim 1,
wherein the similarity graph is a k-nearest neighbor (k-NN) graph composed of k neighbors.

The method of claim 6,
wherein the k-NN graph is formed by performing a similarity-based k-nearest neighbor query and connecting a node only to the k retrieved objects.

The method of claim 6,
wherein the node set V of the k-NN graph corresponds to n images, and the edge (i, j) between node i and node j has, as its weight, the similarity w_ij between the image corresponding to node i and the image corresponding to node j.

delete

The method of claim 1,
wherein forming the hierarchical structure by performing clustering that repeatedly applies the graph partitioning algorithm to the similarity graph comprises:
dividing the similarity graph into a first predetermined number of clusters by the graph partitioning algorithm; and
subdividing the clusters whose number of objects is greater than a threshold, among the first predetermined number of clusters, into a second predetermined number of clusters by using the graph partitioning algorithm.

The method of claim 10,
wherein dividing the similarity graph into the first predetermined number of clusters by the graph partitioning algorithm comprises:
generating clusters by dividing the similarity graph; and
generating clusters by dividing the cluster with the smallest similarity value among the clusters generated by the dividing.

The method of claim 1, further comprising:
selecting a representative object for each of the root node and the non-terminal nodes of the hierarchical structure.

The method of claim 12,
wherein the representative object is the object having the largest average similarity among the objects in the cluster corresponding to the root node or the non-terminal node.

The method of claim 12, further comprising:
providing a top-down search of the hierarchical structure by displaying the representative objects of the nodes in the hierarchical structure.

A method comprising:
measuring similarities between all image pairs of one or more embryo images;
generating a similarity graph based on the measured similarities; and
forming a hierarchical structure by performing clustering that repeatedly applies a graph partitioning algorithm to the similarity graph,
wherein the clustering is performed in consideration of at least one of the size of the cluster and the number of objects included in the cluster,
the size of the cluster means a minimum value among the distances between all pairs of objects in the cluster, and
the graph partitioning algorithm bi-partitions the similarity graph so as to minimize the edge cut.

16. The method of claim 15, further comprising:
selecting a representative object for each of the root node and the non-terminal nodes of the hierarchical structure; and
providing a top-down search of the hierarchical structure by displaying the representative objects of the nodes in the hierarchical structure,
wherein the representative object is the embryo image having the highest average similarity among the embryo images in the cluster corresponding to the root node or the non-terminal node.

delete