KR101247401B1 - Method and apparatus for hierarchical organization of embryo data for supporting efficient search
- Publication number
- KR101247401B1 (application KR1020110026445A)
- Authority
- KR
- South Korea
- Prior art keywords
- graph
- image
- cluster
- similarity
- objects
Abstract
Methods and apparatus are provided for organizing data. Similarities between all pairs of objects among one or more objects are measured, and a similarity graph is generated based on the measured similarities. A hierarchical structure is formed by performing clustering on the generated similarity graph in consideration of the size of each cluster and the number of objects included in the cluster. The objects may be embryo image data representing embryos.
Description
The present invention relates to a method and apparatus for hierarchically organizing data.
A method and apparatus are disclosed for providing efficient retrieval by hierarchically organizing embryo data.
Embryos are the first stages in the development of multicellular organisms such as animals or plants.
The basic system of the future body of an animal or plant is determined at the embryonic stage. Thus, embryos are an important subject in the study of the mechanisms of development. The study of embryos is called developmental biology.
In general, biologists working in developmental biology maintain large embryo image databases for embryo research. Biologists want to efficiently search such a large database for the embryo images relevant to their research. However, efficiently finding a desired image in a large image database is not easy.
There are two general ways to find the image you want from a large image database. The first method is a search method using a query, and the second method is a search method using a browsing method.
The search method using a query can be used when the user has specific metadata (such as a text tag) for the image to be searched, or knows its color characteristics. The user can retrieve similar images from the database by using various features of the desired image as a query.
On the other hand, a search method using a browsing method may be used when the user knows vaguely about an image to be searched for. This method is used when the user looks through the images in the image database and wants to find the desired image.
When biologists want to find an embryo image to study, it is not easy to describe the text tag or the color characteristics of that image. In other words, it is very difficult for a user to find the embryo image to be studied by using a query-based search method.
Thus, the user must look directly at all the embryo image data stored in the database using a browsing scheme to retrieve the desired embryo image.
However, since a large number of embryo images are stored in the database, it is practically difficult for a user to view all embryo images in the database.
To solve this problem, it is very important to structure the database. With a structured database, the user can limit the scope of the search to a group of embryo images similar to the desired one, and focus on the color features of the grouped similar images.
The primary method used for database structuring is hierarchical clustering.
Hierarchical clustering is a method of calculating the similarity between objects to classify the objects into clusters having similar characteristics, and expressing the divided clusters in a tree structure form.
When hierarchical clustering is used as a database structuring method, users can intuitively grasp the entire database structure. In addition, when a user traverses the tree to find a desired object by using the browsing function, the user can concentrate on searching only the desired objects.
Most existing hierarchical clustering methods use agglomerative hierarchical clustering.
When the agglomerative hierarchical clustering method is used, each object forms a separate cluster at the initial stage, and two clusters are merged according to a predetermined criterion at each successive stage. This merging process proceeds until all objects form a single cluster or some termination condition is satisfied.
Because this bottom-up strategy merges objects step by step according to a fixed criterion, it may gradually grow only certain clusters.
Thus, the tree of clusters produced by agglomerative hierarchical clustering is very likely to be a skewed tree. If a skewed tree has been created, it may take a considerable amount of time for the user to traverse it in search of the desired image.
One embodiment of the present invention can provide an apparatus and method for organizing data hierarchically.
One embodiment of the present invention may provide an apparatus and method for supporting efficient retrieval by hierarchically organizing embryo data.
According to an aspect of the present invention, there is provided a data organization method including: measuring similarities between all pairs of objects among one or more objects; generating a similarity graph based on the measured similarities; and forming a hierarchical structure by performing clustering that repeatedly applies a graph segmentation algorithm to the similarity graph, wherein the clustering is performed in consideration of at least one of the size (diameter) of a cluster and the number of objects included in the cluster, and wherein the size of the cluster means the minimum value among the distances between all pairs of objects in the cluster.
The object may be data of an image.
The image may be an embryo image.
Measuring similarities between all object pairs of the at least one object may include extracting an RGB vector from the image and measuring similarity between all image pairs of at least one image using the RGB vector.
Extracting an RGB vector from the image may include extracting a color histogram on an RGB space of the image and extracting an RGB vector of the image based on the color histogram.
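The histogram-based feature extraction described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `rgb_histogram_vector`, the `bins_per_channel` parameter, and the representation of an image as a list of (r, g, b) tuples are all hypothetical, since the source does not specify the number of bins or the image format.

```python
def rgb_histogram_vector(pixels, bins_per_channel=4):
    """Return a normalized color histogram over RGB space as a flat vector.

    pixels: iterable of (r, g, b) tuples with channel values in 0..255.
    bins_per_channel: assumed quantization; the source gives no value.
    """
    b = bins_per_channel
    hist = [0.0] * (b ** 3)
    count = 0
    for r, g, bl in pixels:
        # Map each 0..255 channel value to a bin index in 0..b-1.
        idx = ((r * b // 256) * b + (g * b // 256)) * b + (bl * b // 256)
        hist[idx] += 1.0
        count += 1
    if count:
        # Normalize so that the histogram sums to 1.
        hist = [h / count for h in hist]
    return hist


# One black pixel and two white pixels, two bins per channel (8 bins total).
vec = rgb_histogram_vector([(0, 0, 0), (255, 255, 255), (255, 255, 255)], 2)
```

Normalizing by the pixel count makes vectors from images of different resolutions comparable, which is a design choice of this sketch rather than something stated in the source.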
The similarity graph may be a k-nearest neighbor (k-NN) graph.
The k-NN graph can be constructed by performing k-nearest neighbor queries based on the similarity and connecting each node by edges only to the k objects retrieved for it.
The node set V of the k-NN graph may correspond to n images, and the edge (i, j) between node i and node j may have as its weight w_ij, the similarity between the image corresponding to node i and the image corresponding to node j.
The graph segmentation algorithm may bi-partition the similarity graph to minimize edge cutting.
Forming a hierarchical structure by repeatedly applying a graph segmentation algorithm to the similarity graph may include: dividing the similarity graph into a first predetermined number of clusters by the graph segmentation algorithm; and subdividing, by using the graph segmentation algorithm, clusters having a larger number of objects than a threshold from among the first predetermined number of clusters into a second predetermined number of clusters.
Dividing the similarity graph into the first predetermined number of clusters by using the graph segmentation algorithm may include: generating clusters by dividing the similarity graph; and generating further clusters by dividing the cluster having the smallest minimum similarity value among the clusters generated by the dividing.
The data organization method may further include selecting a representative object for each of the root node and the non-terminal nodes of the hierarchy.
The representative object may be an object having a largest average similarity among objects in a cluster corresponding to the root node or the non-terminal node.
The data organization method may further include providing a top-down search of the hierarchy by displaying representative objects of nodes in the hierarchy.
According to another aspect of the present invention, there is provided a method for forming a hierarchical structure of embryo images, the method including: measuring similarities between all image pairs of at least one embryo image; generating a similarity graph based on the measured similarities; and forming a hierarchical structure by performing clustering that repeatedly applies a graph segmentation algorithm to the similarity graph, wherein the clustering is performed in consideration of at least one of the size (diameter) of a cluster and the number of objects included in the cluster, and wherein the size of the cluster means the minimum value among the distances between all pairs of objects in the cluster.
The method for forming a hierarchical structure of embryo images may further include selecting a representative object for each of the root node and the non-terminal nodes of the hierarchy, and providing a top-down search of the hierarchy by displaying the representative objects of the nodes in the hierarchy, wherein the representative object may be the embryo image having the largest average similarity among the embryo images in the cluster corresponding to the root node or the non-terminal node.
According to another aspect of the present invention, there is provided a data organization apparatus including: a similarity measurer for measuring similarities between all pairs of objects among one or more objects; a similarity graph generator for generating a similarity graph based on the measured similarities; and a hierarchical structure forming unit for forming a hierarchical structure by performing clustering that repeatedly applies a graph partitioning algorithm to the similarity graph, wherein the clustering is performed in consideration of at least one of the size (diameter) of a cluster and the number of objects included in the cluster, and wherein the size of the cluster means the minimum value among the distances between all pairs of objects in the cluster.
The object may be data of an image.
The similarity measurer may extract an RGB vector from the image, and measure the similarity between all image pairs of one or more images using the RGB vector.
The hierarchical structure forming unit may divide the similarity graph into a first predetermined number of clusters by the graph partitioning algorithm and, by using the graph partitioning algorithm, subdivide those clusters among the first predetermined number of clusters whose number of objects is greater than a threshold value into a second predetermined number of clusters.
The hierarchical structure forming unit may generate clusters by dividing the similarity graph, and generate clusters by dividing the cluster with the least similarity value among the clusters generated by the dividing.
The apparatus for organizing data may further include a representative object selecting unit for selecting a representative object for each of the root node and the non-terminal nodes of the hierarchy, wherein the representative object may be the object having the largest average similarity among the objects in the cluster corresponding to the root node or the non-terminal node.
The data organization apparatus may further include a visualization unit configured to provide a top-down search of the hierarchy by displaying a representative object of the node of the hierarchy.
Apparatus and methods for organizing data hierarchically are provided.
An apparatus and method are provided that support efficient retrieval by hierarchically organizing embryo data.
FIG. 1 illustrates a method for structuring a large amount of embryo image data in a tree form near a balanced state without skewing according to an embodiment of the present invention.
FIG. 2 illustrates what to consider in structuring a database using hierarchical clustering according to an embodiment of the present invention.
FIG. 3 illustrates a hierarchical structure generated by a hierarchical clustering scheme according to an embodiment of the present invention.
FIG. 4 is a flowchart of a data organization method according to an embodiment of the present invention.
FIG. 5 illustrates, as an algorithm, a data organization method according to an embodiment of the present invention.
FIG. 6 illustrates representative objects of clusters belonging to a non-terminal node viewed through a visualization tool according to an embodiment of the present invention.
FIG. 7 illustrates one of the clusters existing in a terminal node, viewed through the visualization tool according to an embodiment of the present invention.
FIG. 8 is a structural diagram of a data organization apparatus according to an embodiment of the present invention.
Hereinafter, an embodiment of the present invention will be described in detail with reference to the accompanying drawings. However, the present invention is not limited by the embodiments. Like reference symbols in the drawings denote like elements.
FIG. 1 illustrates a method for structuring a large amount of embryo image data in a tree form near a balanced state without skewing according to an embodiment of the present invention.
Graph-based hierarchical clustering for database structuring may consist of the following first to fourth steps.
First step 110: generating a similarity graph.
Second step 120: constructing clusters using a graph segmentation algorithm.
Third step 130: selecting, for each of the constructed clusters, a representative object capable of representing the cluster.
Fourth step 140: forming a tree structure by repeating the second and third steps.
In a
In a
At this time, by considering the size of the cluster and the number of objects included in the cluster at the same time, it is possible to prevent the size of the specific cluster from becoming too large or the number of objects in the specific cluster from becoming too large.
In the third step, a representative object of each cluster is selected to minimize the number of embryo images that must be accessed to retrieve the desired embryo image.
The representative object refers to an embryo image that can best reflect the characteristics of the cluster among all the embryo images in the cluster.
In the fourth step, the database is structured in a tree structure by repeating
FIG. 2 illustrates what to consider in structuring a database using hierarchical clustering according to an embodiment of the present invention.
Cluster size (210)
In general, the size of a cluster means the maximum value among the distances between all pairs of objects in the cluster.
One embodiment of the present invention may use similarity as a measure for clustering embryo image data. Similarity can be seen as inversely proportional to distance.
Therefore, in an embodiment of the present invention, the size of a cluster may mean a minimum value among similarities among all pairs of objects in the cluster.
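The similarity-based notion of cluster size defined above can be illustrated with a short sketch. The function name `cluster_size`, the list-of-lists similarity matrix `sim`, and the value returned for a single-object cluster are assumptions made for illustration; the source does not say how a singleton cluster is treated.

```python
from itertools import combinations


def cluster_size(members, sim):
    """Cluster 'size' as the minimum similarity over all pairs of members.

    A lower value means the cluster spans less similar objects, which is the
    similarity-based counterpart of a larger maximum pairwise distance.
    """
    pairs = list(combinations(members, 2))
    if not pairs:
        # Assumption: a singleton cluster is treated as maximally compact,
        # with similarities taken to lie in [0, 1].
        return 1.0
    return min(sim[i][j] for i, j in pairs)


# Hypothetical symmetric similarity matrix for three objects.
sim = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
```

With this matrix, the cluster {0, 1} has size 0.9, while adding the dissimilar object 2 drops the value to 0.2, signalling a cluster that is "too large" in the sense discussed here.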
In the process of structuring the database, if the cluster is too large, dissimilar embryo image data may be collected within the cluster. Therefore, a problem may arise that even a candidate image far from an image desired by the user should be examined. In this case, it may be meaningless to select a representative object in the cluster.
On the other hand, if the cluster size is too small, similar embryo images may be distributed and stored in multiple clusters. Accordingly, a problem may arise in that a plurality of candidate clusters are accessed and examined to search for an embryo image desired by a user.
Number of data contained within the cluster (220)
Even if the size of the cluster is small, if the number of data (or objects) contained in the cluster is too large, the user must look at too many candidate image data in order to retrieve the desired embryo image.
On the other hand, if the number of data contained in the cluster is too small, there is a high possibility that the embryo image desired by the user does not exist in the cluster. Therefore, a problem may arise in that a plurality of candidate clusters need to be accessed and looked for in order to retrieve a desired embryo image.
Number of clusters accessed and number of
A user may want to access as little image data as possible within a large embryo image database to retrieve the desired image.
Therefore, in order to support this need, the number of clusters or image data accessed before the user finally retrieves the desired embryo image must be minimized.
When a hierarchical structure is used as the database structuring method, the majority of candidate image sets that are far from the image desired by the user may be excluded from the search from the beginning. Thus, when the search is performed, the number of clusters or image data accessed can be reduced. However, there is still a need for ways to significantly reduce the number of clusters or image data accessed.
Hierarchy Balance (240)
If the hierarchical structure formed by the hierarchical clustering method is a skewed structure, there is a problem in that consistent search performance cannot be guaranteed. As the hierarchy becomes deeper, the user may have to access a plurality of candidate clusters to find a desired image by browsing from the top root node, thereby degrading search performance. On the other hand, if the hierarchical structure is too shallow, the size of a specific cluster may be too large or the number of image data included in a cluster may increase, which may force the user to look at a large number of candidate image data.
In consideration of this, an embodiment of the present invention discloses a hierarchical clustering scheme that considers the size of the cluster and the number of data at the same time, so that the user can find images similar to the desired image by accessing a minimal amount of image data without visual burden.
FIG. 3 illustrates a hierarchical structure generated by a hierarchical clustering scheme according to an embodiment of the present invention.
The
For each cluster C_i, an entry stores information about the size of the cluster (C_i.size), the number of data in the cluster (C_i.num), and a representative object (C_i.rep), and also stores pointer information for the lower node.
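The entry contents described above (cluster size, number of data, representative object, and a pointer to the lower node) could be modeled as in the following sketch. The class names `ClusterEntry` and `Node` and the extra `members` field are hypothetical and chosen only for illustration; the source does not define a concrete data layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClusterEntry:
    size: float                      # cluster size (minimum pairwise similarity)
    num: int                         # number of data (objects) in the cluster
    rep: int                         # id of the representative object
    child: Optional["Node"] = None   # pointer to the lower node; None if terminal
    members: List[int] = field(default_factory=list)  # assumed: ids of stored objects


@dataclass
class Node:
    entries: List[ClusterEntry] = field(default_factory=list)


# A two-level example: one root entry pointing to a lower node with one entry.
leaf = ClusterEntry(size=0.8, num=2, rep=5, members=[5, 6])
root = Node(entries=[ClusterEntry(size=0.3, num=2, rep=5, child=Node(entries=[leaf]))])
```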
Clustering according to an embodiment of the present invention until the clusters ([C 0 , C 1 , ..., C k ]) in the
For example, the first
Through this partitioning process, a hierarchical structure up to the
Cluster C i in
Using this tree structure, the user can traverse the tree to find the desired embryo image even if the user does not know about the embryo image data to be searched for in detail. That is, the user can look at the representative objects of each cluster of nodes starting from the top node and going down to the bottom node.
As the user descends to a lower node, the user can view similar embryo image data in more detail. By repeating this process, the user can finally find a cluster including image data similar to the desired embryo image data.
If there is image data desired by the user in the searched cluster, the search is terminated. If there is no embryo image data desired by the user in the searched cluster, the user may move back to an upper node and search for desired image data.
FIG. 4 is a flowchart of a data organization method according to an embodiment of the present invention.
The data organization method according to this embodiment can be used to structure an image database. The image may be an embryo image.
The data organization method uses graph-based hierarchical clustering.
In
The object may be data of an image. In addition, the image may be an embryo image.
If the object is an image,
Image data may have vectors of a particular dimension. Each dimension of the vector may correspond to a specific region of the image, and a value of each dimension may mean a median value of RGB pixels of the specific region. In other words, a vector may be considered to concisely represent a corresponding image.
To represent image data (eg, embryo image data) as feature vectors, color histograms on the RGB space used in the color-based image retrieval method can be extracted and used.
Therefore, similarity measurement methods that are used with color histograms, such as Euclidean distance, histogram intersection, and cross-talk distance, can be used. The similarity measuring method may measure the similarity between all image pairs of one or more images.
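Two of the measures named above, Euclidean distance and histogram intersection, can be sketched as follows, together with a helper that evaluates a measure for every pair of feature vectors. The function names and the choice of histogram intersection as the default measure are assumptions for illustration; the source does not state which measure the embodiment uses.

```python
import math


def euclidean_distance(h1, h2):
    """Distance between two histogram vectors (smaller means more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def histogram_intersection(h1, h2):
    """Similarity between two normalized histograms; 1.0 for identical ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def pairwise_similarity(vectors, measure=histogram_intersection):
    """Evaluate `measure` for every pair; returns a symmetric n x n list.

    The diagonal is left at 0.0 because self-pairs are not used here.
    """
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = measure(vectors[i], vectors[j])
    return sim


sim = pairwise_similarity([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
```

Note that Euclidean distance is a distance (lower is more similar) while histogram intersection is a similarity (higher is more similar), so the two should not be mixed in one matrix without converting one into the other.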
However, when the color histogram is used as the feature vector, the image data is represented as high-dimensional data having d attributes. When clustering is performed on such high-dimensional data, the curse of dimensionality may arise, and clustering performance may be poor.
Therefore, in the present embodiment, graph-based clustering is performed to solve the above problem.
In
Image data is represented by nodes in the similarity graph rather than points in d-dimensional space. The edges between the nodes represent the similarity between the image data.
In consideration of the amount of computation, a method of limiting the number of nodes to be connected by edges may be used. For example, the Fixed-Radius Method (FRM) connects by an edge only the nodes of images that lie within a given radius. Fixed-radius schemes are known to be efficient when a metric distance is used.
However, the similarity used in one embodiment of the present invention is a non-metric distance.
Thus, the similarity graph may be a k-nearest neighbor (k-NN) graph.
The k-NN graph is constructed by performing a similarity-based k-nearest neighbor query for each node and connecting the node by edges only to the k objects (or image data such as embryo images) retrieved.
As a configuration method of the similarity graph, a symmetric configuration method or an asymmetric configuration method may be used.
In the symmetric configuration method, image A and image B are connected by an edge only when image B is retrieved as a similar image with image A as the query and image A is retrieved as a similar image with image B as the query.
In the asymmetric configuration method, image A and image B are connected by an edge when image B is retrieved as a similar image with image A as the query, even if image A is not retrieved as a similar image with image B as the query.
In the similarity graph representing n images, the node set V of the similarity graph may correspond to the n images {1, 2, ..., n}. The edge (i, j) between node i and node j may have as its weight the similarity w_ij between the image corresponding to node i and the image corresponding to node j.
The similarity graph G may be represented and stored as a matrix W having the similarity w_ij as the ij-th entry (e.g., row i and column j).
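The construction of the k-NN similarity graph, in both its symmetric and asymmetric configurations, can be sketched as follows. The function name `knn_graph`, its parameters, and the assumption that the input similarity matrix is symmetric are all hypothetical choices for this illustration; a value of 0.0 in the returned matrix W is used here to mean "no edge".

```python
def knn_graph(sim, k, symmetric=True):
    """Build the weight matrix W of a k-NN graph from a similarity matrix.

    symmetric=True: connect i and j only if each retrieves the other.
    symmetric=False: connect i and j if either one retrieves the other.
    """
    n = len(sim)
    neighbors = []
    for i in range(n):
        # k-nearest-neighbor query: the k objects most similar to object i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        neighbors.append(set(others[:k]))
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            if symmetric and i not in neighbors[j]:
                continue  # j does not retrieve i, so no edge in symmetric mode
            w[i][j] = w[j][i] = sim[i][j]
    return w


# Hypothetical similarities: 0 and 1 are close, 2 is nearest to 0.
sim = [
    [0.0, 0.9, 0.5],
    [0.9, 0.0, 0.1],
    [0.5, 0.1, 0.0],
]
w_sym = knn_graph(sim, k=1, symmetric=True)
w_asym = knn_graph(sim, k=1, symmetric=False)
```

In this example, object 2 retrieves object 0 but not the other way around, so the edge (0, 2) appears only in the asymmetric graph.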
In
The clustering may be performed in consideration of one or more of the size of the cluster and the number of objects included in the cluster. The size of the cluster may mean a minimum value among the distances between all pairs of objects in the cluster.
One of the purposes of embodiments of the present invention is to quickly and easily find the object that the user ultimately wants through only a few object (eg embryo image) searches.
If there are too many images in the cluster, the number of images to be looked at by the user may be large. If the cluster becomes too large, it is more likely that dissimilar images will be gathered within the cluster. As a result, a user may need to look at unnecessary images.
The conventional hierarchical clustering method does not simultaneously consider the size of the cluster and the number of data in the cluster. Therefore, when the final clusters are obtained by performing conventional hierarchical clustering, there is a high possibility of generating a hierarchical structure in which the size and the number of data of each of the final clusters show a very large difference.
In addition, the conventional hierarchical clustering method merges the two clusters if the two clusters are closest to each other compared with other clusters, even if the similarity between the data existing in the two specific clusters is low. Therefore, the conventional hierarchical clustering method is likely to generate clusters of large size or very large number of data.
Accordingly, one embodiment of the present invention discloses a hierarchical clustering method that takes into account the above-mentioned considerations when structuring a database, namely (1) the size of the cluster, (2) the number of data contained within the cluster, (3) the number of clusters and the number of data accessed, and (4) the balance of the hierarchical structure.
As the graph segmentation algorithm, spectral clustering and METIS can be used.
For spectral clustering, see R. Kannan, S. Vempala, and A. Vetta, "On clusterings: good, bad, and spectral," In Proceedings of the 41st Annual Symposium on Foundations of Computer Science, 2000, and A. Y. Ng, M. Jordan, and Y. Weiss, "On spectral clustering: Analysis and an algorithm," In Proceedings of Neural Information Processing Systems, 2001.
For METIS, see G. Karypis and V. Kumar, "METIS 4.0: Unstructured graph partitioning and sparse matrix ordering system," Technical report, Department of Computer Science, University of Minnesota, 1998; http://www.cs.umn.edu/~metis, and G. Karypis and V. Kumar, "Multilevel algorithms for multi-constraint graph partitioning," Journal of Parallel and Distributed Computing, Vol. 48, No. 1, pp. 96-129, 1998.
In addition, as a graph segmentation algorithm, an algorithm (e.g., hMetis) that bi-partitions a k-NN graph so as to minimize the edge cut may be used.
For hMetis, see G. Karypis and V. Kumar, "hMETIS 1.5: A Hypergraph Partitioning Package," Technical report, Department of Computer Science, University of Minnesota, 1998.
Since each edge of the constructed similarity graph represents the similarity between embryo image data, hMetis is used to minimize the relationship between image data placed in different clusters.
In
By repeating the graph splitting algorithm, n clusters can be generated. n is the number of clusters predetermined by the user. The number of clusters to be created by
To generate n clusters,
That is, in order to consider the size of the cluster in forming the hierarchical structure, the target cluster to be divided by the graph division algorithm may be selected based on the size of the cluster.
As described above, minimum similarity may be used as a criterion for sizing clusters. That is, the cluster with the smallest similarity value may be selected as the target cluster to be divided into two by the graph segmentation algorithm of
If minimum similarity is used as the size of the cluster, the specific cluster can be prevented from becoming too large.
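The selection rule described above, in which the cluster with the smallest minimum similarity is chosen and split in two until n clusters exist, can be sketched as follows. The function names are hypothetical, and `toy_bisect` is only a simple stand-in for a minimum-edge-cut bi-partitioner such as hMetis; it is not that algorithm and is used here solely so the example runs on its own.

```python
from itertools import combinations


def min_similarity(members, sim):
    """Cluster size: minimum pairwise similarity (1.0 assumed for singletons)."""
    return min((sim[i][j] for i, j in combinations(members, 2)), default=1.0)


def toy_bisect(members, sim):
    """Stand-in bi-partitioner (NOT hMetis): seed with the least similar pair,
    then attach every other member to the seed it is more similar to."""
    a, b = min(combinations(members, 2), key=lambda p: sim[p[0]][p[1]])
    left, right = [a], [b]
    for m in members:
        if m in (a, b):
            continue
        (left if sim[m][a] >= sim[m][b] else right).append(m)
    return left, right


def split_into(members, sim, n, bisect=toy_bisect):
    """Repeatedly bisect the cluster with the smallest minimum similarity
    until n clusters exist or no cluster can be split further."""
    clusters = [list(members)]
    while len(clusters) < n:
        candidates = [c for c in clusters if len(c) > 1]
        if not candidates:
            break
        target = min(candidates, key=lambda c: min_similarity(c, sim))
        clusters.remove(target)
        clusters.extend(bisect(target, sim))
    return clusters


# Hypothetical similarities: {0, 1} and {2, 3} are tight groups, far apart.
sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
two = split_into([0, 1, 2, 3], sim, 2)
three = split_into([0, 1, 2, 3], sim, 3)
```

Passing the partitioner as the `bisect` argument keeps the selection rule separate from the partitioning algorithm, so a real graph partitioner could be substituted without changing the loop.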
In
The second predetermined number and the first predetermined number may be the same value.
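The threshold-based subdivision used to form the hierarchy, in which clusters holding more objects than a threshold are split again, can be sketched recursively as follows. The function names, the nested-dictionary tree layout, and the placeholder `chunk_split` partitioner are assumptions for illustration only; in the described method the split would be performed by the graph segmentation algorithm, and the same number n is used at every level here because the first and second predetermined numbers may be equal.

```python
def build_hierarchy(members, split, n, threshold):
    """Return a nested dict tree. Each cluster produced by `split` that holds
    more than `threshold` objects is subdivided again into n clusters."""
    children = []
    for cluster in split(members, n):
        if len(cluster) > threshold and len(cluster) > 1:
            children.append(build_hierarchy(cluster, split, n, threshold))
        else:
            children.append({"members": list(cluster), "children": []})
    return {"members": list(members), "children": children}


def chunk_split(members, n):
    """Hypothetical placeholder partitioner: cuts the list into at most n
    nearly equal runs. A real system would partition the similarity graph."""
    size = -(-len(members) // n)  # ceiling division
    return [members[i:i + size] for i in range(0, len(members), size)]


def leaf_clusters(tree):
    """Collect the member lists stored at the terminal nodes."""
    if not tree["children"]:
        return [tree["members"]]
    return [m for child in tree["children"] for m in leaf_clusters(child)]


tree = build_hierarchy(list(range(8)), chunk_split, n=2, threshold=2)
```

Because every cluster above the threshold is split further, no terminal cluster exceeds the threshold, which bounds the number of candidate objects a user must inspect at the bottom of the tree.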
In order to take into account the number of data in the cluster, among the first predetermined number of clusters generated by
By
In
By using the hierarchical structure, a number of objects that are far from the user's desired object (e.g., an embryo image) can be excluded from the beginning of the search process. Thus, the number of objects that a user needs to access for image retrieval can be significantly reduced.
Representative object refers to an object that can represent a cluster.
In an embodiment of the present invention, representative objects for the root node and non-terminal node of the hierarchical structure may be selected, and representative objects selected for the root node and the non-terminal node of the hierarchical structure may be stored. In addition, an actual object (eg, embryo image data) may be stored in the terminal node.
In order to solve the high dimensional problem, an embodiment of the present invention may model an object (eg, embryo image data) in a graph structure.
In the simplest way of finding the representative object in the graph structure, the object corresponding to the node with the highest degree in the graph can be selected as the representative object. However, this selection has the problem that the similarity values between objects, represented by the edges connecting the nodes, are ignored.
To solve this problem, the image most similar to all images in the cluster may be selected as the representative object. That is, the representative object may be the object having the highest average similarity among the objects in the cluster corresponding to the root node or the non-terminal node.
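Selecting the representative object by highest average similarity can be sketched as follows. The function name `representative`, the similarity matrix layout, and the handling of a single-object cluster are assumptions made for illustration.

```python
def representative(members, sim):
    """Return the member whose average similarity to the other members
    of the cluster is highest."""
    def average_similarity(i):
        others = [j for j in members if j != i]
        if not others:
            return 0.0  # assumption: a lone object trivially represents itself
        return sum(sim[i][j] for j in others) / len(others)

    return max(members, key=average_similarity)


# Hypothetical similarities: object 0 is close to both 1 and 2.
sim = [
    [1.0, 0.9, 0.8],
    [0.9, 1.0, 0.1],
    [0.8, 0.1, 1.0],
]
rep = representative([0, 1, 2], sim)
```

Here the average similarities are 0.85, 0.5, and 0.45 for objects 0, 1, and 2, so object 0 is chosen. Unlike a degree-based choice, this rule uses the edge weights themselves.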
In
A user searching the hierarchy may proceed by selecting, at an upper node, the representative object most similar to the desired object (e.g., embryo image data) and gradually descending to lower nodes. When the user finally reaches a terminal node, the user can directly check all the objects in the cluster in which objects similar to the desired object are gathered.
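The top-down browsing just described can be simulated with a short sketch. The nested-dictionary tree with `rep`, `members`, and `children` keys and the `similarity_to_query` callback, which stands in for the user's visual judgment of each representative object, are hypothetical constructs for illustration.

```python
def browse(node, similarity_to_query):
    """Descend from the root, at each level following the child whose
    representative object best matches what the user wants.

    Returns the members of the terminal cluster reached and the list of
    representative objects chosen along the way."""
    path = []
    while node["children"]:
        node = max(node["children"], key=lambda c: similarity_to_query(c["rep"]))
        path.append(node["rep"])
    return node["members"], path


# Hypothetical two-level hierarchy over objects 0..3.
tree = {
    "rep": 0,
    "members": [0, 1, 2, 3],
    "children": [
        {"rep": 0, "members": [0, 1], "children": []},
        {"rep": 2, "members": [2, 3], "children": []},
    ],
}

# The simulated user judges representative 2 to be closest to the target.
found, path = browse(tree, lambda rep: 1.0 if rep == 2 else 0.0)
```

Only one representative per cluster is examined at each level, so the number of objects inspected grows with the depth of the tree rather than with the size of the database.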
If the object is embryo image data, each embryo image may have tag information.
Tag information may indicate a trait that is expressed as the embryo grows. That is, images of embryos with identical (or similar) expression traits can be considered similar because they have similar vector values. Thus, the accuracy of the hierarchical structure formed by using the tag information of the embryo images can be measured.
FIG. 5 illustrates, as an algorithm, a data organization method according to an embodiment of the present invention.
Seventh to
The tenth to
FIG. 6 illustrates representative objects of clusters belonging to a non-terminal node viewed through a visualization tool according to an embodiment of the present invention.
The user may check the
N = 7 in FIG.
Each of the
The
The cluster including the enlarged representative object may include sub-clusters. The sub-clusters are a predefined number of clusters created by applying the graph segmentation algorithm to the cluster containing the enlarged representative object.
The user may perform an operation (e.g., pressing the DOWN button 624) to descend to the lower node and identify the lower node composed of the sub-clusters.
The user may perform an operation (e.g., pressing the UP button 622) to go up to the upper node and identify the upper node containing the cluster that includes the enlarged representative object.
FIG. 7 illustrates one of the clusters existing in a terminal node, viewed through the visualization tool according to an embodiment of the present invention.
7, the user can visually confirm that the
The user may select one of the
FIG. 8 is a structural diagram of a data organization apparatus according to an embodiment of the present invention.
The
The
That is, the
The object may be data of an image. The image may be an embryo image.
The similarity measurer may extract an RGB vector from an image, and measure the similarity between all image pairs of one or more images using the extracted RGB vector.
The
That is, the
The similarity graph may be a k-nearest neighbor graph composed of k neighbors.
The hierarchical
That is, the hierarchical
The graph segmentation algorithm can divide the similarity graph so as to minimize the edge cut.
The hierarchical
For the above division, the hierarchical
The
That is, the
The
That is, the
Technical contents according to an embodiment of the present invention described above with reference to FIGS. 1 to 7 may be applied to the present embodiment as it is. Therefore, more detailed description will be omitted below.
Through hierarchical clustering according to an embodiment of the present invention, a large amount of embryo image data may be structured in a form close to a balanced tree without being skewed. This structuring allows the user to efficiently search for the desired embryo image.
A visualization tool may be provided to help the user easily see the tree structure as a result of the structuring according to one embodiment of the present invention. Visualization tools help users quickly and easily find the objects they want while visually identifying the representative objects in the clusters.
The result of structuring data through hierarchical clustering according to an embodiment of the present invention can support a much more efficient search than the result of structuring data using a conventional hierarchical clustering method. The result of clustering according to an embodiment of the present invention is also more accurate than the result of conventional hierarchical clustering.
The method according to an embodiment of the present invention may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the present invention, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code generated by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.
As described above, the present invention has been described with reference to limited embodiments and drawings, but the present invention is not limited to the above embodiments, and those skilled in the art to which the present invention pertains can make various modifications and variations from these descriptions.
Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined by the equivalents of the claims, as well as the claims.
800: data organization device
810: similarity measuring unit
820: similarity graph generator
830: hierarchical structure forming unit
840: representative object selection
850: visualization
Claims (18)
Generating a similarity graph based on the measured similarities; and
Forming a hierarchical structure by performing clustering to repeatedly apply a graph segmentation algorithm to the similarity graph;
The clustering is performed in consideration of at least one of the size of the cluster and the number of objects included in the cluster,
The size of the cluster means a minimum value among the distances between all pairs of objects in the cluster,
The graph segmentation algorithm bi-partitions the similarity graph so as to minimize the edge cut.
And the object is data of an image.
Wherein said image is an embryo image.
Measuring similarities between all pairs of objects in the one or more objects,
Extracting an RGB vector from the image; and
Measuring similarity between all image pairs of at least one image using the RGB vector
Comprising a data organization method.
Extracting the RGB vector from the image,
Extracting a color histogram on the RGB space of the image; and
Extracting an RGB vector of the image based on the color histogram
Comprising a data organization method.
The similarity graph is a k-nearest neighbor (k-NN) graph.
The k-NN graph is constructed by performing a similarity-based k-nearest neighbor query for each node and connecting that node only to the k retrieved objects.
The node set V of the k-NN graph corresponds to the n images, and the edge (i, j) between node i and node j has as its weight the similarity w_ij between the image corresponding to node i and the image corresponding to node j, data organization method.
Forming a hierarchical structure by performing clustering to repeatedly apply a graph segmentation algorithm to the similarity graph,
Dividing the similarity graph into a first predetermined number of clusters by the graph partitioning algorithm; and
Subdividing the clusters whose number of objects is greater than a threshold among the first predetermined number of clusters into a second predetermined number of clusters by using the graph partitioning algorithm;
Comprising a data organization method.
Configuring a first predetermined number of clusters by using a graph partitioning algorithm for the similarity graph,
Generating clusters by dividing the similarity graph; and
Generating clusters by dividing the cluster with the smallest similarity value among the clusters generated by the dividing;
Comprising a data organization method.
Selecting a representative object for each of the root node and the non-terminal nodes of the hierarchy;
Further comprising, data organization method.
And the representative object is an object having a largest average similarity among objects in a cluster corresponding to the root node or the non-terminal node.
Providing a top-down search of the hierarchy by displaying representative objects of the nodes in the hierarchy
Further comprising, data organization method.
Generating a similarity graph based on the measured similarities; and
Forming a hierarchical structure by performing clustering to repeatedly apply a graph segmentation algorithm to the similarity graph
The clustering is performed in consideration of at least one of the size of the cluster and the number of objects included in the cluster,
The size of the cluster means a minimum value among the distances between all pairs of objects in the cluster,
The graph segmentation algorithm bi-partitions the similarity graph so as to minimize the edge cut.
Selecting a representative object for each of the root node and the non-terminal nodes of the hierarchy; and
Providing a top-down search of the hierarchy by displaying representative objects of the nodes in the hierarchy
Wherein the representative object is an embryo image having the highest average similarity among embryo images in a cluster corresponding to the root node or the non-terminal node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020110026445A KR101247401B1 (en) | 2011-03-24 | 2011-03-24 | Method and apparatus for hierarchical organization of embro data for supporting efficient search |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20120108504A KR20120108504A (en) | 2012-10-05 |
KR101247401B1 true KR101247401B1 (en) | 2013-03-25 |
Family
ID=47280103
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020110026445A KR101247401B1 (en) | 2011-03-24 | 2011-03-24 | Method and apparatus for hierarchical organization of embro data for supporting efficient search |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR101247401B1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101587348B1 (en) * | 2013-12-27 | 2016-02-03 | 경희대학교 산학협력단 | Method for searching cycle graph in big graph database |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100341396B1 (en) | 1999-12-27 | 2002-06-22 | 오길록 | 3-D clustering representation system and method using hierarchical terms |
JP2005516310A (en) * | 2002-02-01 | 2005-06-02 | ロゼッタ インファーマティクス エルエルシー | Computer system and method for identifying genes and revealing pathways associated with traits |
KR20080084504A (en) * | 2007-03-16 | 2008-09-19 | 제주대학교 산학협력단 | Method for clustering similar trajectories of moving objects in road network databases |
Non-Patent Citations (1)
Title |
---|
오현교 외2인. 이미지 데이터베이스에서 매개변수를 필요로 하지 않는 클러스터링 및 아웃라이어 검출 방법. 전자공학회 논문지. 대한전자공학회. 2010.01. 제47권 CI편 제1호. * |
Also Published As
Publication number | Publication date |
---|---|
KR20120108504A (en) | 2012-10-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
USRE47340E1 (en) | | Image retrieval apparatus |
Rai et al. | | A survey of clustering techniques |
Bijuraj | | Clustering and its Applications |
JP5121917B2 (en) | | Image search apparatus, image search method and program |
JP6183376B2 (en) | | Index generation apparatus and method, search apparatus, and search method |
US8645380B2 (en) | | Optimized KD-tree for scalable search |
US20120011124A1 (en) | | Unsupervised document clustering using latent semantic density analysis |
US8243988B1 (en) | | Clustering images using an image region graph |
WO2013129580A1 (en) | | Approximate nearest neighbor search device, approximate nearest neighbor search method, and program |
US7512282B2 (en) | | Methods and apparatus for incremental approximate nearest neighbor searching |
WO2001031503A1 (en) | | Multimedia information sorting/arranging device and sorting/arranging method |
JP4937395B2 (en) | | Feature vector generation apparatus, feature vector generation method and program |
JP5160312B2 (en) | | Document classification device |
US8429163B1 (en) | | Content similarity pyramid |
JP5014479B2 (en) | | Image search apparatus, image search method and program |
JP4926266B2 (en) | | Learning data creation device, learning data creation method and program |
US10133811B2 (en) | | Non-transitory computer-readable recording medium, data arrangement method, and data arrangement apparatus |
JP2008210024A (en) | | Apparatus for analyzing set of documents, method for analyzing set of documents, program implementing this method, and recording medium storing this program |
US20120054140A1 (en) | | Information processing apparatus, information processing method and storage medium |
JP6680956B1 (en) | | Search needs evaluation device, search needs evaluation system, and search needs evaluation method |
Ujas et al. | | A guide on analyzing flow cytometry data using clustering methods and nonlinear dimensionality reduction (tSNE or UMAP) |
KR101247401B1 (en) | | Method and apparatus for hierarchical organization of embro data for supporting efficient search |
JP6145064B2 (en) | | Document set analysis device, document set analysis method, document set analysis program |
Mosbah et al. | | Majority voting re-ranking algorithm for content based-image retrieval |
JP4964798B2 (en) | | Image dictionary generating device, image dictionary generating method, image dictionary generating program and recording medium thereof |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | A201 | Request for examination | |
| | E902 | Notification of reason for refusal | |
| | E701 | Decision to grant or registration of patent right | |
| | GRNT | Written decision to grant | |
| | FPAY | Annual fee payment | Payment date: 20151214; year of fee payment: 4 |
| | FPAY | Annual fee payment | Payment date: 20161227; year of fee payment: 5 |
| | LAPS | Lapse due to unpaid annual fee | |
LAPS | Lapse due to unpaid annual fee |