WO2020195545A1 - Information management device and information management method - Google Patents

Information management device and information management method

Info

Publication number
WO2020195545A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
item
nodes
information management
items
Application number
PCT/JP2020/008353
Other languages
English (en)
Japanese (ja)
Inventor
真理奈 藤田
宏視 荒
Original Assignee
株式会社日立製作所 (Hitachi, Ltd.)
Application filed by 株式会社日立製作所 (Hitachi, Ltd.)
Publication of WO2020195545A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
        • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
            • G06F16/28: Databases characterised by their database models, e.g. relational or object models
        • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
            • G06F16/35: Clustering; Classification
            • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F40/00: Handling natural language data
        • G06F40/30: Semantic analysis
            • G06F40/35: Discourse or dialogue representation

Definitions

  • The present invention relates to an information management device and an information management method capable of managing information hierarchically.
  • A technique has been disclosed in which a route pattern extraction unit identifies, in an information classification hierarchy, a route that includes a category containing the concept selected by a comparison concept selection unit. For the concept of each category on the route, information is set on how it relates to the concept of the higher-level category. The concept of each category is then abstracted, except for the user-specified concept entered in an input reception unit, to generate a route pattern. A category generation unit generates candidate categories by replacing the concepts of the categories in the route pattern so that the set information is satisfied. A control unit adds the candidate categories to the information classification hierarchy, and an output unit outputs the information classification hierarchy.
  • However, conventional information classification hierarchies were built by considering only variations in notation across documents, without considering how items are used. As a result, the content described for the same item may differ from document to document, and obtaining the necessary information may take time and effort.
  • The present invention has been made in view of these circumstances. Its object is to provide an information management device and an information management method that can manage information hierarchically in a way that reflects how the information is used.
  • The information management device includes an extraction unit and a classification unit. The extraction unit extracts a predetermined node from a hierarchical structure of nodes assigned to conceptualized information. The classification unit classifies that predetermined node based on information about the lower nodes associated with it.
  • This makes it possible to manage information hierarchically in a way that reflects how it is used.
  • FIG. 7(b) is a diagram showing an integration example based on semantic analysis of the concepts of the nodes in the hierarchical structure of FIG. 6(b). FIG. 7(c) is a diagram showing an integration example based on semantic analysis of the concepts of the nodes in the hierarchical structure built from the items of the document of FIG. 4.
  • FIG. 8(a) is a diagram showing an example of extracting a predetermined node whose lower nodes are to be integrated or divided, for the hierarchical structure of FIG. 6(b). FIG. 8(b) is a diagram showing a similar extraction example for the hierarchical structure based on the items of the document of FIG.
  • FIG. 9(a) is a diagram showing an integration example based on abstraction degree analysis of the concepts of the nodes in the hierarchical structure of FIG. 7(a). FIG. 9(b) is a diagram showing a corresponding integration example for the hierarchical structure of FIG. 8(a).
  • FIG. 9(c) is a diagram showing an integration example based on abstraction degree analysis of the concepts of the nodes in the hierarchical structure of FIG. 7(c). FIG. 9(d) is a diagram showing a corresponding integration example for the hierarchical structure of FIG. 8(b).
  • FIG. 10 is a diagram showing an example of the correspondence between items and information contents extracted from the document of FIG.
  • FIG. 11(a) is a diagram showing an example classification of the patterns in which lower nodes are associated with the habitat item of each document. FIGS. 11(b), 11(c), and 11(d) are diagrams showing examples of mathematical models of patterns P1, P2, and P3 of FIG. 11(a), respectively.
  • FIG. 14 is a flowchart showing pattern classification and mathematical modeling processing based on substructures according to the embodiment.
  • FIG. 15 is a flowchart showing a specific example of the process of S18 in FIG.
  • FIG. 16 is a flowchart showing an example of node division processing based on the abstraction degree reset according to the embodiment.
  • FIG. 17 is a flowchart showing another example of the node division process based on the abstraction degree reset according to the embodiment.
  • FIG. 18 is a flowchart showing still another example of the node division process based on the abstraction degree reset according to the embodiment.
  • FIG. 19 is a block diagram showing a hardware configuration example of the information management device of FIG.
  • The information management device classifies information based on how the notation in documents is used. To do this, it extracts a predetermined node from the hierarchical structure of nodes assigned to conceptualized information. It then classifies that predetermined node based on information about the lower nodes associated with it.
  • Nodes are assigned to, for example, document items.
  • A node may also be assigned a document heading or a document title.
  • A node may also be assigned an item name, such as a field name in a form.
  • In the following description, a "... unit" is sometimes described as the subject of an operation. This means that a processor reads the program for that unit, loads it into DRAM (Dynamic Random Access Memory), and thereby realizes the function of that unit.
  • FIG. 1 is a block diagram showing a configuration example of the information management device according to the embodiment.
  • The information management device includes an item extraction unit 1, a node candidate generation unit 2, a node extraction unit 3, a node integration unit 4, a classification unit 5, a modeling unit 6, a node division unit 7, a thesaurus dictionary 8, and a conceptual model 9.
  • The item extraction unit 1 extracts items from documents D1 to D4 ... and generates a hierarchical structure of nodes to which the items are assigned. It uses the wording of documents D1 to D4 ... as-is for the item names attached to the nodes. Item names given to nodes may therefore vary in notation even when they express the same concept.
  • The node candidate generation unit 2 uses morphological analysis and synonym analysis to unify the names of items that express the same concept in documents D1 to D4 .... For this purpose, it can refer to the thesaurus dictionary 8. The node candidate generation unit 2 also modifies the hierarchical structure of the nodes based on the inclusion relationships between words extracted from documents D1 to D4 .... For example, suppose the concept of a lower node associated with a predetermined node is a modifier that is not included in the concept of the predetermined node. In that case, the node candidate generation unit 2 can aggregate that lower node into the predetermined node.
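The synonym-unification step can be sketched as a tree walk that replaces each variant item name with a canonical one. This sketch makes two simplifying assumptions. First, the thesaurus is reduced to a plain variant-to-canonical mapping. Second, the two entries are hypothetical, taken from the FIG. 6 examples, because the actual contents of thesaurus dictionary 8 are not disclosed. Morphological analysis, such as extracting "ecology" from a title, is not covered.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    item: str  # item name attached to the node
    children: list["Node"] = field(default_factory=list)


# Hypothetical thesaurus entries (variant name -> canonical name), taken from
# the FIG. 6 examples; the real contents of thesaurus dictionary 8 are not given.
THESAURUS = {
    "survival period": "lifespan",
    "feeding method": "foraging method",
}


def unify_names(node: Node, thesaurus: dict[str, str]) -> Node:
    """Return a copy of the tree in which every item name listed in the
    thesaurus is replaced by its canonical name."""
    return Node(
        item=thesaurus.get(node.item, node.item),
        children=[unify_names(child, thesaurus) for child in node.children],
    )


d1 = Node("ecology", [Node("habitat"), Node("foraging method"), Node("survival period")])
d2 = Node("ecology", [Node("habitat"), Node("feeding method"), Node("lifespan")])

print([c.item for c in unify_names(d1, THESAURUS).children])
print([c.item for c in unify_names(d2, THESAURUS).children])
# both print ['habitat', 'foraging method', 'lifespan']
```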
  • The node extraction unit 3 extracts predetermined nodes from the hierarchical structure of the nodes. For example, it can extract, as predetermined nodes, nodes whose lower-node hierarchy is at most one level deep. Restricting extraction to such nodes makes it easier to classify patterns based on the items of the lower nodes associated with each predetermined node.
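Extracting nodes with a one-level lower hierarchy amounts to finding every node whose children are all leaves. The sketch below interprets "one or less levels" as "exactly one level" and therefore skips leaf nodes, since a leaf has no lower nodes to classify by. It is applied to document D3 after "male" and "female" have been folded into "call".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    item: str
    children: list["Node"] = field(default_factory=list)


def extract_predetermined(node: Node) -> list[Node]:
    """Collect, in depth-first order, every node that has children and whose
    children are all leaves (a lower hierarchy exactly one level deep)."""
    found = []
    if node.children and all(not child.children for child in node.children):
        found.append(node)
    for child in node.children:
        found.extend(extract_predetermined(child))
    return found


# Document D3 after male/female have been aggregated into "call" (FIG. 7(c)).
panda = Node("ecology", [
    Node("habitat", [
        Node("country name"),
        Node("habitat", [Node("temperate zone"), Node("bamboo grove")]),
    ]),
    Node("morphology", [
        Node("size", [Node("total length"), Node("weight")]),
        Node("hair"),
        Node("call"),
    ]),
    Node("foraging method"),
    Node("breeding method"),
    Node("lifespan"),
])

print([n.item for n in extract_predetermined(panda)])  # ['habitat', 'size']
```

The extracted "habitat" is the lower-level habitat node (N332 in the text), not the top-level one, whose lower hierarchy is two levels deep.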
  • The node integration unit 4 aligns the degree of abstraction of the items of the lower nodes associated with a predetermined node. For this purpose, it can refer to the conceptual model 9. Some items at the same hierarchy level may be named with a superordinate concept in one document and a subordinate concept in another. This alignment allows their item names to be matched.
  • The classification unit 5 classifies each predetermined node based on the items of the lower nodes associated with it. For example, it can classify predetermined nodes based on the combination of concepts of their lower nodes. Suppose a concept of a lower node associated with a first node cannot occur as a concept of a lower node associated with a second node. In that case, the classification unit 5 can place the first node and the second node in different groups. This makes it possible to determine that the items of the two nodes are used differently, even when the first node and the second node have the same item name.
  • The modeling unit 6 estimates a model of how lower nodes are associated with a predetermined node, based on information about those lower nodes. For this purpose, it can generate a pattern of how lower nodes are associated with the predetermined nodes that the classification unit 5 has placed in the same group. This pattern can indicate the degree of cohesion or the degree of variation of the lower nodes associated with predetermined nodes assigned to items extracted from multiple documents D1 to D4 ....
  • The modeling unit 6 can also refer to the information content of the lower nodes associated with the predetermined node. For example, it can estimate the model based on elements of that information content.
  • An element of the information content of a lower node is, for example, a word contained in that information content.
  • The model may be built based on the amount of information in the lower nodes' information content, or based on the similarity between elements of that information content.
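The text does not name a similarity measure for the elements of the information content. As one possible choice, the sketch below compares the word sets of two lower nodes using Jaccard similarity. The regular-expression tokenizer is a naive stand-in that only handles English text; Japanese source text would need morphological analysis instead. The inputs are the habitat-area contents of documents D1 and D2.

```python
import re


def words(contents: list[str]) -> set[str]:
    """Split information contents into lowercase word elements (naive tokenizer)."""
    return {w for text in contents for w in re.findall(r"[a-z]+", text.lower())}


def jaccard(a: list[str], b: list[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| of the two word sets."""
    wa, wb = words(a), words(b)
    if not wa and not wb:
        return 1.0  # two empty contents are treated as identical
    return len(wa & wb) / len(wa | wb)


d1_area = ["Indo-Pacific", "near the equator"]
d2_area = ["Pacific Ocean", "Indian Ocean", "Atlantic Ocean"]
print(jaccard(d1_area, d2_area))  # 0.125 (only "pacific" is shared out of 8 words)
```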
  • The node division unit 7 takes the lower-node items of predetermined nodes that were classified into different groups and divides them into items specific to each group. It then outputs a hierarchical structure of nodes for each group. The division can be based on the model estimated by the modeling unit 6. As a result, items of the same concept extracted from documents D1 to D4 ... can be given different item names according to how they are used. This in turn enables searches that reflect those differences in usage.
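The text does not specify how group-specific items are named or what each group's hierarchy contains. The sketch below therefore makes two hypothetical choices. Each group's copy of the predetermined node is named by suffixing its group label. That copy's lower items are the union of the lower items of the group's documents. The grouping is taken from the FIG. 11(a) example (patterns P1 to P3).

```python
# Lower-node items under "habitat" per document (FIG. 11(a) example).
LOWER_ITEMS = {
    "anemone fish": {"water quality", "water depth", "temperature", "habitat area", "symbiosis"},
    "flying fish": {"water quality", "water depth", "temperature", "habitat area"},
    "dolphin": {"water quality", "water depth", "temperature"},
    "sweetfish": {"water quality", "habitat area", "flow velocity"},
    "medaka": {"water quality", "flow velocity"},
    "panda": {"climate", "vegetation"},
    "lion": {"climate", "vegetation"},
}
GROUPS = {
    "P1": ["anemone fish", "flying fish", "dolphin"],
    "P2": ["sweetfish", "medaka"],
    "P3": ["panda", "lion"],
}


def divide_node(parent: str, groups: dict[str, list[str]],
                lower_items: dict[str, set[str]]) -> dict[str, list[str]]:
    """Split one predetermined node into one group-specific node per group."""
    divided = {}
    for label, docs in groups.items():
        name = f"{parent} ({label})"  # hypothetical naming scheme
        divided[name] = sorted(set().union(*(lower_items[d] for d in docs)))
    return divided


for name, items in divide_node("habitat", GROUPS, LOWER_ITEMS).items():
    print(name, items)
```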
  • The thesaurus dictionary 8 is a dictionary that classifies words by similarity of meaning.
  • The conceptual model 9 is a model of the superordinate/subordinate relationships between concepts, in which upper layers can have a higher degree of abstraction than lower layers. An ontology, for example, can be used as the conceptual model 9.
  • FIG. 2 is a diagram showing an example of the document of FIG.
  • Document D1 is titled "Anemone fish ecology".
  • Document D1 includes the items habitat, breeding method, foraging method, gender, and survival period.
  • The habitat item includes the items water quality, water depth, temperature, symbiosis, and habitat area.
  • The water quality item includes the information content "seawater".
  • The water depth item includes the information content "20-40 m".
  • The temperature item includes the information content "24 degrees".
  • The symbiosis item includes the information content "sea anemone".
  • The habitat area item includes the information content "Indo-Pacific" and "near the equator".
  • FIG. 3 is a diagram showing another example of the document of FIG.
  • Document D2 is titled "Flying fish ecology".
  • Document D2 includes the items habitat, breeding method, feeding method, gender, and lifespan.
  • The habitat item includes the items water quality, water depth, temperature, and habitat area.
  • The water quality item includes the information content "seawater".
  • The water depth item includes the information content "1 m".
  • The habitat area item includes the information content "Pacific Ocean", "Indian Ocean", and "Atlantic Ocean".
  • FIG. 4 is a diagram showing still another example of the document of FIG.
  • Document D3 is titled "Panda ecology".
  • Document D3 includes the items morphology, habitat, breeding method, foraging method, and lifespan.
  • The morphology item includes the items size, hair, and call.
  • The size item includes the items total length and weight.
  • The call item includes the items male and female.
  • The habitat item includes the items country name and habitat.
  • The lower-level habitat item includes the items temperate zone and bamboo grove.
  • The weight item includes the information content "kg".
  • The male item includes the information content "meow meow".
  • The female item includes the information content "myan myan".
  • The country name item includes the information content "China".
  • FIG. 5 is a diagram showing still another example of the document of FIG.
  • Document D4 is titled "Lion ecology".
  • Document D4 includes the items morphology, habitat, breeding method, foraging method, social system, and lifespan.
  • The morphology item includes the items size, hair, and call.
  • The size item includes the items total length and weight.
  • The habitat item includes the items country name and habitat.
  • The lower-level habitat item includes the items subtropical zone and grassland.
  • The country name item includes the information content "Africa".
  • FIG. 6(a) is a diagram showing the hierarchical structure of nodes to which the items of the document of FIG. 2 are assigned. FIG. 6(b) is a diagram showing the hierarchical structure of nodes to which the items of the document of FIG. 3 are assigned.
  • The item extraction unit 1 extracts the title and items from document D1 and assigns node N111 to the title "Anemone fish ecology".
  • The item extraction unit 1 assigns nodes N121 to N125 to the items habitat, breeding method, foraging method, gender, and survival period, respectively.
  • The item extraction unit 1 assigns nodes N131 to N135 to the items water quality, water depth, temperature, symbiosis, and habitat area, respectively.
  • The item extraction unit 1 associates nodes N121 to N125 with node N111, and nodes N131 to N135 with node N121.
  • The item extraction unit 1 of FIG. 1 likewise extracts the title and items from document D2 and assigns node N211 to the title "Flying fish ecology".
  • The item extraction unit 1 assigns nodes N221 to N225 to the items habitat, breeding method, feeding method, gender, and lifespan, respectively.
  • The item extraction unit 1 assigns nodes N231 to N234 to the items water quality, water depth, temperature, and habitat area, respectively.
  • The item extraction unit 1 associates nodes N221 to N225 with node N211, and nodes N231 to N234 with node N221.
  • The foraging method item of node N123 in FIG. 6(a) and the feeding method item of node N223 in FIG. 6(b) express the same concept, but the item extraction unit 1 keeps the notation of documents D1 and D2 as-is. Likewise, the survival period item of node N125 in FIG. 6(a) and the lifespan item of node N225 in FIG. 6(b) express the same concept, but the item extraction unit 1 keeps the notation of documents D1 and D2 as-is.
  • FIG. 7(a) is a diagram showing an integration example based on semantic analysis of the concepts of the nodes in the hierarchical structure of FIG. 6(a). FIG. 7(b) is a diagram showing a corresponding integration example for the hierarchical structure of FIG. 6(b). FIG. 7(c) is a diagram showing an integration example based on semantic analysis of the concepts of the nodes in the hierarchical structure built from the items of the document of FIG. 4.
  • As shown in FIG. 7(a), the node candidate generation unit 2 of FIG. 1 uses synonym analysis to change the survival period item of node N125 to the item lifespan.
  • Using morphological analysis, the node candidate generation unit 2 extracts the item "ecology" from the title "Flying fish ecology" of node N211 and renames node N211 to "ecology". It also uses synonym analysis to change the feeding method item of node N223 to the item foraging method.
  • In this way, the node candidate generation unit 2 can unify the notation of items that express the same concept even when documents D1 and D2 write them differently.
  • The item extraction unit 1 of FIG. 1 extracts the title and items from document D3 and assigns node N311 to the title "Panda ecology".
  • The item extraction unit 1 assigns nodes N321 to N325 to the items habitat, morphology, foraging method, breeding method, and lifespan, respectively.
  • The item extraction unit 1 assigns nodes N331 to N335 to the items country name, habitat, size, hair, and call, respectively.
  • The item extraction unit 1 assigns nodes N341 to N346 to the items temperate zone, bamboo grove, total length, weight, male, and female, respectively.
  • The item extraction unit 1 associates nodes N321 to N325 with node N311, nodes N331 and N332 with node N321, nodes N333 to N335 with node N322, and nodes N341 and N342 with node N332.
  • It also associates nodes N343 and N344 with node N333, and nodes N345 and N346 with node N335.
  • The item extraction unit 1 can set a provisional item X1, representing a superordinate concept of temperate zone, for the temperate zone item of node N341. It can likewise set a provisional item X2, representing a superordinate concept of bamboo grove, for the bamboo grove item of node N342.
  • Using morphological analysis, the node candidate generation unit 2 extracts the item "ecology" from the title "Panda ecology" of node N311 and renames node N311 to "ecology". It also checks whether the concepts of the items of node N345 (male) and node N346 (female) are included in the concept of the call item of node N335. In addition, it checks whether the information contents of node N345 ("meow meow") and node N346 ("myan myan") are included in that concept.
  • The concepts of the items of nodes N345 and N346 are not included in the concept of the call item of node N335, but their information contents are.
  • The node candidate generation unit 2 therefore judges the items of nodes N345 and N346 to be mere modifiers and aggregates nodes N345 and N346 into node N335.
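The modifier check above can be sketched as follows. Leaf children are folded into their parent when two conditions hold: the children's item names fall outside the parent's concept, and their information contents fall inside it. Concept inclusion is a hypothetical dictionary lookup (`CONCEPT_TERMS`), standing in for whatever semantic resource unit 2 actually consults. Storing the folded children as a `modifiers` mapping is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    item: str
    content: Optional[str] = None  # information content, if any
    children: list["Node"] = field(default_factory=list)
    modifiers: dict[str, str] = field(default_factory=dict)  # filled on aggregation


# Hypothetical stand-in for the concept knowledge consulted by unit 2:
# the terms that fall under each concept.
CONCEPT_TERMS = {"call": {"meow meow", "myan myan"}}


def includes(concept: str, term: Optional[str]) -> bool:
    return term is not None and term in CONCEPT_TERMS.get(concept, set())


def aggregate_modifiers(node: Node) -> Node:
    """Fold leaf children into their parent when the children's item names are
    not part of the parent's concept but their information contents are."""
    for child in node.children:
        aggregate_modifiers(child)
    kids = node.children
    if kids and all(not k.children for k in kids):
        names_outside = all(not includes(node.item, k.item) for k in kids)
        contents_inside = all(includes(node.item, k.content) for k in kids)
        if names_outside and contents_inside:
            node.modifiers = {k.item: k.content for k in kids}
            node.children = []
    return node


call = Node("call", children=[Node("male", "meow meow"), Node("female", "myan myan")])
size = Node("size", children=[Node("total length"), Node("weight", "kg")])
aggregate_modifiers(Node("morphology", children=[size, Node("hair"), call]))

print(call.children, call.modifiers)  # [] {'male': 'meow meow', 'female': 'myan myan'}
print([k.item for k in size.children])  # ['total length', 'weight']
```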
  • FIG. 8(a) is a diagram showing an example of extracting a predetermined node whose lower nodes are to be integrated or divided, for the hierarchical structure of FIG. 6(b). FIG. 8(b) is a diagram showing a similar extraction example for the hierarchical structure based on the items of the document of FIG.
  • As shown in FIG. 8(a), once the node candidate generation unit 2 has generated the hierarchical structure of FIG. 7(b), the node extraction unit 3 of FIG. 1 extracts node N221. Node N221 is associated with the lower nodes N231 to N234, which form a one-level lower structure.
  • The item extraction unit 1 of FIG. 1 extracts the title and items from document D4 and assigns node N411 to the title "Lion ecology".
  • The item extraction unit 1 assigns nodes N421 to N426 to the items habitat, morphology, foraging method, breeding method, lifespan, and social system, respectively.
  • The item extraction unit 1 assigns nodes N431 to N435 to the items country name, habitat, size, hair, and call, respectively.
  • The item extraction unit 1 assigns nodes N441 to N444 to the items subtropical zone, grassland, total length, and weight, respectively.
  • The item extraction unit 1 associates nodes N421 to N426 with node N411, nodes N431 and N432 with node N421, nodes N433 to N435 with node N422, and nodes N441 and N442 with node N432.
  • It also associates nodes N443 and N444 with node N433.
  • The item extraction unit 1 can set a provisional item Y1, representing a superordinate concept of subtropical zone, for the subtropical zone item of node N441. It can likewise set a provisional item Y2, representing a superordinate concept of grassland, for the grassland item of node N442.
  • The node extraction unit 3 can extract node N432, which is associated with the lower nodes N441 and N442 forming a one-level lower structure.
  • The node extraction unit 3 may also extract node N433, which is associated with the lower nodes N443 and N444 forming a one-level lower structure.
  • FIG. 9(a) is a diagram showing an integration example based on abstraction degree analysis of the concepts of the nodes in the hierarchical structure of FIG. 7(a). FIG. 9(b) is a diagram showing a corresponding integration example for the hierarchical structure of FIG. 8(a).
  • FIG. 9(c) is a diagram showing an integration example based on abstraction degree analysis of the concepts of the nodes in the hierarchical structure of FIG. 7(c). FIG. 9(d) is a diagram showing an integration example based on abstraction degree analysis of the concepts of the nodes in the hierarchical structure shown in FIG. 8(b).
  • In FIG. 9(a), the node extraction unit 3 of FIG. 1 is assumed to extract node N121 from the hierarchical structure of FIG. 7(a). Node N121 is associated with the lower nodes N131 to N135, which form a one-level lower structure. Similarly, in FIG. 9(b), the node extraction unit 3 is assumed to extract node N221 from the hierarchical structure of FIG. 7(b). Node N221 is associated with the lower nodes N231 to N234, which form a one-level lower structure.
  • Using abstraction degree analysis, the node integration unit 4 integrates the item of the lower node N135 in FIG. 9(a) into the item habitat area.
  • As a result, the item name of the lower node N135 in FIG. 9(a) can be matched with the item name of the lower node N234 in FIG. 9(b), eliminating the variation in notation among the lower nodes.
  • Likewise, the node extraction unit 3 extracts node N332, which is associated with the lower nodes N341 and N342 forming a one-level lower structure, from the hierarchical structure of FIG. 7(c).
  • The node extraction unit 3 also extracts node N432, which is associated with the lower nodes N441 and N442 forming a one-level lower structure, from the hierarchical structure of FIG. 8(b).
  • Using abstraction degree analysis, the node integration unit 4 integrates the provisional item X1 of the lower node N341 into the item climate and the provisional item X2 of the lower node N342 into the item vegetation. It likewise integrates the provisional item Y1 of the lower node N441 into climate and the provisional item Y2 of the lower node N442 into vegetation.
  • As a result, the item names of the lower nodes N341 and N342 in FIG. 9(c) can be matched with those of the lower nodes N441 and N442 in FIG. 9(d), respectively, eliminating the variation in notation among the lower nodes.
  • FIG. 10 is a diagram showing an example of a correspondence relationship between items and information contents extracted from the document of FIG.
  • The subordinate concepts seawater, brackish water, and freshwater are associated with the superordinate concept water quality.
  • The subordinate concepts Indo-Pacific, sea area near the equator, Indian Ocean, Pacific Ocean, and East Asian rivers are associated with the superordinate concept habitat area.
  • By referring to the conceptual model 9, the node integration unit 4 can integrate the item names of lower-node concepts that differ in degree of abstraction.
  • In document D1 of FIG. 2, the information content "Indo-Pacific" and "near the equator" is described under its habitat-related item.
  • The item habitat area is associated with the information content Indo-Pacific and sea area near the equator. The node integration unit 4 can therefore refer to the conceptual model 9 to integrate the item of the lower node N135 in FIG. 9(a) into the item habitat area.
  • The item climate is associated with the information content temperate zone and subtropical zone, and the item vegetation is associated with the information content grassland and bamboo grove. The node integration unit 4 can therefore refer to the conceptual model 9 for two integrations. It integrates the provisional items X1 and Y1 of the lower nodes N341 and N441 in FIGS. 9(c) and 9(d) into the item climate. It integrates the provisional items X2 and Y2 of the lower nodes N342 and N442 into the item vegetation.
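The lookup described above can be sketched with the conceptual model held as a mapping from each superordinate concept to its subordinate concepts. The entries follow the FIG. 10 example as described in the text. This is a plain dictionary lookup rather than the patent's abstraction degree analysis, and it assumes first-match semantics when several concepts could apply. Its inputs are the provisional items X1/Y2 and the habitat-related item of node N135.

```python
# Superordinate concept -> subordinate concepts, following the FIG. 10 example
# described in the text (an ontology would normally play this role).
CONCEPT_MODEL = {
    "water quality": {"seawater", "brackish water", "freshwater"},
    "habitat area": {"Indo-Pacific", "sea area near the equator", "Indian Ocean",
                     "Pacific Ocean", "East Asian river"},
    "climate": {"temperate zone", "subtropical zone"},
    "vegetation": {"grassland", "bamboo grove"},
}


def resolve_item(name: str, terms: list[str]) -> str:
    """Return the first superordinate concept that covers any of the node's
    subordinate terms, or keep the original item name if none does."""
    for upper, lowers in CONCEPT_MODEL.items():
        if any(term in lowers for term in terms):
            return upper
    return name


print(resolve_item("X1", ["temperate zone"]))                           # climate
print(resolve_item("Y2", ["grassland"]))                                # vegetation
print(resolve_item("N135 item", ["Indo-Pacific", "near the equator"]))  # habitat area
print(resolve_item("hair", []))                                         # hair
```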
  • FIG. 11(a) is a diagram showing an example classification of the patterns in which lower nodes are associated with the habitat item of each document. FIGS. 11(b), 11(c), and 11(d) are diagrams showing examples of mathematical models of patterns P1, P2, and P3 of FIG. 11(a), respectively.
  • In FIG. 11(a), the node extraction unit 3 of FIG. 1 is assumed to have extracted the habitat item as a predetermined node.
  • In the anemone fish ecology document, the lower-node items associated with the habitat item are water quality, water depth, temperature, habitat area, and symbiosis. In the flying fish ecology document, they are water quality, water depth, temperature, and habitat area.
  • In the dolphin ecology document, the lower-node items associated with the habitat item are water quality, water depth, and temperature.
  • The classification unit 5 of FIG. 1 classifies the habitat item of each ecology document (anemone fish, flying fish, dolphin, sweetfish, medaka, panda, and lion) based on the items of the lower nodes associated with it.
  • As an index for this classification, the classification unit 5 can use, for example, the distance between vectors obtained by vectorizing the lower-node items of each document.
  • The classification unit 5 can generate vectors whose components are 1 or 0 depending on whether each lower-node item is present. For example, it generates (1,1,1,1,1,0,0,0) for anemone fish, (1,1,1,1,0,0,0,0) for flying fish, (1,1,1,0,0,0,0,0) for dolphin, (1,0,0,1,0,1,0,0) for sweetfish, (1,0,0,0,0,1,0,0) for medaka, and (0,0,0,0,0,0,1,1) for both panda and lion.
  • Among anemone fish, flying fish, and dolphin, the distance between the vectors is 1 or 2.
  • Between sweetfish and medaka, the distance between the vectors is 1.
  • Between panda and lion, the distance between the vectors is zero.
  • Anemone fish, flying fish, and dolphin are at a distance of 3 or more from sweetfish and medaka.
  • Anemone fish, flying fish, and dolphin are at a distance of 5 or more from panda and lion. Sweetfish and medaka are at a distance of 4 or more from panda and lion.
  • The classification unit 5 can therefore place habitat items in the same group when their lower-node vectors are less than 3 apart.
  • Habitat items whose lower-node vectors are 3 or more apart can be placed in different groups.
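The vector-based grouping can be sketched as follows. The component order (water quality, water depth, temperature, habitat area, symbiosis, flow velocity, climate, vegetation) is inferred because it reproduces the vectors listed above; the text does not state it. The text also does not name a distance metric. Hamming distance is assumed here because it reproduces every distance stated above. Applying the "less than 3" rule as single-linkage grouping is likewise one reasonable reading, not necessarily the patent's procedure.

```python
# Assumed component order, inferred from the vectors given in the text.
ITEMS = ["water quality", "water depth", "temperature", "habitat area",
         "symbiosis", "flow velocity", "climate", "vegetation"]

LOWER_ITEMS = {
    "anemone fish": {"water quality", "water depth", "temperature", "habitat area", "symbiosis"},
    "flying fish": {"water quality", "water depth", "temperature", "habitat area"},
    "dolphin": {"water quality", "water depth", "temperature"},
    "sweetfish": {"water quality", "habitat area", "flow velocity"},
    "medaka": {"water quality", "flow velocity"},
    "panda": {"climate", "vegetation"},
    "lion": {"climate", "vegetation"},
}


def to_vector(items: set[str]) -> tuple[int, ...]:
    """1 if the lower-node item is present, else 0."""
    return tuple(1 if name in items else 0 for name in ITEMS)


def hamming(u: tuple[int, ...], v: tuple[int, ...]) -> int:
    return sum(a != b for a, b in zip(u, v))


def group_by_distance(docs: list[str], threshold: int = 3) -> list[list[str]]:
    """Single-linkage grouping: a document joins (and merges) every existing
    group that has a member closer than `threshold`."""
    vectors = {d: to_vector(LOWER_ITEMS[d]) for d in docs}
    groups: list[list[str]] = []
    for doc in docs:
        linked = [g for g in groups
                  if any(hamming(vectors[doc], vectors[m]) < threshold for m in g)]
        merged = [doc]
        for g in linked:
            groups.remove(g)
            merged = g + merged
        groups.append(merged)
    return groups


print(group_by_distance(list(LOWER_ITEMS)))
# [['anemone fish', 'flying fish', 'dolphin'], ['sweetfish', 'medaka'], ['panda', 'lion']]
```

The three resulting groups correspond to the habitat patterns P1, P2, and P3.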
  • Alternatively, the classification unit 5 may place a first node and a second node in different groups when a concept of a lower node associated with the first node cannot occur as a concept of a lower node associated with the second node.
  • For example, the items climate and vegetation in the panda and lion ecology documents cannot occur as items in the anemone fish, flying fish, dolphin, sweetfish, or medaka ecology documents. The habitat items for panda and lion can therefore be placed in a different group from the habitat items for anemone fish, flying fish, dolphin, sweetfish, and medaka.
  • Similarly, the item flow velocity in the sweetfish and medaka ecology documents cannot occur as an item in the anemone fish, flying fish, or dolphin ecology documents. The habitat items for sweetfish and medaka can therefore be placed in a different group from those for anemone fish, flying fish, and dolphin.
  • the modeling unit 6 generates a habitat pattern P1 representing how lower nodes are tied to the habitat item for clownfish, flying fish and dolphins, a habitat pattern P2 representing how lower nodes are tied to the habitat item for sweetfish and medaka, and a habitat pattern P3 representing how lower nodes are tied to the habitat item for pandas and lions.
  • the modeling unit 6 can estimate a mathematical model for each of the habitat patterns P1 to P3 based on the information of the lower nodes associated with that pattern.
  • as the mathematical model of each habitat pattern P1 to P3, for example, the existence probability of each lower item, the degree of cohesion of the lower nodes of the pattern, or a distribution model of the information associated with each lower item can be used.
  • as the information associated with a lower item, the items or information contents further subordinate to that lower item can be used.
  • the degree of cohesion of the lower nodes can be calculated from the variance of the existence probabilities of the lower items in each of the habitat patterns P1 to P3.
  • alternatively, the degree of cohesion of the lower nodes may be obtained from the average distance between the vectors belonging to each habitat pattern P1 to P3 and the representative vector of that pattern.
  • in the habitat pattern P1, the existence probabilities of the items water quality, water depth, temperature, habitat area and symbiosis are 1.0, 1.0, 1.0, 0.67 and 0.33, respectively.
  • the degree of cohesion for the habitat pattern P1 is 0.45.
  • the existence probabilities of the items of water quality, habitat area and flow velocity in the habitat pattern P2 are 1.0, 0.5 and 1.0, respectively.
  • the degree of cohesion for the habitat pattern P2 is 0.7.
  • a distribution model of (East Asian river) (1.0) can be generated.
  • the existence probabilities of the items of climate and vegetation in the habitat pattern P3 are 1.0 and 1.0, respectively.
  • the degree of cohesion for the habitat pattern P3 is 1.0.
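The existence probabilities quoted for P1 to P3 can be reproduced by counting, for each pattern, the fraction of member nodes that carry each lower item. The sketch below assumes that reading and reuses a hypothetical encoding of the running example; it covers the existence probabilities only, since the text does not give the cohesion formula in enough detail to reproduce the values 0.45 and 0.7.

```python
from collections import Counter

# Hypothetical encoding: lower-node items under each animal's habitat node,
# grouped into the habitat patterns P1 to P3.
PATTERNS = {
    "P1": {
        "clownfish":   {"water quality", "water depth", "temperature", "habitat area", "symbiosis"},
        "flying fish": {"water quality", "water depth", "temperature", "habitat area"},
        "dolphin":     {"water quality", "water depth", "temperature"},
    },
    "P2": {
        "sweetfish": {"water quality", "habitat area", "flow velocity"},
        "medaka":    {"water quality", "flow velocity"},
    },
    "P3": {
        "panda": {"climate", "vegetation"},
        "lion":  {"climate", "vegetation"},
    },
}


def existence_probabilities(members):
    """Fraction of the pattern's member nodes that have each lower item (2 decimals)."""
    counts = Counter(item for items in members.values() for item in items)
    n = len(members)
    return {item: round(c / n, 2) for item, c in counts.items()}


for name, members in PATTERNS.items():
    print(name, existence_probabilities(members))
```

For P1 this yields 1.0 for water quality, water depth and temperature, 0.67 for habitat area and 0.33 for symbiosis, matching the figures above.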
  • FIG. 12(a) is a diagram showing an example of dividing the lower nodes based on the pattern P1 of FIG. 11(b).
  • FIG. 12(b) is a diagram showing an example of dividing the lower nodes based on the pattern P2 of FIG. 11(c).
  • FIG. 12(c) is a diagram showing an example of dividing the lower nodes based on the pattern P3 of FIG. 11(d).
  • the node division unit 7 of FIG. 1 divides the items of the lower nodes associated with the habitat pattern P1 into specific items unique to the habitat pattern P1.
  • for example, when only information content representing sea areas, such as the Pacific Ocean and the Indian Ocean, appears in the habitat area item of the habitat pattern P1, the node division unit 7 changes that item to a sea area item.
  • when the information content embodying the climate shows no bias, the node division unit 7 keeps the climate item of the habitat pattern P3 as it is. Likewise, when the information content embodying the vegetation shows no bias, the node division unit 7 keeps the vegetation item of the habitat pattern P3 as it is.
  • the habitat pattern P3 can be referred to.
  • FIG. 13(a) is a diagram showing another example of extracting a predetermined node whose lower nodes are to be integrated or divided, based on the hierarchical structure of FIG. 7(a), and FIG. 13(b) is a diagram showing another such extraction example based on the hierarchical structure of FIG. 8(b).
  • the node extraction unit 3 of FIG. 1 extracts a predetermined node from the hierarchical structure of nodes that reflects the processing result of the classification unit 5 of FIG. 1.
  • for example, the node extraction unit 3 replaces the lower nodes N131 to N135 of the habitat item of node N121 in FIG. 7(a) with the pattern PA, which is held as the information content of node N121.
  • as a result, the lower nodes N121 to N125 of node N111, to which the ecology item is assigned, form a single level. By extracting nodes whose lower-node hierarchy is one level or less, the node extraction unit 3 can therefore extract node N111, to which the ecology item is assigned, as a predetermined node.
  • thus, the ecology item, which was not extracted from the hierarchical structure of FIG. 7(a), can also be subjected to pattern classification by the classification unit 5.
  • similarly, the node extraction unit 3 replaces the lower nodes N441 and N442 of the habitat item of node N432 in FIG. 8(b) with the pattern PB, which is held as the information content of node N432.
  • as a result, the lower nodes N431 and N432 of node N421, to which the habitat item is assigned, form a single level. By extracting nodes whose lower-node hierarchy is one level or less, the node extraction unit 3 can therefore extract node N421, to which the habitat item is assigned, as a predetermined node.
  • thus, the habitat item, which was not extracted from the hierarchical structure of FIG. 8(b), can also be subjected to pattern classification by the classification unit 5.
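The collapse-then-extract behaviour described for FIG. 13(a) can be sketched as a small tree walk. This assumes "lower-node hierarchy of one level" means the node has children and all of them are leaves (height 1); `Node`, `height`, `extract_predetermined` and `collapse_to_pattern` are hypothetical names, and the items of N122 to N125 and N131 to N135 are placeholders because the text does not name them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    node_id: str
    item: str
    children: List["Node"] = field(default_factory=list)
    content: Optional[str] = None  # information content, e.g. a pattern label


def height(node: Node) -> int:
    """Number of levels below the node (0 for a leaf)."""
    if not node.children:
        return 0
    return 1 + max(height(child) for child in node.children)


def walk(node: Node):
    """Pre-order traversal of the hierarchy."""
    yield node
    for child in node.children:
        yield from walk(child)


def extract_predetermined(root: Node) -> List[str]:
    """IDs of nodes whose lower-node hierarchy is exactly one level deep."""
    return [n.node_id for n in walk(root) if height(n) == 1]


def collapse_to_pattern(node: Node, pattern: str) -> None:
    """Replace the node's lower nodes with a pattern label held as its content."""
    node.content = pattern
    node.children = []


# Hypothetical reconstruction of FIG. 7(a); placeholder item names.
habitat = Node("N121", "habitat",
               [Node(f"N13{i}", f"habitat detail {i}") for i in range(1, 6)])
ecology = Node("N111", "ecology",
               [habitat] + [Node(f"N12{i}", f"ecology item {i}") for i in range(2, 6)])

print(extract_predetermined(ecology))  # ['N121']
collapse_to_pattern(habitat, "PA")
print(extract_predetermined(ecology))  # ['N111']
```

Before collapsing, only the habitat node N121 qualifies; once its lower nodes are replaced by the pattern PA, the ecology node N111 becomes extractable as well.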
  • FIG. 14 is a flowchart showing pattern classification and mathematical modeling processing based on the substructure according to the embodiment.
  • first, the node name N to be analyzed and the lower node name list M of the node name N are acquired (S11).
  • the lower node vector v_i is a vector that quantifies the correspondence between the group of lower nodes actually associated with node i having the node name N and the node information listed in the lower node name list M of the node name N.
  • the lower node vectors v_i of the nodes with the node name N extracted from all documents are clustered and classified into K groups (K is a positive integer) (S13).
  • any clustering method can be used for the clustering.
  • for example, the number of classes can be determined in advance and the vectors classified by the K-means method, or a threshold on the similarity between vectors can be set arbitrarily and hierarchical clustering performed.
  • the nodes belonging to the k-th cluster are treated as pattern-k nodes of the node name N and are given the group ID k_N (S15).
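Steps S11 to S15 can be illustrated with the K-means option. The sketch below is an assumption-laden reading: v_i is taken as a 0/1 vector over the lower node name list M, K = 3, the initial centroids are hand-picked so the run is deterministic (in practice a seeding scheme such as k-means++ would be used), and the group ID k_N is rendered as the string "<k>_<N>". All names and data are hypothetical.

```python
from typing import Dict, List, Sequence, Set

N = "habitat"
# Hypothetical lower node name list M for the node name N.
M = ["water quality", "water depth", "temperature", "habitat area",
     "symbiosis", "flow velocity", "climate", "vegetation"]

# Hypothetical lower nodes actually attached to each node i named N.
LOWER_NODES: Dict[str, Set[str]] = {
    "clownfish":   {"water quality", "water depth", "temperature", "habitat area", "symbiosis"},
    "flying fish": {"water quality", "water depth", "temperature", "habitat area"},
    "dolphin":     {"water quality", "water depth", "temperature"},
    "sweetfish":   {"water quality", "habitat area", "flow velocity"},
    "medaka":      {"water quality", "flow velocity"},
    "panda":       {"climate", "vegetation"},
    "lion":        {"climate", "vegetation"},
}


def lower_node_vector(lower_nodes: Set[str], names: Sequence[str]) -> List[float]:
    """v_i: 1.0 where a name in M is attached to node i, else 0.0."""
    return [1.0 if name in lower_nodes else 0.0 for name in names]


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors: List[List[float]], init: List[List[float]], max_iter: int = 20) -> List[int]:
    """Plain Lloyd's k-means; returns the cluster index of each vector."""
    centroids = [list(c) for c in init]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda k: sq_dist(v, centroids[k]))
                      for v in vectors]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


names = list(LOWER_NODES)
vectors = [lower_node_vector(LOWER_NODES[n], M) for n in names]
# S13 with K = 3; hand-picked initial centroids (clownfish, sweetfish, panda).
labels = kmeans(vectors, init=[vectors[0], vectors[3], vectors[5]])
# S15: group ID k_N, written here as "<k>_<N>" (hypothetical format).
groups = {n: f"{lab + 1}_{N}" for n, lab in zip(names, labels)}
print(groups)
```

On this toy data the clusters coincide with the threshold-based grouping of the habitat patterns P1 to P3; with other initial centroids K-means may converge differently, which is one reason the text leaves the clustering method open.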
  • FIG. 15 is a flowchart showing a specific example of the process of S18 in FIG. 14.
  • the number M (M is a positive integer) of lower node names tied to the nodes of the group k_N is acquired (S31).
  • it is determined whether a base mathematical model Y_m exists (S34).
  • for example, the base model of a node to which the item "height" is assigned can be a normal distribution.
  • using the base mathematical model Y_m, the existence probability p_m^kN(z) of each element z that may be stored in the information content y_s^m of the lower node m is calculated (S36).
  • it is determined whether a plurality of elements z exist simultaneously in the information content y_s^m (S37).
  • the mathematical model Y_m^kN is calculated as the sum of z * p_m^kN(z) over those elements z (S38), and the process proceeds to S40.
  • the vector P_m^kN of the existence probabilities p_m^kN(z) over all elements z is stored as the mathematical model Y_m^kN (S39).
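The two model forms in S38 and S39 can be sketched as follows. Two assumptions apply: the existence probability p(z) is estimated here as the empirical relative frequency of each value observed in lower node m across the group's nodes, a swapped-in estimator rather than the base model Y_m (such as a normal distribution for height); and because the text does not state which outcome of S37 leads to which step, the sketch simply provides both forms. `existence_probability`, `weighted_sum` and `probability_vector` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def existence_probability(values: Sequence[Hashable]) -> Dict[Hashable, float]:
    """p(z): relative frequency of each element z among the values observed
    in lower node m for the nodes of one group (assumed estimator)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {z: c / total for z, c in counts.items()}


def weighted_sum(p: Dict[float, float]) -> float:
    """S38-style model value: the sum of z * p(z) over the elements z (numeric z only)."""
    return sum(z * pz for z, pz in p.items())


def probability_vector(p: Dict[Hashable, float], elements: Sequence[Hashable]) -> List[float]:
    """S39-style model: p(z) for every candidate element z (0.0 if never observed)."""
    return [p.get(z, 0.0) for z in elements]


# Hypothetical "height" values stored in lower node m across the nodes of group k_N.
heights = [1.5, 1.7, 1.7, 2.0]
p = existence_probability(heights)
print(weighted_sum(p))                              # approximately 1.725
print(probability_vector(p, [1.5, 1.7, 2.0, 2.5]))  # [0.25, 0.5, 0.25, 0.0]

# Categorical content fits the vector form; the weighted sum needs numeric z.
water = existence_probability(["seawater", "seawater", "fresh water"])
print(probability_vector(water, ["seawater", "fresh water", "brackish water"]))
```

For numeric elements the S38 form is simply the expected value of z under p, while the S39 vector keeps the full distribution and also works for categorical content.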
  • FIG. 16 is a flowchart showing an example of node division processing based on the abstraction degree reset according to the embodiment.
  • the elements that can be stored in the lower node u are compared between the groups, and the concept name of maximum abstraction that still covers the elements of the target group without including elements of any other group is identified.
  • the node name of the node u for that group is reset to this concept name (S53).
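The FIG. 16 reset can be sketched on a small concept hierarchy: among the concepts that cover every element of the target group, keep those that cover no element of the other groups, then choose the most abstract (shallowest) one. The hierarchy in `PARENT` is hypothetical, chosen to mirror the sea area example given earlier, and `most_abstract_exclusive` is an illustrative name.

```python
from typing import Dict, List, Optional, Set

# Hypothetical concept hierarchy (child -> parent); None marks the root.
PARENT: Dict[str, Optional[str]] = {
    "place": None,
    "water area": "place",
    "sea area": "water area",
    "river": "water area",
    "Pacific Ocean": "sea area",
    "Indian Ocean": "sea area",
    "East Asian river": "river",
}


def ancestors(concept: str) -> List[str]:
    """The concept itself followed by its ancestors, most specific first."""
    chain: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def covers(concept: str, element: str) -> bool:
    return concept in ancestors(element)


def most_abstract_exclusive(target: Set[str], others: Set[str]) -> Optional[str]:
    """Most abstract concept covering all of `target` and none of `others`."""
    common = set.intersection(*(set(ancestors(e)) for e in target))
    candidates = [c for c in common if not any(covers(c, o) for o in others)]
    if not candidates:
        return None
    # A shorter ancestor chain means a shallower, i.e. more abstract, concept.
    return min(candidates, key=lambda c: len(ancestors(c)))


print(most_abstract_exclusive({"Pacific Ocean", "Indian Ocean"}, {"East Asian river"}))  # sea area
print(most_abstract_exclusive({"East Asian river"}, {"Pacific Ocean", "Indian Ocean"}))  # river
```

"Water area" is rejected for both groups because it would also cover the other group's elements, so each group keeps the broadest name that still separates it.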
  • FIG. 17 is a flowchart showing another example of the node division process based on the abstraction degree reset according to the embodiment.
  • the node names in the list L are compared with the information content y_s^m stored in the lower node m in the group k_N, and the node name of lowest abstraction among those encompassing the information content y_s^m is set as the node name of the lower node m (S64).
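The FIG. 17 variant amounts to a lowest-common-ancestor search restricted to the node names in the list L. The sketch below assumes a hypothetical concept hierarchy and list L; `lowest_encompassing` is an illustrative name, and "encompassing" is read as "is the content itself or one of its ancestors".

```python
from typing import Dict, List, Optional, Set

# Hypothetical concept hierarchy (child -> parent); None marks the root.
PARENT: Dict[str, Optional[str]] = {
    "place": None,
    "water area": "place",
    "sea area": "water area",
    "river": "water area",
    "Pacific Ocean": "sea area",
    "Indian Ocean": "sea area",
    "East Asian river": "river",
}


def ancestors(concept: str) -> List[str]:
    """The concept itself followed by its ancestors, most specific first."""
    chain: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def lowest_encompassing(contents: Set[str], node_names: List[str]) -> Optional[str]:
    """Least abstract name in `node_names` that encompasses every content."""
    common = set.intersection(*(set(ancestors(c)) for c in contents))
    candidates = [name for name in node_names if name in common]
    if not candidates:
        return None
    # A longer ancestor chain means a deeper, i.e. less abstract, concept.
    return max(candidates, key=lambda name: len(ancestors(name)))


L = ["place", "water area", "sea area", "river"]  # hypothetical list L
print(lowest_encompassing({"Pacific Ocean", "Indian Ocean"}, L))      # sea area
print(lowest_encompassing({"Pacific Ocean", "East Asian river"}, L))  # water area
print(lowest_encompassing({"East Asian river"}, L))                   # river
```

Unlike the FIG. 16 variant, this picks the most specific name that still covers all contents of the group, without comparing against other groups.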
  • FIG. 18 is a flowchart showing still another example of the node division process based on the abstraction degree reset according to the embodiment.
  • the number X of elements contained in the data in which the information content y_s^m stored in the lower node m of the group k_N is not 0 is calculated (S73). When a plurality of elements exist in the information content y_s^m, all of them are counted in X.
  • for a data set with X elements, a threshold value for determining whether the data belong to each mathematical model Y_o is obtained (S75).
  • for the group k_N, the probability that the information content y_s^m stored in the lower node m belongs to each mathematical model Y_o is calculated, the node name of the lowest-abstraction node o among the mathematical models Y_o whose probability is below the threshold is set as the node name of the lower node m (S76), and the process proceeds to S79.
  • the threshold value of the number of element types of a target concept, used as the criterion for determining whether a set of elements belongs to that concept, is acquired (S77).
  • the name of the lowest node o among the nodes whose number of element types is below the threshold is set as the node name of the lower node m (S78).
  • FIG. 19 is a block diagram showing a hardware configuration example of the information management device of FIG. 1.
  • the information management device 101 includes a processor 11, a communication control device 12, a communication interface 13, a main storage device 14, and an external storage device 15.
  • the processor 11, the communication control device 12, the communication interface 13, the main storage device 14, and the external storage device 15 are connected to each other via the internal bus 16.
  • the main storage device 14 and the external storage device 15 are accessible from the processor 11.
  • an input device 20 and an output device 21 are provided outside the information management device 101.
  • the input device 20 and the output device 21 are connected to the internal bus 16 via the input / output interface 17.
  • the input device 20 is, for example, a keyboard, a mouse, a touch panel, a card reader, a voice input device, or the like.
  • the output device 21 is, for example, a screen display device (liquid crystal monitor, organic EL (Electro Luminescence) display, graphic card, etc.), an audio output device (speaker, etc.), a printing device, and the like.
  • the processor 11 is hardware that controls the operation of the entire information management device 101.
  • the processor 11 may be a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit).
  • the processor 11 may be a single-core processor or a multi-core processor.
  • the processor 11 may include a hardware circuit (for example, FPGA (Field-Programmable Gate Array) or ASIC (Application Specific Integrated Circuit)) that performs a part or all of the processing.
  • the processor 11 may include a neural network.
  • the main storage device 14 can be composed of, for example, a semiconductor memory such as SRAM or DRAM.
  • the main storage device 14 can store a program being executed by the processor 11 or provide a work area for the processor 11 to execute the program.
  • the external storage device 15 is a storage device having a large storage capacity, and is, for example, a hard disk device or an SSD (Solid State Drive).
  • the external storage device 15 can hold executable files of various programs and data used for executing the programs.
  • the information management program 15A can be stored in the external storage device 15.
  • the information management program 15A may be software that can be installed in the information management device 101, or may be incorporated as firmware in the information management device 101.
  • the communication control device 12 is hardware having a function of controlling communication with the outside.
  • the communication control device 12 is connected to the network 19 via the communication interface 13.
  • the network 19 may be a WAN (Wide Area Network) such as the Internet, a LAN (Local Area Network) such as WiFi or Ethernet (registered trademark), or a mixture of WAN and LAN.
  • the input / output interface 17 converts the data input from the input device 20 into a data format that can be processed by the processor 11, and converts the data output from the processor 11 into a data format that can be processed by the output device 21.
  • the processor 11 reads the information management program 15A into the main storage device 14 and executes it, thereby extracting a predetermined node from the hierarchical structure of nodes to which conceptualized information is assigned.
  • the predetermined node can then be classified based on the information of the lower nodes associated with it.
  • the processor 11 can thereby realize the functions of the item extraction unit 1, the node candidate generation unit 2, the node extraction unit 3, the node integration unit 4, the classification unit 5, the modeling unit 6, and the node division unit 7 in FIG. 1.
  • the execution of the information management program 15A may be shared by a plurality of processors and computers.
  • the processor 11 may instruct a cloud computer or the like to execute all or a part of the information management program 15A via the network 19 and receive the execution result.
  • the present invention is not limited to the above-described embodiment, and includes various modifications.
  • the above-described embodiment has been described in detail in order to explain the present invention in an easy-to-understand manner, and is not necessarily limited to those having all the described configurations.
  • it is possible to replace part of the configuration of one embodiment with the configuration of another embodiment, and it is also possible to add the configuration of another embodiment to the configuration of one embodiment.
  • some or all of the above configurations, functions, processing units, processing means and the like may be realized in hardware, for example by designing them as an integrated circuit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The purpose of the present invention is to manage information hierarchically in a way that reflects how the information is used. An item extraction unit 1 extracts items from documents D1-D4 and generates a hierarchical structure of nodes to which the items are assigned. A node candidate generation unit 2 unifies the names of items of the same concept extracted from the documents D1-D4 on the basis of morphological analysis and synonym analysis. A node extraction unit 3 extracts predetermined nodes from the hierarchical structure of nodes. A node integration unit 4 integrates the abstraction levels of items of the same concept that have different abstraction levels in the lower nodes linked to predetermined nodes of the same concept. A classification unit 5 classifies the predetermined nodes on the basis of the items of the lower nodes linked to the predetermined nodes. A modeling unit 6 estimates a model of the information content of the lower nodes on the basis of the items of the information content of the lower nodes linked to the predetermined nodes. A node division unit 7 divides the items of the same concept in the lower nodes linked to the respective predetermined nodes that are classified into different groups into specific items unique to each group.
PCT/JP2020/008353 2019-03-22 2020-02-28 Information management device and information management method WO2020195545A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-054851 2019-03-22
JP2019054851A JP7099976B2 (ja) 2019-03-22 2019-03-22 Information management device and information management method

Publications (1)

Publication Number Publication Date
WO2020195545A1 (fr) 2020-10-01

Family

ID=72559317

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/008353 WO2020195545A1 (fr) 2019-03-22 2020-02-28 Information management device and information management method

Country Status (2)

Country Link
JP (1) JP7099976B2 (fr)
WO (1) WO2020195545A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009136426A1 (fr) * 2008-05-08 2009-11-12 Mitsubishi Electric Corporation Equipment for performing a search query
JP2010501947A (ja) * 2006-08-31 2010-01-21 Sweeney, Peter System, method and computer program for a consumer-defined information architecture
US20160062993A1 (en) * 2014-08-21 2016-03-03 Samsung Electronics Co., Ltd. Method and electronic device for classifying contents
JP2016139229A (ja) * 2015-01-27 2016-08-04 Japan Broadcasting Corporation Personal profile generation device and program therefor, and content recommendation device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
AOKI, CHIZURU ET AL.: "A legal ontology refinement environment using a general ontology and a case ontology", THE 27TH HUMAN INTERFACE AND COGNITIVE MODEL WORKSHOP MATERIAL, 25 March 1996 (1996-03-25), pages 9 - 16 *
ICHISE, RYUTARO ET AL.: "Instance-based hierarchical knowledge source integration", THE 11TH SPECIAL INTEREST GROUP ON AI CHALLENGE, 12 March 2001 (2001-03-12), pages 61 - 66 *
YAMAMOTO, KOUHEI ET AL.: "A hierarchical topic model for expanding category hierarchies", The 6th Forum on Data Engineering and Information Management, THE 12TH DBSJ ANNUAL, 3 May 2014 (2014-05-03), pages 1 - 8, Retrieved from the Internet <URL:http://db-event.jpn.org/deim2014/final/proceedings/C4-6.pdf> *

Also Published As

Publication number Publication date
JP2020154991A (ja) 2020-09-24
JP7099976B2 (ja) 2022-07-12

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20779688

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20779688

Country of ref document: EP

Kind code of ref document: A1