WO2020195545A1 - Information management device and information management method

Info

Publication number
WO2020195545A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
item
nodes
information management
items
Prior art date
Application number
PCT/JP2020/008353
Other languages
French (fr)
Japanese (ja)
Inventor
真理奈 藤田
宏視 荒
Original Assignee
株式会社日立製作所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立製作所

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation

Definitions

  • the present invention relates to an information management device and an information management method capable of managing information in a hierarchical manner.
  • A technique has been disclosed in which a route pattern extraction unit specifies, in an information classification hierarchy, a route that contains a category including a concept selected by a comparison concept selection unit, sets, for the concept of each category on the route, information on how it relates to the concept of the higher category, and abstracts the concept of each category, except for the user-specified concept input to an input reception unit, to generate a route pattern.
  • In that technique, a category generation unit generates candidate categories by replacing the concepts of the categories included in the route pattern so as to satisfy the information set above, a control unit adds the candidate categories to the information classification hierarchy, and an output unit outputs the information classification hierarchy.
  • However, the conventional information classification hierarchy was constructed by considering only variations in document notation, and how the items are used was not considered. For this reason, even when the item names are the same, the described contents may differ from document to document, and it may take time and effort to acquire the necessary information.
  • the present invention has been made in view of the above circumstances, and an object of the present invention is to provide an information management device and an information management method capable of hierarchically managing information reflecting how it is used.
  • The information management device includes an extraction unit that extracts a predetermined node from the hierarchical structure of nodes assigned to conceptualized information, and a classification unit that classifies the predetermined node extracted by the extraction unit based on the information of the lower nodes associated with that node.
  • According to the present invention, information can be managed hierarchically in a way that reflects how it is used.
  • FIG. 7 (b) is a diagram showing an integration example based on the semantic analysis of the concept of the hierarchical structure node of FIG. 6 (b), and FIG. 7 (c) is a diagram showing an integration example based on the semantic analysis of the concept of the nodes of the hierarchical structure based on the items of the document of FIG. 4.
  • FIG. 8 (a) is a diagram showing an example of extracting a predetermined node whose lower nodes are to be integrated or divided, for the hierarchical structure of FIG. 6 (b), and FIG. 8 (b) is a diagram showing an example of extracting such a predetermined node for the hierarchical structure based on the items of the document of FIG.
  • FIG. 9 (a) is a diagram showing an integration example based on the abstraction degree analysis of the concept of the hierarchical structure node of FIG. 7 (a), FIG. 9 (b) is a diagram showing an integration example based on the abstraction degree analysis of the concept of the hierarchical structure node of FIG. 8 (a), FIG. 9 (c) is a diagram showing an integration example based on the abstraction degree analysis of the concept of the hierarchical structure node of FIG. 7 (c), and FIG. 9 (d) is a diagram showing an integration example based on the abstraction degree analysis of the concept of the hierarchical structure node of FIG. 8 (b).
  • FIG. 10 is a diagram showing an example of a correspondence relationship between items and information contents extracted from the document of FIG.
  • FIG. 11 (a) is a diagram showing a classification example of patterns of how lower nodes are associated with the item habitat of each document, FIG. 11 (b) is a diagram showing an example of a mathematical model of pattern P1 of FIG. 11 (a), FIG. 11 (c) is a diagram showing an example of a mathematical model of pattern P2 of FIG. 11 (a), and FIG. 11 (d) is a diagram showing an example of a mathematical model of pattern P3 of FIG. 11 (a).
  • FIG. 14 is a flowchart showing pattern classification and mathematical modeling processing based on the substructure according to the embodiment.
  • FIG. 15 is a flowchart showing a specific example of the process of S18 in FIG.
  • FIG. 16 is a flowchart showing an example of node division processing based on the abstraction degree reset according to the embodiment.
  • FIG. 17 is a flowchart showing another example of the node division process based on the abstraction degree reset according to the embodiment.
  • FIG. 18 is a flowchart showing still another example of the node division process based on the abstraction degree reset according to the embodiment.
  • FIG. 19 is a block diagram showing a hardware configuration example of the information management device of FIG.
  • the information management device classifies information based on how the notation of the document is used. At this time, the information management device extracts a predetermined node from the hierarchical structure of the nodes assigned to the conceptualized information, and classifies the predetermined node based on the information of the lower node associated with the predetermined node.
  • For example, document items are assigned to the nodes.
  • A document heading or a document title may also be assigned to a node.
  • An item name, such as that of a form, may also be assigned to a node.
  • In the information management device, when the operating subject is described as “XX unit”, it means that the processor reads the program of the XX unit, loads it into the DRAM (Dynamic Random Access Memory), and then realizes the function of the XX unit.
  • FIG. 1 is a block diagram showing a configuration example of the information management device according to the embodiment.
  • The information management device includes an item extraction unit 1, a node candidate generation unit 2, a node extraction unit 3, a node integration unit 4, a classification unit 5, a modeling unit 6, a node division unit 7, a thesaurus dictionary 8, and a conceptual model 9.
  • The item extraction unit 1 extracts items from the documents D1 to D4 ... and generates a hierarchical structure of nodes to which the items are assigned. At this time, the item extraction unit 1 uses the wording of the documents D1 to D4 ... as it is for the item names attached to the nodes. Therefore, the item names given to the nodes may vary in notation even when they express the same concept.
  • The node candidate generation unit 2 unifies the names of items of the same concept extracted from the documents D1 to D4 ... based on morphological analysis and synonym analysis. At this time, the node candidate generation unit 2 can refer to the thesaurus dictionary 8. Further, the node candidate generation unit 2 modifies the hierarchical structure of the nodes based on the inclusion relationship of the words extracted from the documents D1 to D4. For example, when the concept of a lower node associated with the predetermined node is a modifier that is not included in the concept of the predetermined node, the node candidate generation unit 2 can aggregate the lower node into the predetermined node.
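  • The name unification described above can be sketched as a dictionary lookup. This is only a minimal illustration: the `THESAURUS` table and the function names are hypothetical stand-ins for the thesaurus dictionary 8, the entries are taken from the item names of documents D1 and D2, and morphological analysis is not modeled.

```python
# Hypothetical thesaurus: variant notation -> canonical item name.
# The entries are illustrative assumptions, not the contents of the
# thesaurus dictionary 8.
THESAURUS = {
    "feeding method": "foraging method",
    "survival time": "lifespan",
    "longevity": "lifespan",
}


def unify_item_name(name, thesaurus=THESAURUS):
    """Return the canonical name for an item, or the cleaned name if unknown."""
    key = " ".join(name.lower().split())  # normalize case and whitespace
    return thesaurus.get(key, key)


def unify_items(items, thesaurus=THESAURUS):
    """Unify every item name in a list, preserving order."""
    return [unify_item_name(item, thesaurus) for item in items]


d2_items = ["habitat", "breeding method", "feeding method", "gender", "longevity"]
unified = unify_items(d2_items)
```

  • Item names that are not in the table pass through unchanged, so an incomplete thesaurus only leaves some variation unresolved rather than corrupting names.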
  • the node extraction unit 3 extracts a predetermined node from the hierarchical structure of the nodes. For example, the node extraction unit 3 can extract nodes having one or less levels of lower node hierarchy as predetermined nodes. By extracting the nodes whose lower node hierarchy is one level or less as the predetermined node, it is possible to facilitate the pattern classification based on the item of the lower node associated with the predetermined node.
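  • One way to picture this extraction is a tree walk that keeps the nodes whose lower nodes are all leaves. The `Node` class and function names below are hypothetical, the sample tree follows the node numbering of document D2, and skipping leaf nodes (which have no lower nodes to classify) is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    node_id: str
    item: str
    children: List["Node"] = field(default_factory=list)


def depth_below(node):
    """Number of levels of lower nodes beneath `node` (0 for a leaf)."""
    if not node.children:
        return 0
    return 1 + max(depth_below(child) for child in node.children)


def extract_predetermined(root):
    """Collect nodes whose lower-node hierarchy is exactly one level deep."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if depth_below(node) == 1:
            found.append(node)
        stack.extend(node.children)
    return found


# Hierarchy of document D2: only "habitat" (N221) has lower nodes.
habitat = Node("N221", "habitat", [
    Node("N231", "water quality"), Node("N232", "water depth"),
    Node("N233", "temperature"), Node("N234", "habitat"),
])
root = Node("N211", "ecology", [
    habitat, Node("N222", "breeding method"), Node("N223", "feeding method"),
    Node("N224", "gender"), Node("N225", "longevity"),
])
extracted = extract_predetermined(root)
```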
  • the node integration unit 4 integrates the degree of abstraction of the items of the lower nodes associated with the predetermined node. At this time, the node integration unit 4 can refer to the conceptual model 9. As a result, even if the items are in the same hierarchy, the item name described in the upper concept and the item name described in the lower concept can be matched.
  • The classification unit 5 classifies the predetermined node based on the items of the lower nodes associated with the predetermined node. At this time, the classification unit 5 can classify the predetermined node based on the combination of the concepts of the lower nodes associated with the predetermined node. For example, the classification unit 5 can classify the first node and the second node into different groups when the concept of a lower node associated with the first node cannot be the concept of a lower node associated with the second node. As a result, even when the item name of the first node and the item name of the second node are the same, it can be determined that the item of the first node and the item of the second node are used differently.
  • the modeling unit 6 estimates a model of how to associate the lower node based on the information of the lower node associated with the predetermined node. At this time, the modeling unit 6 can generate a pattern of how the lower nodes are associated with the predetermined nodes classified in the same group by the classification unit 5. This pattern can indicate the degree of cohesion or the degree of variation of the lower nodes associated with the predetermined nodes assigned to the items extracted from the plurality of documents D1 to D4.
  • the modeling unit 6 can refer to the information content of the lower node associated with the predetermined node. For example, the modeling unit 6 can estimate a model of how the lower node is associated based on the element of the information content of the lower node associated with the predetermined node.
  • the element of the information content of the lower node is, for example, a word included in the information content of the lower node.
  • the model of how to associate the information of the lower node may be constructed based on the amount of information of the information content of the lower node, or may be constructed based on the similarity of the elements of the information content of the lower node.
  • The node division unit 7 divides the items of the lower nodes associated with the predetermined nodes classified into different groups into items specific to each group, and outputs the hierarchical structure of the nodes for each group. At this time, the node division unit 7 can divide the items of the lower nodes associated with the predetermined node based on the model estimated by the modeling unit 6. As a result, even when items of the same concept are extracted from the documents D1 to D4 ..., the item names can be made different according to the difference in how these items are used, and a search that reflects the difference in how the items are used can be realized.
  • the thesaurus dictionary 8 is a dictionary that classifies words based on the similarity of meanings.
  • the conceptual model 9 is a model showing the vertical relationship between concepts. At this time, the upper layer can have a higher degree of abstraction than the lower layer. As the conceptual model 9, for example, an ontology can be used.
  • FIG. 2 is a diagram showing an example of the document of FIG.
  • document D1 is given the title of anemone fish ecology.
  • Document D1 includes items such as habitat, breeding method, foraging method, gender and survival time.
  • The item habitat includes the items water quality, water depth, temperature, symbiosis and habitat area.
  • the item of water quality includes the information content of seawater.
  • the item “water depth” includes the information content of 20-40 m.
  • the item “temperature” includes the information content of 24 degrees.
  • the item of symbiosis includes the information content of sea anemones.
  • The item habitat area includes the information content of the Indo-Pacific and the sea area near the equator.
  • FIG. 3 is a diagram showing another example of the document of FIG.
  • Document D2 is given the title of flying fish ecology.
  • Document D2 includes items such as habitat, breeding method, feeding method, gender and longevity.
  • the item habitat includes items such as water quality, depth, temperature and habitat.
  • the item of water quality includes the information content of seawater.
  • The item “water depth” includes the information content of 1 m.
  • The item habitat includes the information content of the Pacific Ocean, Indian Ocean, and Atlantic Ocean.
  • FIG. 4 is a diagram showing still another example of the document of FIG.
  • document D3 is given the title of panda ecology.
  • Document D3 includes items such as morphology, habitat, breeding method, foraging method and longevity.
  • The item morphology includes the items size, hair and bark.
  • The item size includes the items total length and weight.
  • The item bark includes the items male and female.
  • The item habitat includes the items country name and habitat.
  • That lower habitat item, in turn, includes the items temperate and bamboo grove.
  • The item “weight” includes the information content “kg”.
  • The item “male” includes the information content “Meow Meow”.
  • The item “female” includes the information content “myanmyan”.
  • The item “country name” includes the information content “China”.
  • FIG. 5 is a diagram showing still another example of the document of FIG.
  • document D4 is given the title of lion ecology.
  • Document D4 includes items such as morphology, habitat, breeding method, foraging method, social system and longevity.
  • the item morphology includes the items size, hair and bark.
  • the item of size includes the items of total length and weight.
  • The item habitat includes the items country name and habitat.
  • That lower habitat item, in turn, includes the items subtropical and grassland.
  • The item “country name” includes the information content of Africa.
  • FIG. 6A is a diagram showing a hierarchical structure of nodes to which the document items of FIG. 2 are assigned, and FIG. 6B is a diagram showing a hierarchical structure of nodes to which the document items of FIG. 3 are assigned.
  • the item extraction unit 1 extracts titles and items from the document D1 of FIG. Then, the item extraction unit 1 assigns the node N111 to the title of the ecology of anemone fish.
  • the item extraction unit 1 assigns nodes N121 to N125 to the items of habitat, breeding method, foraging method, gender and survival time, respectively.
  • the item extraction unit 1 assigns nodes N131 to N135 to the items of water quality, water depth, temperature, symbiosis, and habitat, respectively.
  • the item extraction unit 1 associates the nodes N121 to N125 with the node N111, and associates the nodes N131 to N135 with the node N121.
  • the item extraction unit 1 of FIG. 1 extracts a title and an item from the document D2 of FIG. Then, the item extraction unit 1 assigns the node N211 to the title of the ecology of flying fish.
  • the item extraction unit 1 assigns nodes N221 to N225 to the items of habitat, breeding method, feeding method, gender and longevity, respectively.
  • the item extraction unit 1 assigns nodes N231 to N234 to the items of water quality, water depth, temperature, and habitat, respectively.
  • the item extraction unit 1 associates the nodes N221 to N225 with the node N211 and associates the nodes N231 to N234 with the node N221.
  • The item foraging method of the node N123 in FIG. 6A and the item feeding method of the node N223 in FIG. 6B express the same concept, but the item extraction unit 1 uses the notation of the documents D1 and D2 as it is. Likewise, the item survival time of the node N125 in FIG. 6A and the item lifespan of the node N225 in FIG. 6B express the same concept, but the item extraction unit 1 uses the notation of the documents D1 and D2 as it is.
  • FIG. 7 (a) is a diagram showing an integration example based on the semantic analysis of the concept of the hierarchical structure node of FIG. 6 (a), FIG. 7 (b) is a diagram showing an integration example based on the semantic analysis of the concept of the hierarchical structure node of FIG. 6 (b), and FIG. 7 (c) is a diagram showing an integration example based on the semantic analysis of the concept of the nodes of the hierarchical structure based on the items of the document of FIG. 4.
  • FIG. 7A the node candidate generation unit 2 of FIG.
  • The node candidate generation unit 2 changes the item survival time of the node N125 to the item lifespan based on the synonym analysis.
  • The node candidate generation unit 2 extracts the item “ecology” from the title of flying fish ecology of the node N211 based on the morphological analysis, and changes the name of the node N211 to the item “ecology”. Further, the node candidate generation unit 2 changes the item feeding method of the node N223 to the item foraging method based on the synonym analysis.
  • the node candidate generation unit 2 can integrate the notations of the items of the same concept even when the notations of the items of the same concept are different in the documents D1 and D2.
  • the item extraction unit 1 of FIG. 1 extracts a title and an item from the document D3 of FIG. Then, the item extraction unit 1 assigns the node N311 to the title of panda ecology.
  • the item extraction unit 1 assigns nodes N321 to N325 to the items of habitat, morphology, foraging method, breeding method, and lifespan, respectively.
  • the item extraction unit 1 assigns nodes N331 to N335 to the items of country name, habitat, size, hair, and bark, respectively.
  • the item extraction unit 1 assigns nodes N341 to N346 to the items of temperate zone, bamboo grove, total length, weight, male and female, respectively.
  • The item extraction unit 1 associates nodes N321 to N325 with node N311, associates nodes N331 and N332 with node N321, associates nodes N333 to N335 with node N322, and associates nodes N341 and N342 with node N332.
  • Nodes N343 and N344 are associated with node N333, and nodes N345 and N346 are associated with node N335.
  • The item extraction unit 1 can set a temporary item X1 of the upper concept of temperate for the item temperate of the node N341, and a temporary item X2 of the upper concept of bamboo grove for the item bamboo grove of the node N342.
  • the node candidate generation unit 2 extracts the item "ecology" from the title of panda ecology of node N311 based on the morphological analysis, and changes the name of node N311 to the item "ecology". Further, the node candidate generation unit 2 determines whether or not the concept of the item of the node N345 and the concept of the item of the node N346 are included in the concept of the item of the bark of the node N335. Further, the node candidate generation unit 2 determines whether or not the information content of the node N345 as meow and the information content of the node N346 as myanmyan are included in the concept of the item of the bark of the node N335.
  • When the concept of the item of the node N345 and the concept of the item of the node N346 are not included in the concept of the item bark of the node N335, but the information content of the node N345 and the information content of the node N346 are included in that concept, the node candidate generation unit 2 judges the items of the nodes N345 and N346 to be mere modifiers and consolidates the nodes N345 and N346 into the node N335.
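  • The modifier judgment above can be illustrated with a small sketch. The `CONCEPT_TERMS` table is a hypothetical stand-in for whatever inclusion test the node candidate generation unit 2 actually applies, and the function names are assumptions made for this example.

```python
# Hypothetical membership table: terms regarded as falling under a concept.
CONCEPT_TERMS = {"bark": {"meow meow", "myanmyan"}}


def falls_under(term, concept):
    """True if `term` is registered as belonging to `concept`."""
    return term.lower() in CONCEPT_TERMS.get(concept.lower(), set())


def consolidate_modifiers(parent_item, children):
    """Split lower items into modifiers to merge and items to keep.

    `children` maps a lower item name to its information content.
    A lower item is treated as a mere modifier when the item itself does
    not fall under the parent concept but its information content does.
    """
    merged, kept = [], {}
    for item, content in children.items():
        is_modifier = (not falls_under(item, parent_item)
                       and falls_under(content, parent_item))
        if is_modifier:
            merged.append(f"{item}: {content}")  # fold into the parent node
        else:
            kept[item] = content                 # stays a separate lower node
    return merged, kept


merged, kept = consolidate_modifiers(
    "bark", {"male": "Meow Meow", "female": "myanmyan"})
```

  • Here both lower items are folded into the bark node, which mirrors the consolidation of nodes N345 and N346 into node N335.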
  • FIG. 8 (a) is a diagram showing an example of extracting a predetermined node whose lower nodes are to be integrated or divided, for the hierarchical structure of FIG. 6 (b), and FIG. 8 (b) is a diagram showing an example of extracting such a predetermined node for the hierarchical structure based on the items of the document of FIG.
  • In FIG. 8A, when the node candidate generation unit 2 has generated the hierarchical structure of the nodes of FIG. 7B, the node extraction unit 3 of FIG. 1 extracts the node N221, which is associated with the lower nodes N231 to N234 and whose lower layer structure has one stage.
  • the item extraction unit 1 of FIG. 1 extracts titles and items from the document D4 of FIG. Then, the item extraction unit 1 assigns the node N411 to the title of lion's ecology.
  • the item extraction unit 1 assigns nodes N421 to N426 to the items of habitat, morphology, foraging method, breeding method, longevity and social system, respectively.
  • the item extraction unit 1 assigns nodes N431 to N435 to the items of country name, habitat, size, hair, and bark, respectively.
  • the item extraction unit 1 assigns nodes N441 to N444 to the items of subtropical, grassland, total length, and weight, respectively.
  • The item extraction unit 1 associates nodes N421 to N426 with node N411, associates nodes N431 and N432 with node N421, associates nodes N433 to N435 with node N422, and associates nodes N441 and N442 with node N432.
  • Nodes N443 and N444 are associated with node N433.
  • The item extraction unit 1 can set a temporary item Y1 of the upper concept of subtropical for the item subtropical of the node N441, and a temporary item Y2 of the upper concept of grassland for the item grassland of the node N442.
  • The node extraction unit 3 can extract the node N432, which is associated with the lower nodes N441 and N442 and whose lower layer structure has one stage.
  • the node extraction unit 3 may extract the node N433 associated with the lower nodes N443 and N444 whose lower layer structure is one stage.
  • FIG. 9 (a) is a diagram showing an integration example based on the abstraction degree analysis of the concept of the hierarchical structure node of FIG. 7 (a), FIG. 9 (b) is a diagram showing an integration example based on the abstraction degree analysis of the concept of the hierarchical structure node of FIG. 8 (a), FIG. 9 (c) is a diagram showing an integration example based on the abstraction degree analysis of the concept of the hierarchical structure node of FIG. 7 (c), and FIG. 9 (d) is a diagram showing an integration example based on the abstraction degree analysis of the concept of the hierarchical structure node of FIG. 8 (b).
  • It is assumed that the node extraction unit 3 of FIG. 1 extracts the node N121, which is associated with the lower nodes N131 to N135 and whose lower layer structure has one stage, from the hierarchical structure of the nodes of FIG. 7A. Further, in FIG. 9B, it is assumed that the node extraction unit 3 extracts the node N221, which is associated with the lower nodes N231 to N234 and whose lower layer structure has one stage, from the hierarchical structure of the nodes in FIG. 7B.
  • The node integration unit 4 integrates the item habitat area of the lower node N135 in FIG. 9 (a) into the item habitat based on the abstraction degree analysis.
  • As a result, the item name of the lower node N135 in FIG. 9A can be matched with the item name of the lower node N234 in FIG. 9B, and the variation in the notation of the lower nodes can be eliminated.
  • The node extraction unit 3 extracts the node N332, which is associated with the lower nodes N341 and N342 and whose lower layer structure has one stage, from the hierarchical structure of the nodes in FIG. 7C.
  • The node extraction unit 3 extracts the node N432, which is associated with the lower nodes N441 and N442 and whose lower layer structure has one stage, from the hierarchical structure of the nodes in FIG. 8B.
  • the node integration unit 4 integrates the temporary item X1 of the lower node N341 into the item of climate and the temporary item X2 of the lower node N342 into the item of vegetation based on the abstraction degree analysis. Further, the node integration unit 4 integrates the temporary item Y1 of the lower node N441 into the item of climate and the temporary item Y2 of the lower node N442 into the item of vegetation based on the abstraction degree analysis.
  • As a result, the item names of the lower nodes N341 and N342 in FIG. 9C can be matched with the item names of the lower nodes N441 and N442 in FIG. 9D, respectively, and the variation in the notation of the lower nodes can be eliminated.
  • FIG. 10 is a diagram showing an example of a correspondence relationship between items and information contents extracted from the document of FIG.
  • The subordinate concepts of seawater, brackish water, and freshwater are associated with the superordinate concept of water quality, and the subordinate concepts of the Indo-Pacific, the sea area near the equator, the Indian Ocean, the Pacific Ocean, and the East Asian rivers are associated with the superordinate concept of habitat.
  • The node integration unit 4 can integrate the item names of the concepts of lower nodes having different degrees of abstraction by referring to the conceptual model 9.
  • Document D1 of FIG. 2 describes the item habitat area for the information content of the Indo-Pacific and the vicinity of the equator.
  • The item habitat is associated with the information content of the Indo-Pacific and the sea area near the equator. Therefore, the node integration unit 4 can integrate the item habitat area of the lower node N135 of FIG. 9A into the item habitat by referring to the conceptual model 9 of FIG.
  • The item climate is associated with the information content of temperate and subtropical, and the item vegetation is associated with the information content of grassland and bamboo grove. Therefore, by referring to the conceptual model 9 of FIG., the node integration unit 4 can integrate the temporary items X1 and Y1 of the lower nodes N341 and N441 of FIGS. 9 (c) and 9 (d) into the item climate, and the temporary items X2 and Y2 of the lower nodes N342 and N442 into the item vegetation.
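  • The lookup against the conceptual model can be sketched as a mapping from lower concepts to the upper item under which they are registered. The `CONCEPTUAL_MODEL` table and `resolve_item` function are hypothetical; only the concept pairs are taken from the description above, and a real ontology would carry far more structure.

```python
# Hypothetical flat view of the conceptual model 9:
# lower concept -> upper item name.
CONCEPTUAL_MODEL = {
    "temperate": "climate",
    "subtropical": "climate",
    "bamboo grove": "vegetation",
    "grassland": "vegetation",
    "indo-pacific": "habitat",
    "sea area near the equator": "habitat",
}


def resolve_item(lower_concept, model=CONCEPTUAL_MODEL):
    """Return the upper item for a lower concept, or None if unregistered."""
    return model.get(lower_concept.lower())


# Temporary items X1/X2 (panda) and Y1/Y2 (lion) receive concrete names.
temporary_items = {
    "X1": "temperate", "X2": "bamboo grove",
    "Y1": "subtropical", "Y2": "grassland",
}
resolved = {tmp: resolve_item(concept) for tmp, concept in temporary_items.items()}
```

  • After resolution, X1 and Y1 share one item name and X2 and Y2 share another, which is what allows the lower nodes of the two documents to be compared.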
  • FIG. 11 (a) is a diagram showing a classification example of patterns of how lower nodes are associated with the item habitat of each document, FIG. 11 (b) is a diagram showing an example of a mathematical model of pattern P1 of FIG. 11 (a), FIG. 11 (c) is a diagram showing an example of a mathematical model of pattern P2 of FIG. 11 (a), and FIG. 11 (d) is a diagram showing an example of a mathematical model of pattern P3 of FIG. 11 (a).
  • It is assumed that the node extraction unit 3 of FIG. 1 extracts the item habitat as the predetermined node.
  • For the document on the ecology of anemone fish, the items of the lower nodes associated with the item habitat are water quality, water depth, temperature, habitat and symbiosis; for the document on the ecology of flying fish, they are water quality, water depth, temperature and habitat; and for the document on the ecology of dolphins, they are water quality, water depth and temperature.
  • The classification unit 5 of FIG. 1 classifies the item habitat in each document on the ecology of anemone fish, flying fish, dolphin, sweetfish, medaka, panda and lion based on the items of the lower nodes associated with the item habitat.
  • the classification unit 5 can use, for example, the distance between the vectors when the items of the lower nodes of each document are vectorized as an index for classifying the item of habitat in each document.
  • The classification unit 5 can generate a vector whose components are 1 or 0 depending on the presence or absence of each lower-node item. For example, the classification unit 5 generates the vector (1,1,1,1,1,0,0,0) for anemone fish, (1,1,1,1,0,0,0,0) for flying fish, (1,1,1,0,0,0,0,0) for dolphins, (1,0,0,1,0,1,0,0) for sweetfish, (1,0,0,0,0,1,0,0) for medaka, and (0,0,0,0,0,0,1,1) for pandas and lions.
  • Among anemone fish, flying fish and dolphins, the distance between the vectors is 1 or 2.
  • Between sweetfish and medaka, the distance between the vectors is 1.
  • Between pandas and lions, the distance between the vectors is zero.
  • Anemone fish, flying fish and dolphins are at a distance of 3 or more from sweetfish and medaka.
  • Anemone fish, flying fish and dolphins are at a distance of 5 or more from pandas and lions. Sweetfish and medaka are at a distance of 4 or more from pandas and lions.
  • The classification unit 5 can therefore classify habitat items whose associated lower nodes have a distance between vectors smaller than 3 into the same group, and habitat items whose associated lower nodes have a distance between vectors of 3 or more into different groups.
  • The classification unit 5 may also classify the first node and the second node into different groups when the concept of a lower node associated with the first node cannot be the concept of a lower node associated with the second node.
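  • A minimal sketch of this vector-based grouping follows. Two points are assumptions of the sketch rather than statements of the source: the distance is taken to be the Hamming distance (which reproduces the distances stated above), and the component order of the vectors is inferred from the listed items. Grouping is done by linking any two documents closer than the threshold.

```python
from itertools import combinations

# Assumed component order of the presence/absence vectors.
ITEMS = ["water quality", "water depth", "temperature", "habitat",
         "symbiosis", "flow velocity", "climate", "vegetation"]

LOWER_ITEMS = {
    "anemone fish": {"water quality", "water depth", "temperature",
                     "habitat", "symbiosis"},
    "flying fish": {"water quality", "water depth", "temperature", "habitat"},
    "dolphin": {"water quality", "water depth", "temperature"},
    "sweetfish": {"water quality", "habitat", "flow velocity"},
    "medaka": {"water quality", "flow velocity"},
    "panda": {"climate", "vegetation"},
    "lion": {"climate", "vegetation"},
}


def to_vector(items):
    """1 where the lower item is present, 0 where it is absent."""
    return tuple(1 if name in items else 0 for name in ITEMS)


def hamming(a, b):
    """Number of components in which two vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def group_by_distance(vectors, threshold=3):
    """Put documents whose vectors are closer than `threshold` in one group."""
    parent = {key: key for key in vectors}

    def find(key):
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path compression
            key = parent[key]
        return key

    for a, b in combinations(vectors, 2):
        if hamming(vectors[a], vectors[b]) < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for key in vectors:
        groups.setdefault(find(key), []).append(key)
    return sorted(sorted(group) for group in groups.values())


vectors = {name: to_vector(items) for name, items in LOWER_ITEMS.items()}
groups = group_by_distance(vectors)
```

  • With the threshold of 3, the sketch yields three groups: anemone fish, flying fish and dolphin; sweetfish and medaka; panda and lion. Linking by pairwise threshold can chain distant documents together through intermediates, so a different clustering rule may be preferable for larger collections.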
  • The items climate and vegetation for the ecology of pandas and lions cannot be items for the ecology of anemone fish, flying fish, dolphins, sweetfish and medaka. Therefore, the item habitat for pandas and lions can be classified into a different group from the item habitat for anemone fish, flying fish, dolphins, sweetfish and medaka.
  • the item of flow velocity for the ecology of sweetfish and medaka cannot be the item for the ecology of anemone fish, flying fish and dolphin. Therefore, the item of habitat for sweetfish and medaka can be classified into a different group from the item of habitat for anemone fish, flying fish and dolphin.
  • the modeling unit 6 has a habitat pattern P1 showing how to link to the item habitat for anemone fish, flying fish and dolphin, and a habitat pattern P2 showing how to tie to the item habitat for sweetfish and medaka. And generate a habitat pattern P3 that shows how to tie to the item habitat for pandas and lions.
  • the modeling unit 6 can estimate the mathematical model for each habitat pattern P1 to P3 based on the information of the lower nodes associated with each habitat pattern P1 to P3.
  • as the mathematical model of each habitat pattern P1 to P3, for example, the existence probability of the subordinate items, the cohesiveness of the subordinate nodes of each habitat pattern P1 to P3, or the distribution model of the information associated with each subordinate item can be used.
  • as the information associated with the subordinate items, items or information contents further subordinate to the subordinate items can be used.
  • the degree of cohesion of the lower nodes can be calculated based on the variance of the existence probability of the lower items for each of the habitat patterns P1 to P3.
  • the degree of cohesion of the lower nodes may be obtained based on the average distance from the representative vector of the vectors belonging to each habitat pattern P1 to P3.
  • the existence probabilities of the items of water quality, water depth, temperature, habitat area and symbiosis in the habitat pattern P1 are 1.0, 1.0, 1.0, 0.67 and 0.33, respectively.
  • the cohesiveness of the habitat pattern P1 is 0.45.
  • the existence probabilities of the items of water quality, habitat area and flow velocity in the habitat pattern P2 are 1.0, 0.5 and 1.0, respectively.
  • the degree of cohesion for the habitat pattern P2 is 0.7.
  • a distribution model of (East Asian river) (1.0) can be generated.
  • the existence probabilities of the items of climate and vegetation in the habitat pattern P3 are 1.0 and 1.0, respectively.
  • the degree of cohesion for the habitat pattern P3 is 1.0.
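The existence probabilities quoted above can be reproduced with a short Python sketch. It assumes that the existence probability of an item is simply the fraction of nodes in a pattern that carry the item; the function name and the item lists are hypothetical reconstructions of the example. The cohesion values (0.45, 0.7, 1.0) are not recomputed here because the text does not give an explicit formula for them.

```python
def existence_probabilities(lower_items_per_node):
    """Fraction of nodes in one pattern that carry each lower item."""
    node_count = len(lower_items_per_node)
    items = []
    for node_items in lower_items_per_node:
        for item in node_items:
            if item not in items:
                items.append(item)
    return {
        item: round(sum(item in node for node in lower_items_per_node) / node_count, 2)
        for item in items
    }


# Pattern P1: clownfish, flying fish, dolphin (item lists are assumptions).
P1 = [
    ["water quality", "water depth", "temperature", "habitat area", "symbiosis"],
    ["water quality", "water depth", "temperature", "habitat area"],
    ["water quality", "water depth", "temperature"],
]
# Pattern P2: sweetfish, medaka.
P2 = [
    ["water quality", "habitat area", "flow velocity"],
    ["water quality", "flow velocity"],
]
# Pattern P3: panda, lion.
P3 = [
    ["climate", "vegetation"],
    ["climate", "vegetation"],
]

P1_PROBS = existence_probabilities(P1)
P2_PROBS = existence_probabilities(P2)
P3_PROBS = existence_probabilities(P3)
print(P1_PROBS)
print(P2_PROBS)
print(P3_PROBS)
```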
  • FIG. 12(a) is a diagram showing an example of division of lower nodes based on the pattern P1 of FIG. 11(b).
  • FIG. 12(b) is a diagram showing an example of division of lower nodes based on the pattern P2 of FIG. 11(c).
  • FIG. 12(c) is a diagram showing an example of division of lower nodes based on the pattern P3 of FIG. 11(d).
  • the node division unit 7 of FIG. 1 divides the items of the lower nodes associated with the habitat pattern P1 into specific items specific to the habitat pattern P1.
  • the node division unit 7 changes the habitat item of the habitat pattern P1 to the item of sea area when only information content representing sea areas, such as the Pacific Ocean and the Indian Ocean, appears in the habitat item of the habitat pattern P1.
  • the node division unit 7 keeps the climate item of the habitat pattern P3 as it is when there is no bias in the information content describing the climate. Likewise, the node division unit 7 keeps the vegetation item of the habitat pattern P3 as it is when there is no bias in the information content describing the vegetation.
  • the habitat pattern P3 can be referred to.
  • FIG. 13(a) is a diagram showing another example of extraction of a predetermined node whose lower nodes are to be integrated or divided, based on the hierarchical structure of FIG. 7(a), and FIG. 13(b) is a diagram showing another example of extraction of a predetermined node whose lower nodes are to be integrated or divided, based on the hierarchical structure of FIG. 8(b).
  • the node extraction unit 3 of FIG. 1 extracts a predetermined node from the hierarchical structure of the nodes reflecting the processing result of the classification unit 5 of FIG. 1.
  • the node extraction unit 3 treats the lower nodes N131 to N135 of the habitat item of the node N121 in FIG. 7(a) as the pattern PA and sets that pattern as the information content of the node N121.
  • the hierarchy of the lower nodes N121 to N125 of the node N111 to which the item of ecology is assigned then becomes one level deep. Therefore, by extracting the nodes whose lower node hierarchy is one level or less, the node extraction unit 3 can extract the node N111 to which the item of ecology is assigned as a predetermined node.
  • the item of ecology that was not extracted from the hierarchical structure of FIG. 7A can also be subject to pattern classification by the classification unit 5.
  • the node extraction unit 3 treats the lower nodes N441 and N442 of the habitat item of the node N432 in FIG. 8(b) as the pattern PB and sets that pattern as the information content of the node N432.
  • the hierarchy of the lower nodes N431 and N432 of the node N421 to which the item of habitat is assigned then becomes one level deep. Therefore, by extracting the nodes whose lower node hierarchy is one level or less, the node extraction unit 3 can extract the node N421 to which the item of habitat is assigned as a predetermined node.
  • the item of habitat that was not extracted from the hierarchical structure of FIG. 8B can also be subject to pattern classification by the classification unit 5.
  • FIG. 14 is a flowchart showing pattern classification and mathematical modeling processing based on the substructure according to the embodiment.
  • the node name N to be analyzed and the subordinate node name list of the node name N are acquired (S11).
  • the lower node vector v_i is a vector that quantifies the correspondence between the lower node group actually associated with the node i having the node name N and the node information described in the lower node name list M of the node name N.
  • by clustering, the lower node vectors v_i of the nodes with the node name N extracted from all documents are classified into K groups (K is a positive integer) (S13).
  • Any clustering method can be used for clustering.
  • for example, the number of classes can be determined in advance and the K-means method used, or a threshold value on the similarity between vectors can be set arbitrarily and hierarchical clustering performed.
  • the nodes belonging to the k-th cluster are treated as the k-th pattern nodes of the node name N, and k_N is assigned to them as the group id (S15).
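As an illustration of the vectorization step that precedes clustering, the following Python sketch builds a lower node vector from the lower node name list M and the lower nodes actually found under one node. It assumes a plain 0/1 encoding of presence and absence; the function name, the ordering of the name list, and the sample data are hypothetical.

```python
def lower_node_vector(actual_lower_names, name_list):
    """Encode which names from the lower node name list appear under a node."""
    present = set(actual_lower_names)
    return tuple(1 if name in present else 0 for name in name_list)


# Hypothetical lower node name list M for the node name "habitat".
NAME_LIST_M = [
    "water quality", "water depth", "temperature", "habitat area",
    "symbiosis", "flow velocity", "climate", "vegetation",
]

# Lower nodes actually associated with the "habitat" node in each document.
LOWER_NODES = {
    "clownfish": ["water quality", "water depth", "temperature", "habitat area", "symbiosis"],
    "medaka": ["water quality", "flow velocity"],
    "panda": ["climate", "vegetation"],
}

LOWER_VECTORS = {
    doc: lower_node_vector(names, NAME_LIST_M) for doc, names in LOWER_NODES.items()
}
for doc, vector in LOWER_VECTORS.items():
    print(doc, vector)
```

The resulting vectors can then be passed to any clustering method, for example K-means with a preset K or threshold-based hierarchical clustering, to obtain the group ids.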
  • FIG. 15 is a flowchart showing a specific example of the process of S18 in FIG. 14.
  • the number M of lower node names associated with the nodes of the group k_N is acquired (M is a positive integer) (S31).
  • it is determined whether a base mathematical model Y_m exists (S34).
  • the base model of a node to which the item height is assigned can be a normal distribution.
  • based on the base mathematical model Y_m, the existence probability p_m^kN(z) in the information content y_m^s of each element z that can be stored in the lower node m is calculated (S36).
  • it is determined whether a plurality of elements z exist simultaneously in the information content y_m^s (S37).
  • the mathematical model Y_m^kN is calculated by taking the sum of z * p_m^kN(z) over those elements z (S38), and the process proceeds to S40.
  • the vector P_m^kN of the existence probabilities p_m^kN(z) over all the elements z is stored as the mathematical model Y_m^kN (S39).
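The two outcomes of this flow, a probability vector over elements and a sum of z * p(z), can be sketched as follows. This is a simplified stand-in rather than the procedure of FIG. 15: it estimates the existence probabilities from empirical frequencies instead of from a base mathematical model, and the function names and sample data are hypothetical.

```python
from collections import Counter


def element_probabilities(contents):
    """Empirical probability of each element z over the information contents.

    `contents` holds one list of elements per node in the group.
    """
    counts = Counter(z for content in contents for z in content)
    total = sum(counts.values())
    return {z: count / total for z, count in counts.items()}


def expected_value(probabilities):
    """Sum of z * p(z); meaningful only for numeric elements."""
    return sum(z * p for z, p in probabilities.items())


# Categorical elements: keep the probability vector itself.
RIVER_PROBS = element_probabilities([["East Asian river"], ["East Asian river"]])

# Numeric elements (hypothetical values): summarize with the sum of z * p(z).
DEPTH_PROBS = element_probabilities([[1.0], [2.0], [2.0], [3.0]])
DEPTH_MEAN = expected_value(DEPTH_PROBS)

print(RIVER_PROBS)
print(DEPTH_PROBS, DEPTH_MEAN)
```

The categorical case corresponds to the distribution model of (East Asian river) (1.0) mentioned for the habitat pattern P2.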
  • FIG. 16 is a flowchart showing an example of node division processing based on the abstraction degree reset according to the embodiment.
  • the elements that can be stored in the lower node u are compared between the groups, and the target is the concept name of the highest abstraction that describes the elements of the target group but does not include the elements of other groups.
  • the node name of the node u related to the group is reset (S53).
  • FIG. 17 is a flowchart showing another example of the node division process based on the abstraction degree reset according to the embodiment.
  • the node names in the list L are compared with the information content y_m^s stored in the lower node m in the group k_N, and the node name of the lower node m is reset to the node name of the lowest abstraction that encompasses the information content y_m^s (S64).
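One way to picture the reset in S64 is as a search for the lowest concept in a concept hierarchy that covers every element of the information content. The Python sketch below assumes the conceptual model is available as a child-to-parent map; the map contents and the function names are hypothetical, chosen to mirror the sea area example given earlier.

```python
# Hypothetical fragment of a conceptual model, as a child -> parent map.
CONCEPT_PARENT = {
    "Pacific Ocean": "sea area",
    "Indian Ocean": "sea area",
    "East Asian river": "river",
    "sea area": "habitat area",
    "river": "habitat area",
}


def ancestors(concept, parent):
    """Chain from the concept itself up to the root of the hierarchy."""
    chain = [concept]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def lowest_encompassing_concept(contents, parent):
    """Lowest-abstraction concept that encompasses every element in contents."""
    chains = [ancestors(element, parent) for element in contents]
    common = set(chains[0]).intersection(*(set(chain) for chain in chains[1:]))
    # Walk upward from the first element; the first shared concept is the lowest.
    for concept in chains[0]:
        if concept in common:
            return concept
    return None


SEA_NAME = lowest_encompassing_concept(["Pacific Ocean", "Indian Ocean"], CONCEPT_PARENT)
MIXED_NAME = lowest_encompassing_concept(["Pacific Ocean", "East Asian river"], CONCEPT_PARENT)
print(SEA_NAME, MIXED_NAME)
```

When only sea areas appear, the item name narrows to "sea area"; when the contents are mixed, the more abstract name is kept.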
  • FIG. 18 is a flowchart showing still another example of the node division process based on the abstraction degree reset according to the embodiment.
  • the number X of elements included in the data for which the information content y_m^s stored in the lower node m in the group k_N is not 0 is calculated (S73). If a plurality of elements exist in the information content y_m^s, all of them are counted in the number X.
  • a threshold value for determining whether a data set with X elements belongs to each mathematical model Y_o is acquired (S75).
  • the probability that the information content y_m^s stored in the lower node m of the group k_N belongs to each mathematical model Y_o is calculated, the node name of the lower node m is reset to the node o of the lowest abstraction among the mathematical models Y_o below the threshold value (S76), and the process proceeds to S79.
  • the threshold value of the number of element types of the target concept, used as a reference for determining whether the number of elements belongs to a certain concept, is acquired (S77).
  • the node name of the lower node m is reset to the node o that is the lowest node among the nodes whose number of element types is lower than the threshold value (S78).
  • FIG. 19 is a block diagram showing a hardware configuration example of the information management device of FIG. 1.
  • the information management device 101 includes a processor 11, a communication control device 12, a communication interface 13, a main storage device 14, and an external storage device 15.
  • the processor 11, the communication control device 12, the communication interface 13, the main storage device 14, and the external storage device 15 are connected to each other via the internal bus 16.
  • the main storage device 14 and the external storage device 15 are accessible from the processor 11.
  • an input device 20 and an output device 21 are provided outside the information management device 101.
  • the input device 20 and the output device 21 are connected to the internal bus 16 via the input / output interface 17.
  • the input device 20 is, for example, a keyboard, a mouse, a touch panel, a card reader, a voice input device, or the like.
  • the output device 21 is, for example, a screen display device (liquid crystal monitor, organic EL (Electro Luminescence) display, graphic card, etc.), an audio output device (speaker, etc.), a printing device, and the like.
  • the processor 11 is hardware that controls the operation of the entire information management device 101.
  • the processor 11 may be a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit).
  • the processor 11 may be a single-core processor or a multi-core processor.
  • the processor 11 may include a hardware circuit (for example, FPGA (Field-Programmable Gate Array) or ASIC (Application Specific Integrated Circuit)) that performs a part or all of the processing.
  • the processor 11 may include a neural network.
  • the main storage device 14 can be composed of, for example, a semiconductor memory such as SRAM or DRAM.
  • the main storage device 14 can store a program being executed by the processor 11 or provide a work area for the processor 11 to execute the program.
  • the external storage device 15 is a storage device having a large storage capacity, and is, for example, a hard disk device or an SSD (Solid State Drive).
  • the external storage device 15 can hold an executable file of various programs and data used for executing the program.
  • the information management program 15A can be stored in the external storage device 15.
  • the information management program 15A may be software that can be installed in the information management device 101, or may be incorporated as firmware in the information management device 101.
  • the communication control device 12 is hardware having a function of controlling communication with the outside.
  • the communication control device 12 is connected to the network 19 via the communication interface 13.
  • the network 19 may be a WAN (Wide Area Network) such as the Internet, a LAN (Local Area Network) such as WiFi or Ethernet (registered trademark), or a mixture of WAN and LAN.
  • the input / output interface 17 converts the data input from the input device 20 into a data format that can be processed by the processor 11, and converts the data output from the processor 11 into a data format that can be processed by the output device 21.
  • the processor 11 reads the information management program 15A into the main storage device 14 and executes it, and can thereby extract a predetermined node from the hierarchical structure of the nodes assigned to the conceptualized information and classify the predetermined node based on the information of the lower nodes associated with the predetermined node.
  • the processor 11 can realize the functions of the item extraction unit 1, the node candidate generation unit 2, the node extraction unit 3, the node integration unit 4, the classification unit 5, the modeling unit 6, and the node division unit 7 in FIG. 1.
  • the execution of the information management program 15A may be shared by a plurality of processors and computers.
  • the processor 11 may instruct a cloud computer or the like to execute all or a part of the information management program 15A via the network 19 and receive the execution result.
  • the present invention is not limited to the above-described embodiment, and includes various modifications.
  • the above-described embodiment has been described in detail in order to explain the present invention in an easy-to-understand manner, and is not necessarily limited to those having all the described configurations.
  • it is possible to replace a part of the configuration of one embodiment with the configuration of another embodiment and it is also possible to add the configuration of another embodiment to the configuration of one embodiment.
  • each of the above configurations, functions, processing units, processing means and the like may be realized by hardware by designing a part or all of them by, for example, an integrated circuit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The purpose of the present invention is to hierarchically manage information reflecting the manner in which the information is used. An item extraction unit 1 extracts items from documents D1-D4, and generates a hierarchical structure of nodes to which the items are assigned. A node candidate generation unit 2 unifies the names of the items of the same concept extracted from the documents D1-D4 on the basis of morpheme analysis and synonym analysis. A node extraction unit 3 extracts prescribed nodes from the node hierarchical structure. A node integration unit 4 integrates the abstraction levels of items of the same concept that differ in abstraction level among the lower nodes tied to the prescribed nodes of the same concept. A classification unit 5 classifies the prescribed nodes on the basis of the items of the lower nodes tied to the prescribed nodes. A modeling unit 6 estimates a model of the information content of the lower nodes on the basis of elements of the information content of the lower nodes tied to the prescribed nodes. A node division unit 7 divides the items of the same concept of the lower nodes tied to the respective prescribed nodes classified into different groups, into specific items unique to the respective groups.

Description

Information management device and information management method
The present invention relates to an information management device and an information management method capable of managing information in a hierarchical manner.
In order to make it easier for users to obtain the information they need, techniques for classifying information hierarchically have been proposed.
For example, Patent Document 1 discloses a technique in which a route pattern extraction unit identifies, in an information classification hierarchy, a route containing a category that includes the concept selected by a comparison concept selection unit, sets, for the concept of each category on the route, information on how it relates to the concept of the higher category, and generates a route pattern by abstracting the concepts of those categories except for a user-specified concept entered through an input reception unit; a category generation unit generates candidate categories by replacing the concepts of the categories included in the route pattern so as to satisfy the set information; a control unit adds the candidate categories to the information classification hierarchy; and an output unit outputs the information classification hierarchy.
Japanese Unexamined Patent Publication No. 2012-43212
However, the conventional information classification hierarchy is constructed by considering only variations in document notation, and how items are used is not considered. For this reason, even the same item may be described differently depending on the document, and acquiring the necessary information can take time and effort.
The present invention has been made in view of the above circumstances, and an object of the present invention is to provide an information management device and an information management method capable of hierarchically managing information that reflects how it is used.
In order to achieve the above object, the information management device according to the first aspect includes an extraction unit that extracts a predetermined node from the hierarchical structure of the nodes assigned to conceptualized information, and a classification unit that classifies the predetermined node extracted by the extraction unit based on the information of the lower nodes associated with that predetermined node.
According to the present invention, information that reflects how it is used can be managed hierarchically.
FIG. 1 is a block diagram showing a configuration example of the information management device according to the embodiment.
FIG. 2 is a diagram showing an example of the document of FIG. 1.
FIG. 3 is a diagram showing another example of the document of FIG. 1.
FIG. 4 is a diagram showing still another example of the document of FIG. 1.
FIG. 5 is a diagram showing still another example of the document of FIG. 1.
FIG. 6(a) is a diagram showing a hierarchical structure of nodes to which the items of the document of FIG. 2 are assigned, and FIG. 6(b) is a diagram showing a hierarchical structure of nodes to which the items of the document of FIG. 3 are assigned.
FIG. 7(a) is a diagram showing an example of integration based on semantic analysis of the concepts of the nodes of the hierarchical structure of FIG. 6(a), FIG. 7(b) is a diagram showing an example of integration based on semantic analysis of the concepts of the nodes of the hierarchical structure of FIG. 6(b), and FIG. 7(c) is a diagram showing an example of integration based on semantic analysis of the concepts of the nodes of a hierarchical structure based on the items of the document of FIG. 4(b).
FIG. 8(a) is a diagram showing an example of extraction of a predetermined node whose lower nodes are to be integrated or divided, for the hierarchical structure of FIG. 6(b), and FIG. 8(b) is a diagram showing an example of extraction of a predetermined node whose lower nodes are to be integrated or divided, for a hierarchical structure based on the items of the document of FIG. 5.
FIG. 9(a) is a diagram showing an example of integration based on abstraction degree analysis of the concepts of the nodes of the hierarchical structure of FIG. 7(a), FIG. 9(b) is a diagram showing an example of integration based on abstraction degree analysis of the concepts of the nodes of the hierarchical structure of FIG. 8(a), FIG. 9(c) is a diagram showing an example of integration based on abstraction degree analysis of the concepts of the nodes of the hierarchical structure of FIG. 7(c), and FIG. 9(d) is a diagram showing an example of integration based on abstraction degree analysis of the concepts of the nodes of the hierarchical structure of FIG. 8(b).
FIG. 10 is a diagram showing an example of the correspondence between the information contents extracted from the documents of FIG. 1 and the items.
FIG. 11(a) is a diagram showing an example of classification of the patterns of how lower nodes are associated with the habitat of each document, FIG. 11(b) is a diagram showing an example of a mathematical model of the pattern P1 of FIG. 11(a), FIG. 11(c) is a diagram showing an example of a mathematical model of the pattern P2 of FIG. 11(a), and FIG. 11(d) is a diagram showing an example of a mathematical model of the pattern P3 of FIG. 11(a).
FIG. 12(a) is a diagram showing an example of division of lower nodes based on the pattern P1 of FIG. 11(b), FIG. 12(b) is a diagram showing an example of division of lower nodes based on the pattern P2 of FIG. 11(c), and FIG. 12(c) is a diagram showing an example of division of lower nodes based on the pattern P3 of FIG. 11(d).
FIG. 13(a) is a diagram showing another example of extraction of a predetermined node whose lower nodes are to be integrated or divided, based on the hierarchical structure of FIG. 7(a), and FIG. 13(b) is a diagram showing another example of extraction of a predetermined node whose lower nodes are to be integrated or divided, based on the hierarchical structure of FIG. 8(b).
FIG. 14 is a flowchart showing pattern classification and mathematical modeling processing based on the substructure according to the embodiment.
FIG. 15 is a flowchart showing a specific example of the process of S18 in FIG. 14.
FIG. 16 is a flowchart showing an example of node division processing based on the abstraction degree reset according to the embodiment.
FIG. 17 is a flowchart showing another example of the node division processing based on the abstraction degree reset according to the embodiment.
FIG. 18 is a flowchart showing still another example of the node division processing based on the abstraction degree reset according to the embodiment.
FIG. 19 is a block diagram showing a hardware configuration example of the information management device of FIG. 1.
An embodiment will be described with reference to the drawings. The embodiment described below does not limit the invention according to the claims, and not all of the elements and combinations thereof described in the embodiment are necessarily essential to the solution of the invention.
The information management device according to the embodiment classifies information based on how the notation of a document is used. At this time, the information management device extracts a predetermined node from the hierarchical structure of the nodes assigned to conceptualized information, and classifies the predetermined node based on the information of the lower nodes associated with the predetermined node. A node is assigned, for example, an item of a document. A node may instead be assigned a document heading or a document title, or an item name of, for example, a form.
Hereinafter, the information management device according to the embodiment will be described by taking as an example the case where document items are assigned to nodes. In the following description, when the subject of an operation is written as "the XX unit", it means that the processor reads the XX unit, which is a program, loads it into DRAM (Dynamic Random Access Memory), and then realizes the function of the XX unit.
FIG. 1 is a block diagram showing a configuration example of the information management device according to the embodiment.
In FIG. 1, the information management device includes an item extraction unit 1, a node candidate generation unit 2, a node extraction unit 3, a node integration unit 4, a classification unit 5, a modeling unit 6, a node division unit 7, a thesaurus dictionary 8, and a conceptual model 9.
The item extraction unit 1 extracts items from the documents D1 to D4 and so on, and generates a hierarchical structure of nodes to which the items are assigned. At this time, the item extraction unit 1 uses the wording of the documents D1 to D4 as it is for the item names given to the nodes. Therefore, the item names given to the nodes may vary in notation even for items of the same concept.
The node candidate generation unit 2 unifies the names of the items of the same concept extracted from the documents D1 to D4 based on morphological analysis and synonym analysis. At this time, the node candidate generation unit 2 can refer to the thesaurus dictionary 8. Further, the node candidate generation unit 2 modifies the hierarchical structure of the nodes based on the inclusion relationship of the words extracted from the documents D1 to D4. For example, when the concept of a lower node associated with a predetermined node is a modifier that is not included in the concept of the predetermined node, the node candidate generation unit 2 can aggregate that lower node into the predetermined node.
The node extraction unit 3 extracts a predetermined node from the hierarchical structure of the nodes. For example, the node extraction unit 3 can extract, as predetermined nodes, nodes whose lower node hierarchy is one level or less. Extracting such nodes as predetermined nodes makes pattern classification based on the items of the lower nodes associated with the predetermined node easier.
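The extraction rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the hierarchy is given as a parent-to-children map, and only nodes that have at least one lower node are returned (the text says "one level or less", so excluding leaf nodes is an assumption made here because a leaf has no lower nodes to classify by). The tree contents and function names are hypothetical.

```python
def depth_below(node, children):
    """Number of levels of lower nodes under a node (0 for a leaf)."""
    kids = children.get(node, [])
    if not kids:
        return 0
    return 1 + max(depth_below(kid, children) for kid in kids)


def extract_predetermined_nodes(children, root, max_depth=1):
    """Nodes whose lower node hierarchy is at most max_depth levels deep.

    Leaf nodes are skipped; that restriction is an assumption of this sketch.
    """
    result = []
    stack = [root]
    while stack:
        node = stack.pop()
        if 1 <= depth_below(node, children) <= max_depth:
            result.append(node)
        stack.extend(children.get(node, []))
    return sorted(result)


# Hypothetical hierarchy: document -> ecology -> habitat -> lower items.
TREE = {
    "clownfish": ["ecology", "appearance"],
    "ecology": ["habitat", "diet"],
    "habitat": ["water quality", "water depth"],
}
FIRST_PASS = extract_predetermined_nodes(TREE, "clownfish")

# After the lower nodes of "habitat" are folded into its information content,
# "habitat" becomes a leaf and "ecology" is one level deep.
TREE_AFTER = {
    "clownfish": ["ecology", "appearance"],
    "ecology": ["habitat", "diet"],
}
SECOND_PASS = extract_predetermined_nodes(TREE_AFTER, "clownfish")
print(FIRST_PASS, SECOND_PASS)
```

The second pass mirrors the repeated extraction described for FIG. 13, where a node that was too deep at first becomes eligible once its lower levels have been classified into a pattern.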
 ノード統合部4は、所定ノードに紐付く下位ノードの項目の抽象度を統合する。このとき、ノード統合部4は、概念モデル9を参照することができる。これにより、同一階層の項目であっても、上位概念で表記された項目名と、下位概念で表記された項目名とを一致させることができる。 The node integration unit 4 integrates the degree of abstraction of the items of the lower nodes associated with the predetermined node. At this time, the node integration unit 4 can refer to the conceptual model 9. As a result, even if the items are in the same hierarchy, the item name described in the upper concept and the item name described in the lower concept can be matched.
 分類部5は、所定ノードに紐付く下位ノードの項目に基づいて所定ノードを分類する。このとき、分類部5は、所定ノードに紐付く下位ノードの概念の組み合わせに基づいて、所定ノードを分類することができる。例えば、分類部5は、第1ノードに紐付く下位ノードの概念が、第2ノードに紐付く下位ノードの概念となり得ない場合、第1ノードと第2ノードを異なるグループに分類することができる。これにより、第1ノードの項目名と第2ノードの項目名とが等しい場合においても、第1ノードの項目と第2ノードの項目とは、使われ方が異なると判断することができる。 The classification unit 5 classifies the predetermined node based on the item of the lower node associated with the predetermined node. At this time, the classification unit 5 can classify the predetermined node based on the combination of the concepts of the lower nodes associated with the predetermined node. For example, the classification unit 5 can classify the first node and the second node into different groups when the concept of the lower node associated with the first node cannot be the concept of the lower node associated with the second node. .. As a result, even when the item name of the first node and the item name of the second node are the same, it can be determined that the item of the first node and the item of the second node are used differently.
 ここで、異なるグループに分類された所定ノードに割り当てられた項目は、表記が同じであっても、使われ方が異なると判断することができ、項目の使われ方が反映された情報の検索効率を向上させることができる。このため、報告書、設計書、企画書、論文、社内向けおよび社外向けなどの種類に応じて適正化されたドキュメントの作成を支援したり、採択率の良い論文の作成を支援したりすることができる。 Here, it can be determined that the items assigned to the predetermined nodes classified into different groups are used differently even if the notation is the same, and the search for information reflecting the usage of the items is performed. Efficiency can be improved. For this reason, we support the creation of documents optimized for each type of report, design document, proposal, dissertation, internal and external, etc., and support the creation of dissertation with a good acceptance rate. Can be done.
 モデル化部6は、所定ノードに紐付く下位ノードの情報に基づいて、下位ノードの紐付き方のモデルを推定する。このとき、モデル化部6は、分類部5にて同一グループに分類された所定ノードに紐付く下位ノードの紐付き方のパタンを生成することができる。このパタンは、複数のドキュメントD1~D4・・から抽出された項目に割り当てられた所定ノードに紐付く下位ノードのまとまり度またはバラツキ度を示すことができる。 The modeling unit 6 estimates a model of how to associate the lower node based on the information of the lower node associated with the predetermined node. At this time, the modeling unit 6 can generate a pattern of how the lower nodes are associated with the predetermined nodes classified in the same group by the classification unit 5. This pattern can indicate the degree of cohesion or the degree of variation of the lower nodes associated with the predetermined nodes assigned to the items extracted from the plurality of documents D1 to D4.
When estimating the model of how lower nodes are linked, the modeling unit 6 can refer to the information contents of the lower nodes linked to the predetermined node. For example, the modeling unit 6 can estimate the model based on elements of those information contents. An element of the information contents of a lower node is, for example, a word contained in them. The model may be built based on the amount of information in the information contents of the lower nodes, or based on the similarity of the elements of those information contents.
The node division unit 7 divides the items of the lower nodes linked to predetermined nodes classified into different groups into concrete items specific to each group, and outputs the hierarchical structure of the nodes for each group. In doing so, the node division unit 7 can divide the items of the lower nodes linked to a predetermined node based on the model estimated by the modeling unit 6. As a result, even when items are extracted from documents D1 to D4 and so on as items of the same concept, their item names can be made different according to differences in how the items are used, which enables searches that reflect those differences.
The thesaurus dictionary 8 is a dictionary in which words are classified by similarity of meaning. The conceptual model 9 is a model that represents the upper and lower relationships between concepts, in which an upper layer can have a higher level of abstraction than a lower layer. An ontology, for example, can be used as the conceptual model 9.
The processing of the information management device of FIG. 1 is described below in detail, using actual documents as examples.
FIG. 2 is a diagram showing an example of a document of FIG. 1.
In FIG. 2, document D1 is titled "Ecology of Clownfish." Document D1 includes the items habitat environment, breeding method, foraging method, sex, and survival period. The item habitat environment includes the items water quality, water depth, temperature, symbiosis, and habitat sea area.
The item water quality contains the information content "seawater." The item water depth contains "20-40 m." The item temperature contains "24 degrees." The item symbiosis contains "sea anemone." The item habitat sea area contains "Indo-Pacific" and "near the equator."
FIG. 3 is a diagram showing another example of a document of FIG. 1.
In FIG. 3, document D2 is titled "Ecology of Flying Fish." Document D2 includes the items habitat environment, breeding method, feeding method, sex, and lifespan. The item habitat environment includes the items water quality, water depth, temperature, and habitat region.
The item water quality contains the information content "seawater." The item water depth contains "1 m." The item habitat region contains "Pacific Ocean," "Indian Ocean," and "Atlantic Ocean."
FIG. 4 is a diagram showing yet another example of a document of FIG. 1.
In FIG. 4, document D3 is titled "Ecology of Pandas." Document D3 includes the items morphology, habitat region, breeding method, foraging method, and lifespan. The item morphology includes the items size, fur, and vocalization. The item size includes the items total length and body weight. The item vocalization includes the items male and female. The item habitat region includes the items country name and habitat environment. The item habitat environment includes the items temperate zone and bamboo forest.
The item body weight contains the information content "kg." The item male contains "nyaa-nyaa." The item female contains "myan-myan." The item country name contains "China."
FIG. 5 is a diagram showing yet another example of a document of FIG. 1.
In FIG. 5, document D4 is titled "Ecology of Lions." Document D4 includes the items morphology, habitat region, breeding method, foraging method, social system, and lifespan. The item morphology includes the items size, fur, and vocalization. The item size includes the items total length and body weight. The item habitat region includes the items country name and habitat environment. The item habitat environment includes the items subtropical zone and grassland. The item country name contains the information content "Africa."
FIG. 6(a) is a diagram showing the hierarchical structure of nodes to which the items of the document of FIG. 2 are assigned, and FIG. 6(b) is a diagram showing the hierarchical structure of nodes to which the items of the document of FIG. 3 are assigned.
In FIG. 6(a), the item extraction unit 1 extracts the title and the items from document D1 of FIG. 2. The item extraction unit 1 then assigns node N111 to the title "Ecology of Clownfish."
The item extraction unit 1 assigns nodes N121 to N125 to the items habitat environment, breeding method, foraging method, sex, and survival period, respectively. It assigns nodes N131 to N135 to the items water quality, water depth, temperature, symbiosis, and habitat sea area, respectively. The item extraction unit 1 links nodes N121 to N125 to node N111, and links nodes N131 to N135 to node N121.
In FIG. 6(b), the item extraction unit 1 of FIG. 1 extracts the title and the items from document D2 of FIG. 3. The item extraction unit 1 then assigns node N211 to the title "Ecology of Flying Fish."
The item extraction unit 1 assigns nodes N221 to N225 to the items habitat environment, breeding method, feeding method, sex, and lifespan, respectively. It assigns nodes N231 to N234 to the items water quality, water depth, temperature, and habitat region, respectively. The item extraction unit 1 links nodes N221 to N225 to node N211, and links nodes N231 to N234 to node N221.
Here, the item foraging method of node N123 in FIG. 6(a) and the item feeding method of node N223 in FIG. 6(b) represent the same concept, but the item extraction unit 1 uses the notation of documents D1 and D2 as it is. Likewise, the item survival period of node N125 in FIG. 6(a) and the item lifespan of node N225 in FIG. 6(b) represent the same concept, but the item extraction unit 1 uses the notation of documents D1 and D2 as it is.
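The node assignment and linking described for FIG. 6(a) can be pictured as building a small tree. The following is a minimal sketch, not the device's actual implementation: the `Node` class, its field names, and the `build_d1` helper are hypothetical names introduced only for illustration, and the item names are English renderings of document D1's wording.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of an item hierarchy (a hypothetical structure for illustration)."""
    node_id: str                                   # reference numeral, e.g. "N121"
    item: str                                      # item name exactly as written
    contents: list = field(default_factory=list)   # information contents, if any
    children: list = field(default_factory=list)   # linked lower nodes

    def link(self, *lower):
        """Link lower nodes to this node; returns self so calls can be chained."""
        self.children.extend(lower)
        return self


def build_d1():
    """Build the FIG. 6(a) hierarchy, keeping document D1's own wording."""
    habitat_env = Node("N121", "habitat environment").link(
        Node("N131", "water quality", ["seawater"]),
        Node("N132", "water depth", ["20-40 m"]),
        Node("N133", "temperature", ["24 degrees"]),
        Node("N134", "symbiosis", ["sea anemone"]),
        Node("N135", "habitat sea area", ["Indo-Pacific", "near the equator"]),
    )
    # The title node N111 is the root; notation is not yet normalized here.
    return Node("N111", "Ecology of Clownfish").link(
        habitat_env,
        Node("N122", "breeding method"),
        Node("N123", "foraging method"),
        Node("N124", "sex"),
        Node("N125", "survival period"),
    )
```

At this stage the tree still carries each document's own notation, which is why "survival period" and "lifespan" remain distinct until a later normalization step.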
FIG. 7(a) is a diagram showing an example of integration based on semantic analysis of the concepts of the nodes in the hierarchical structure of FIG. 6(a), FIG. 7(b) is a diagram showing an example of integration based on semantic analysis of the concepts of the nodes in the hierarchical structure of FIG. 6(b), and FIG. 7(c) is a diagram showing an example of integration based on semantic analysis of the concepts of the nodes in a hierarchical structure based on the items of the document of FIG. 4.
In FIG. 7(a), the node candidate generation unit 2 of FIG. 1 uses morphological analysis to extract the item ecology from the title "Ecology of Clownfish" of node N111, and changes the name of node N111 to the item ecology. The node candidate generation unit 2 also uses synonym analysis to change the item survival period of node N125 to the item lifespan.
In FIG. 7(b), the node candidate generation unit 2 uses morphological analysis to extract the item ecology from the title "Ecology of Flying Fish" of node N211, and changes the name of node N211 to the item ecology. The node candidate generation unit 2 also uses synonym analysis to change the item feeding method of node N223 to the item foraging method.
In this way, the node candidate generation unit 2 can unify the notation of items of the same concept even when documents D1 and D2 write those items differently.
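The synonym-based unification above can be sketched as a lookup against a small thesaurus. This is an illustrative sketch under stated assumptions: the `THESAURUS` table, the choice of canonical names, and the `normalize_item` function are hypothetical, and a real thesaurus dictionary 8 would be far larger.

```python
# Hypothetical thesaurus: each canonical item name maps to its known synonyms.
THESAURUS = {
    "lifespan": {"survival period"},
    "foraging method": {"feeding method"},
}

# Invert the table once so that each lookup is a single dictionary access.
CANONICAL = {
    synonym: canonical
    for canonical, synonyms in THESAURUS.items()
    for synonym in synonyms
}


def normalize_item(item):
    """Return the canonical item name, or the item unchanged if no synonym is known."""
    return CANONICAL.get(item, item)
```

Items with no entry, such as "sex", pass through unchanged, so the step only touches notations that the thesaurus actually covers.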
In FIG. 7(c), the item extraction unit 1 of FIG. 1 extracts the title and the items from document D3 of FIG. 4. The item extraction unit 1 then assigns node N311 to the title "Ecology of Pandas."
The item extraction unit 1 assigns nodes N321 to N325 to the items habitat region, morphology, foraging method, breeding method, and lifespan, respectively. It assigns nodes N331 to N335 to the items country name, habitat environment, size, fur, and vocalization, respectively, and nodes N341 to N346 to the items temperate zone, bamboo forest, total length, body weight, male, and female, respectively. The item extraction unit 1 links nodes N321 to N325 to node N311, nodes N331 and N332 to node N321, nodes N333 to N335 to node N322, nodes N341 and N342 to node N332, nodes N343 and N344 to node N333, and nodes N345 and N346 to node N335.
Here, the item extraction unit 1 can set a provisional item X1, standing for the upper concept of temperate zone, for the item temperate zone of node N341, and a provisional item X2, standing for the upper concept of bamboo forest, for the item bamboo forest of node N342.
Next, the node candidate generation unit 2 uses morphological analysis to extract the item ecology from the title "Ecology of Pandas" of node N311, and changes the name of node N311 to the item ecology. The node candidate generation unit 2 also determines whether the concept of the item male of node N345 and the concept of the item female of node N346 are included in the concept of the item vocalization of node N335. It further determines whether the information content "nyaa-nyaa" of node N345 and the information content "myan-myan" of node N346 are included in the concept of the item vocalization of node N335.
If the concepts of the items male (node N345) and female (node N346) are not included in the concept of the item vocalization of node N335, but the information contents "nyaa-nyaa" of node N345 and "myan-myan" of node N346 are, the node candidate generation unit 2 judges the items male and female to be mere modifiers and consolidates nodes N345 and N346 into node N335.
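The modifier test above can be sketched as follows. This is a rough illustration only: the `CONCEPT_MEMBERS` table is a hypothetical stand-in for whatever concept lookup the device uses, nodes are plain dictionaries, and prefixing the absorbed contents with the modifier name is an assumption, since the text says only that the nodes are consolidated.

```python
# Hypothetical stand-in for a concept lookup: terms that fall under an item's concept.
CONCEPT_MEMBERS = {
    "vocalization": {"nyaa-nyaa", "myan-myan"},
}


def belongs_to(concept, term):
    """True if `term` is known to fall under `concept`."""
    return term in CONCEPT_MEMBERS.get(concept, set())


def absorb_modifiers(parent):
    """Return a copy of `parent` with modifier-only lower nodes folded into it."""
    kept = []
    contents = list(parent.get("contents", []))
    for child in parent.get("children", []):
        child_contents = child.get("contents", [])
        item_fits = belongs_to(parent["item"], child["item"])
        contents_fit = bool(child_contents) and all(
            belongs_to(parent["item"], c) for c in child_contents
        )
        if not item_fits and contents_fit:
            # The child item is only a modifier: keep its contents on the parent.
            contents += [f"{child['item']}: {c}" for c in child_contents]
        else:
            kept.append(child)
    return {**parent, "contents": contents, "children": kept}
```

Applied to the vocalization node with lower nodes male and female, the function leaves a single node that holds both calls.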
FIG. 8(a) is a diagram showing an example of extracting, from the hierarchical structure of FIG. 6(b), a predetermined node whose lower nodes are to be integrated or divided, and FIG. 8(b) is a diagram showing an example of extracting such a predetermined node from a hierarchical structure based on the items of the document of FIG. 5.
In FIG. 8(a), once the node candidate generation unit 2 has generated the node hierarchy of FIG. 7(b), the node extraction unit 3 of FIG. 1 extracts node N221, whose lower nodes N231 to N234 form a single-level lower structure.
In FIG. 8(b), the item extraction unit 1 of FIG. 1 extracts the title and the items from document D4 of FIG. 5. The item extraction unit 1 then assigns node N411 to the title "Ecology of Lions."
The item extraction unit 1 assigns nodes N421 to N426 to the items habitat region, morphology, foraging method, breeding method, lifespan, and social system, respectively. It assigns nodes N431 to N435 to the items country name, habitat environment, size, fur, and vocalization, respectively, and nodes N441 to N444 to the items subtropical zone, grassland, total length, and body weight, respectively. The item extraction unit 1 links nodes N421 to N426 to node N411, nodes N431 and N432 to node N421, nodes N433 to N435 to node N422, nodes N441 and N442 to node N432, and nodes N443 and N444 to node N433.
Here, the item extraction unit 1 can set a provisional item Y1, standing for the upper concept of subtropical zone, for the item subtropical zone of node N441, and a provisional item Y2, standing for the upper concept of grassland, for the item grassland of node N442.
Once the item extraction unit 1 has generated the node hierarchy of FIG. 8(b), the node extraction unit 3 can extract node N432, whose lower nodes N441 and N442 form a single-level lower structure. Alternatively, the node extraction unit 3 may extract node N433, whose lower nodes N443 and N444 form a single-level lower structure.
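One way to read "a single-level lower structure" is that a node qualifies when it has lower nodes and none of those lower nodes has lower nodes of its own. The sketch below applies that reading to the FIG. 8(b) hierarchy. The `LINKS` table and the `one_level_parents` function are hypothetical names, and the table assumes that the habitat environment node N432 sits under the habitat region node N421.

```python
# FIG. 8(b) links for document D4, written as parent -> linked lower nodes.
LINKS = {
    "N411": ["N421", "N422", "N423", "N424", "N425", "N426"],
    "N421": ["N431", "N432"],
    "N422": ["N433", "N434", "N435"],
    "N432": ["N441", "N442"],
    "N433": ["N443", "N444"],
}


def one_level_parents(links):
    """Return nodes that have lower nodes, none of which has lower nodes itself."""
    return [
        parent
        for parent, lower in links.items()
        if lower and all(not links.get(child) for child in lower)
    ]


CANDIDATES = one_level_parents(LINKS)
```

Under this reading the candidates are N432 and N433, which matches the two nodes named in the text.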
FIG. 9(a) is a diagram showing an example of integration based on abstraction-level analysis of the concepts of the nodes in the hierarchical structure of FIG. 7(a), FIG. 9(b) is a diagram showing an example of integration based on abstraction-level analysis of the concepts of the nodes in the hierarchical structure of FIG. 8(a), FIG. 9(c) is a diagram showing an example of integration based on abstraction-level analysis of the concepts of the nodes in the hierarchical structure of FIG. 7(c), and FIG. 9(d) is a diagram showing an example of integration based on abstraction-level analysis of the concepts of the nodes in the hierarchical structure of FIG. 8(b).
In FIG. 9(a), it is assumed that the node extraction unit 3 of FIG. 1 has extracted, from the node hierarchy of FIG. 7(a), node N121, whose lower nodes N131 to N135 form a single-level lower structure. In FIG. 9(b), it is assumed that the node extraction unit 3 has extracted, from the node hierarchy of FIG. 7(b), node N221, whose lower nodes N231 to N234 form a single-level lower structure.
The node integration unit 4 then uses abstraction-level analysis to integrate the item habitat sea area of lower node N135 in FIG. 9(a) into the item habitat region. This makes the item name of lower node N135 in FIG. 9(a) match that of lower node N234 in FIG. 9(b), which removes the notational variation among the lower nodes.
In FIG. 9(c), it is assumed that the node extraction unit 3 has extracted, from the node hierarchy of FIG. 7(c), node N332, whose lower nodes N341 and N342 form a single-level lower structure. In FIG. 9(d), it is assumed that the node extraction unit 3 has extracted, from the node hierarchy of FIG. 8(b), node N432, whose lower nodes N441 and N442 form a single-level lower structure.
The node integration unit 4 then uses abstraction-level analysis to integrate the provisional item X1 of lower node N341 into the item climate and the provisional item X2 of lower node N342 into the item vegetation. Likewise, it integrates the provisional item Y1 of lower node N441 into the item climate and the provisional item Y2 of lower node N442 into the item vegetation. This makes the item names of lower nodes N341 and N342 in FIG. 9(c) match those of lower nodes N441 and N442 in FIG. 9(d), respectively, which removes the notational variation among the lower nodes.
FIG. 10 is a diagram showing an example of the correspondence between information contents extracted from the documents of FIG. 1 and items.
In FIG. 10, the conceptual model 9 associates, for example, the lower concepts seawater, brackish water, and freshwater with the upper concept water quality; the lower concepts Indo-Pacific, sea area near the equator, Indian Ocean, Pacific Ocean, and East Asian rivers with the upper concept habitat region; the lower concepts temperate zone and subtropical zone with the upper concept climate; and the lower concepts grassland and bamboo forest with the upper concept vegetation.
By referring to the conceptual model 9, the node integration unit 4 can unify the item names of lower-node concepts that differ in level of abstraction. For example, document D1 of FIG. 2 lists the item habitat sea area for the information contents "Indo-Pacific" and "near the equator." The conceptual model 9 of FIG. 10, meanwhile, associates the item habitat region with the information contents "Indo-Pacific" and "sea area near the equator." By referring to the conceptual model 9 of FIG. 10, the node integration unit 4 can therefore integrate the item habitat sea area of lower node N135 in FIG. 9(a) into the item habitat region.
The conceptual model 9 also associates the item climate with the information contents "temperate zone" and "subtropical zone," and the item vegetation with the information contents "grassland" and "bamboo forest." By referring to the conceptual model 9 of FIG. 10, the node integration unit 4 can therefore integrate the provisional items X1 and Y1 of lower nodes N341 and N441 in FIGS. 9(c) and 9(d) into the item climate, and the provisional items X2 and Y2 of lower nodes N342 and N442 into the item vegetation.
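Resolving a provisional item through the conceptual model amounts to looking up the upper concept of a lower concept. The sketch below shows this with a flat table holding part of the FIG. 10 associations. The `CONCEPT_MODEL` dictionary and the `resolve_provisional_item` function are hypothetical, and an actual ontology would likely hold more than one level.

```python
# Lower concept -> upper concept, following part of the associations listed for FIG. 10.
CONCEPT_MODEL = {
    "seawater": "water quality",
    "brackish water": "water quality",
    "freshwater": "water quality",
    "temperate zone": "climate",
    "subtropical zone": "climate",
    "grassland": "vegetation",
    "bamboo forest": "vegetation",
}


def resolve_provisional_item(term, provisional):
    """Return the upper concept of `term`, or keep the provisional item if none is known."""
    return CONCEPT_MODEL.get(term, provisional)
```

With this table, X1 and Y1 both resolve to climate and X2 and Y2 both resolve to vegetation, so the panda and lion hierarchies end up with matching item names.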
FIG. 11(a) is a diagram showing an example of classifying the patterns of how lower nodes are linked to the habitat environment item of each document, FIG. 11(b) is a diagram showing an example of a mathematical model of pattern P1 of FIG. 11(a), FIG. 11(c) is a diagram showing an example of a mathematical model of pattern P2 of FIG. 11(a), and FIG. 11(d) is a diagram showing an example of a mathematical model of pattern P3 of FIG. 11(a).
In FIG. 11(a), it is assumed that, for documents on the ecology of clownfish, flying fish, dolphins, sweetfish, medaka, pandas, and lions, the node extraction unit 3 of FIG. 1 has extracted the item habitat environment as the predetermined node.
Suppose that in the document on the ecology of clownfish the lower-node items linked to the item habitat environment are water quality, water depth, temperature, habitat region, and symbiosis; that in the document on the ecology of flying fish they are water quality, water depth, temperature, and habitat region; and that in the document on the ecology of dolphins they are water quality, water depth, and temperature.
Suppose also that in the document on the ecology of sweetfish the lower-node items linked to the item habitat environment are water quality, habitat region, and flow velocity, and that in the document on the ecology of medaka they are water quality and flow velocity.
Suppose further that in the documents on the ecology of pandas and of lions the lower-node items linked to the item habitat environment are climate and vegetation.
The classification unit 5 of FIG. 1 then classifies the item habitat environment in each of the documents on the ecology of clownfish, flying fish, dolphins, sweetfish, medaka, pandas, and lions, based on the lower-node items linked to that item. As an index for this classification, the classification unit 5 can use, for example, the distance between the vectors obtained by vectorizing the lower-node items of each document.
In doing so, the classification unit 5 can generate vectors whose components are 1 or 0 depending on whether each lower-node item is present. For example, the classification unit 5 generates the vector (1,1,1,1,1,0,0,0) for clownfish, (1,1,1,1,0,0,0,0) for flying fish, (1,1,1,0,0,0,0,0) for dolphins, (1,0,0,1,0,1,0,0) for sweetfish, (1,0,0,0,0,1,0,0) for medaka, and (0,0,0,0,0,0,1,1) for pandas and lions.
Among clownfish, flying fish, and dolphins, the distance between vectors is 1 or 2. Between sweetfish and medaka, the distance is 1. Between pandas and lions, the distance is 0. Clownfish, flying fish, and dolphins are at a distance of 3 or more from sweetfish and medaka, and at a distance of 5 or more from pandas and lions. Sweetfish and medaka are at a distance of 4 or more from pandas and lions.
Accordingly, by setting the threshold for the distance between lower-node vectors to 3, the classification unit 5 can classify habitat environment items whose lower-node vectors are less than 3 apart into the same group, and habitat environment items whose lower-node vectors are 3 or more apart into different groups.
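The vectorization and thresholding above can be reproduced in a few lines. The sketch below rests on several assumptions: the component order (water quality, water depth, temperature, habitat region, symbiosis, flow velocity, climate, vegetation) is inferred from the example vectors rather than stated, the distance is taken to be the Hamming distance because that reproduces the distances quoted in the text, and grouping uses single linkage, which the text does not specify. All function and variable names are hypothetical.

```python
from itertools import combinations

# Assumed component order, inferred from the example vectors.
ITEMS = ["water quality", "water depth", "temperature", "habitat region",
         "symbiosis", "flow velocity", "climate", "vegetation"]

LOWER_ITEMS = {
    "clownfish": {"water quality", "water depth", "temperature", "habitat region", "symbiosis"},
    "flying fish": {"water quality", "water depth", "temperature", "habitat region"},
    "dolphin": {"water quality", "water depth", "temperature"},
    "sweetfish": {"water quality", "habitat region", "flow velocity"},
    "medaka": {"water quality", "flow velocity"},
    "panda": {"climate", "vegetation"},
    "lion": {"climate", "vegetation"},
}


def to_vector(items):
    """1 where the lower item is present, 0 where it is absent."""
    return tuple(int(name in items) for name in ITEMS)


def hamming(u, v):
    """Number of components in which two vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def group_by_threshold(vectors, threshold=3):
    """Single-linkage grouping: join two documents whose distance is below the threshold."""
    group_of = {name: {name} for name in vectors}
    for a, b in combinations(vectors, 2):
        if hamming(vectors[a], vectors[b]) < threshold and group_of[a] is not group_of[b]:
            merged = group_of[a] | group_of[b]
            for name in merged:
                group_of[name] = merged
    groups = []
    for name in vectors:  # keep first-seen order, drop duplicate groups
        if group_of[name] not in groups:
            groups.append(group_of[name])
    return groups


VECTORS = {name: to_vector(items) for name, items in LOWER_ITEMS.items()}
GROUPS = group_by_threshold(VECTORS)
```

With a threshold of 3 this yields three groups: clownfish, flying fish, and dolphin; sweetfish and medaka; panda and lion. Flying fish and sweetfish are exactly 3 apart, so the strict "less than 3" comparison is what keeps the first two groups separate.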
Alternatively, the classification unit 5 may classify a first node and a second node into different groups when the concept of a lower node linked to the first node cannot be a concept of a lower node linked to the second node. For example, the items climate and vegetation, which appear for the ecology of pandas and lions, cannot be items for the ecology of clownfish, flying fish, dolphins, sweetfish, or medaka. The item habitat environment for pandas and lions can therefore be classified into a different group from the item habitat environment for clownfish, flying fish, dolphins, sweetfish, and medaka. Similarly, the item flow velocity, which appears for the ecology of sweetfish and medaka, cannot be an item for the ecology of clownfish, flying fish, or dolphins. The item habitat environment for sweetfish and medaka can therefore be classified into a different group from the item habitat environment for clownfish, flying fish, and dolphins.
Next, the modeling unit 6 generates a habitat environment pattern P1 showing how lower nodes are linked to the item habitat environment for clownfish, flying fish, and dolphins, a habitat environment pattern P2 showing how they are linked for sweetfish and medaka, and a habitat environment pattern P3 showing how they are linked for pandas and lions.
In doing so, the modeling unit 6 can estimate a mathematical model for each of the habitat environment patterns P1 to P3, based on information about the lower nodes linked to each pattern. The mathematical model of each pattern can use, for example, the existence probabilities of the lower items, the degree of cohesion of the lower nodes of the pattern, or a distribution model of the information linked to each lower item. The information linked to a lower item can be an item further below that lower item, or its information contents. The degree of cohesion of the lower nodes can be calculated based on the variance of the existence probabilities of the lower items in each of the habitat environment patterns P1 to P3. It may also be obtained based on the average distance of the vectors belonging to each pattern from a representative vector.
For example, as shown in FIG. 11(b), in the habitat pattern P1, the existence probabilities of the items "water quality", "water depth", "temperature", "habitat area" and "symbiosis" are 1.0, 1.0, 1.0, 0.67 and 0.33, respectively. As a result, the degree of cohesion of the habitat pattern P1 is 0.45. In addition, for the item "habitat area" of the habitat pattern P1, a distribution model can be generated in which the information content "Pacific Ocean" is present at a ratio of 0.5 and the information content "Indian Ocean" is present at a ratio of 0.3.
Further, as shown in FIG. 11(c), in the habitat pattern P2, the existence probabilities of the items "water quality", "habitat area" and "flow velocity" are 1.0, 0.5 and 1.0, respectively. As a result, the degree of cohesion of the habitat pattern P2 is 0.7. In the item "water quality" of the habitat pattern P2, if the word "freshwater" appears but the words "brackish water" and "seawater" do not, the distribution model (freshwater, brackish water, seawater) = (1.0, 0.0, 0.0) can be generated. Furthermore, in the item "habitat area" of the habitat pattern P2, if the word "East Asian rivers" appears but the word "river" does not appear anywhere else, the distribution model (East Asian rivers) = (1.0) can be generated.
Further, as shown in FIG. 11(d), in the habitat pattern P3, the existence probabilities of the items "climate" and "vegetation" are 1.0 and 1.0, respectively. As a result, the degree of cohesion of the habitat pattern P3 is 1.0. In the item "climate" of the habitat pattern P3, if the words "subtropical" and "temperate" appear equally often, the distribution model (subtropical, temperate) = (0.5, 0.5) can be generated. Similarly, in the item "vegetation" of the habitat pattern P3, if the words "bamboo grove" and "grassland" appear equally often, the distribution model (bamboo grove, grassland) = (0.5, 0.5) can be generated.
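The existence probabilities and distribution models described for the patterns P1 to P3 can be illustrated with a short Python sketch. The assignment of lower items to individual species below is a hypothetical reading of pattern P1 (FIG. 11(b) is not reproduced here), chosen only so that the item frequencies match the stated values. The function names are illustrative and are not part of the embodiment, and the degree of cohesion is not computed because its exact formula is not given.

```python
from collections import Counter

def existence_probabilities(nodes):
    """Fraction of the nodes in a pattern that carry each lower item."""
    counts = Counter(item for items in nodes.values() for item in set(items))
    return {item: round(n / len(nodes), 2) for item, n in counts.items()}

def distribution_model(words):
    """Relative frequency of each word observed under one lower item."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

# Hypothetical lower items of "habitat" for the three species in pattern P1.
p1_nodes = {
    "clownfish": ["water quality", "water depth", "temperature",
                  "habitat area", "symbiosis"],
    "flying fish": ["water quality", "water depth", "temperature",
                    "habitat area"],
    "dolphin": ["water quality", "water depth", "temperature"],
}
p1_probs = existence_probabilities(p1_nodes)

# Words under "climate" in pattern P3, appearing equally often.
climate_model = distribution_model(["subtropical", "temperate"])
```

With this input, `p1_probs` gives 1.0 for "water quality", "water depth" and "temperature", 0.67 for "habitat area" and 0.33 for "symbiosis", and `climate_model` is (subtropical, temperate) = (0.5, 0.5).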
FIG. 12(a) is a diagram showing an example of dividing lower nodes based on the pattern P1 of FIG. 11(b), FIG. 12(b) is a diagram showing an example of dividing lower nodes based on the pattern P2 of FIG. 11(c), and FIG. 12(c) is a diagram showing an example of dividing lower nodes based on the pattern P3 of FIG. 11(d).
In FIG. 12(a), the node division unit 7 of FIG. 1 divides the items of the lower nodes associated with the habitat pattern P1 into concrete items specific to the habitat pattern P1. For example, when only information content representing sea areas, such as the Pacific Ocean and the Indian Ocean, appears in the item "habitat area" of the habitat pattern P1, the node division unit 7 changes the item "habitat area" of the habitat pattern P1 to the item "sea area".
Further, in FIG. 12(b), when the item "habitat area" of the habitat pattern P2 exhibits the distribution model (East Asian rivers) = (1.0), the node division unit 7 changes the item "habitat area" of the habitat pattern P2 to the item "river area".
By dividing the items of the lower nodes associated with the habitat patterns P1 and P2 into concrete items specific to each pattern, a person writing a paper on, for example, carp, which is a freshwater fish, can refer to the habitat pattern P2 for the ecology of sweetfish and medaka. Consequently, when writing a paper on a freshwater fish other than sweetfish and medaka, such as carp, the item "flow velocity" can be prevented from being omitted from the item "habitat", and superfluous items such as "water depth" can be prevented from being inserted into the item "habitat", which improves the quality of the paper.
Further, in FIG. 12(c), for the item "climate" of the habitat pattern P3, the node division unit 7 keeps the item "climate" as it is when the information content that makes the climate concrete shows no bias. Likewise, for the item "vegetation" of the habitat pattern P3, the node division unit 7 keeps the item "vegetation" as it is when the information content that makes the vegetation concrete shows no bias.
When the information content of a lower-node item shows no bias, keeping the abstraction level of that item unchanged allows the habitat pattern P3 to be referenced when writing a paper on animals living in the frigid zone or animals living in forests.
FIG. 13(a) is a diagram showing another example of extracting a predetermined node whose lower nodes are to be integrated or divided, based on the hierarchical structure of FIG. 7(a), and FIG. 13(b) is a diagram showing another example of extracting such a predetermined node based on the hierarchical structure of FIG. 8(b).
In FIG. 13(a), the node extraction unit 3 of FIG. 1 extracts a predetermined node from the hierarchical structure of nodes in which the processing result of the classification unit 5 of FIG. 1 is reflected. For example, the node extraction unit 3 sets the lower nodes N131 to N135 of the item "habitat" of the node N121 in FIG. 7(a) as a pattern PA, which becomes the information content of the node N121. The hierarchy of the lower nodes N121 to N125 of the node N111, to which the item "ecology" is assigned, then has one level. Therefore, by extracting nodes whose lower-node hierarchy has one level or less, the node extraction unit 3 can extract the node N111, to which the item "ecology" is assigned, as a predetermined node. As a result, the item "ecology", which was not extracted from the hierarchical structure of FIG. 7(a), can also be made a target of pattern classification by the classification unit 5.
Further, in FIG. 13(b), the node extraction unit 3 sets the lower nodes N441 and N442 of the item "habitat" of the node N432 in FIG. 8(b) as a pattern PB, which becomes the information content of the node N432. The hierarchy of the lower nodes N431 and N432 of the node N421, to which the item "habitat area" is assigned, then has one level. Therefore, by extracting nodes whose lower-node hierarchy has one level or less, the node extraction unit 3 can extract the node N421, to which the item "habitat area" is assigned, as a predetermined node. As a result, the item "habitat area", which was not extracted from the hierarchical structure of FIG. 8(b), can also be made a target of pattern classification by the classification unit 5.
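The repeated extraction described for FIGS. 13(a) and 13(b) can be sketched in Python as follows. The tree is a reduced, hypothetical version of FIG. 7(a) (only some of the nodes N121 to N125 and N131 to N135 are included), the function names are illustrative, and the sketch assumes that leaf nodes are skipped because they have no lower nodes to classify by.

```python
def height(tree, node):
    """Number of levels of lower nodes below `node` (0 for a leaf)."""
    children = tree.get(node, [])
    return 0 if not children else 1 + max(height(tree, c) for c in children)

def extract_predetermined(tree):
    """Nodes that have lower nodes and whose lower-node hierarchy is one level or less."""
    return sorted(n for n in tree if tree[n] and height(tree, n) <= 1)

def collapse(tree, node):
    """Treat `node` and its lower nodes as a single node (the pattern becomes its content)."""
    return {n: ([] if n == node else list(c)) for n, c in tree.items()}

# Reduced tree: N111 "ecology" -> N121 "habitat" -> N131..N133.
tree = {
    "N111": ["N121", "N122"],
    "N121": ["N131", "N132", "N133"],
    "N122": [], "N131": [], "N132": [], "N133": [],
}
before = extract_predetermined(tree)
after = extract_predetermined(collapse(tree, "N121"))
```

Here `before` is `["N121"]`, and once the lower nodes of N121 are folded into a pattern, `after` is `["N111"]`, which mirrors how the item "ecology" becomes extractable.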
FIG. 14 is a flowchart showing pattern classification and mathematical modeling processing based on the lower structure according to the embodiment.
In FIG. 14, the node name N to be analyzed and the list of lower node names of the node name N are acquired (S11).
Next, the nodes with the node name N are extracted from all documents and, letting v_i denote the lower node vector of a node i, the lower node vector v_i corresponding to each extracted node is calculated (S12). The lower node vector v_i is a vector that quantifies the correspondence between the group of lower nodes actually associated with a node i having the node name N and the node information listed in the lower node name list M of the node name N.
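As a minimal sketch of S12, the lower node vector can be built as a binary indicator over the lower node name list. The source does not specify how the correspondence is quantified, so the 0/1 encoding, the function name and the example items below are assumptions made for illustration.

```python
def lower_node_vector(linked_lower_nodes, lower_node_names):
    """Binary vector: 1.0 where the node is linked to the listed lower node name."""
    linked = set(linked_lower_nodes)
    return [1.0 if name in linked else 0.0 for name in lower_node_names]

# Hypothetical lower node name list for the node name "habitat".
lower_node_names = ["water quality", "water depth", "temperature",
                    "habitat area", "flow velocity"]

v_sweetfish = lower_node_vector(
    ["water quality", "habitat area", "flow velocity"], lower_node_names)
v_dolphin = lower_node_vector(
    ["water quality", "water depth", "temperature"], lower_node_names)
```

Vectors of this form can then be passed to any clustering method; nodes with similar sets of lower nodes end up close to each other.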
Next, the lower node vectors v_i are clustered, and the nodes with the node name N extracted from all documents are classified into K groups (K is a positive integer) (S13). Any clustering method can be used. For example, the nodes can be classified by the K-means method with the number of classes determined in advance, and hierarchical clustering can be performed with an arbitrarily set threshold on the similarity between vectors.
Next, k is set to 1 (S14).
Next, the nodes belonging to the k-th cluster group are treated as pattern-k nodes of the node name N and are given the group id k_N (S15).
Next, the mean vector of the lower node vectors v_i of the nodes in group k_N is calculated as the probability vector P_kN with which each lower node is associated with a node of group k_N (S16).
Further, the variance of the lower node vectors v_i of the nodes in group k_N is calculated as an index σ_kN of the degree of variation of the nodes in group k_N (S17).
Next, a mathematical model of the information content actually stored in each lower node of the nodes in group k_N is estimated (S18).
Next, k is set to k + 1 (S19).
Next, it is determined whether k ≤ K (S20). If k ≤ K, the process returns to S15; otherwise, the process ends.
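The per-group computations of S16 and S17 can be sketched in Python as follows. The group labels are assumed to come from the clustering of S13, and the variance is taken here as the mean squared Euclidean distance to the group mean, which is only one possible reading of "variance of the lower node vectors". The names and example vectors are hypothetical.

```python
def group_statistics(vectors, labels):
    """Per-group mean vector (S16) and variance (S17) of lower node vectors."""
    stats = {}
    for k in sorted(set(labels)):
        members = [v for v, lab in zip(vectors, labels) if lab == k]
        dim = len(members[0])
        mean = [sum(v[j] for v in members) / len(members) for j in range(dim)]
        # Assumption: variance = mean squared Euclidean distance to the group mean.
        var = sum(
            sum((v[j] - mean[j]) ** 2 for j in range(dim)) for v in members
        ) / len(members)
        stats[k] = {"P": mean, "sigma": var}
    return stats

# Columns: water quality, habitat area, flow velocity, climate (hypothetical).
vectors = [[1.0, 1.0, 1.0, 0.0],
           [1.0, 0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
labels = [1, 1, 2]  # assumed output of the clustering step
stats = group_statistics(vectors, labels)
```

For group 1 the mean vector is (1.0, 0.5, 1.0, 0.0), which corresponds to the existence probabilities of the lower items in that group.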
FIG. 15 is a flowchart showing a specific example of the process of S18 in FIG. 14.
In FIG. 15, the number M of lower node names (M is a positive integer) associated with the nodes of group k_N is acquired (S31).
Next, m is set to 1 (S32).
Next, for each node s (s = 1, ..., S; S is a positive integer) of group k_N, the information content y_s^m (s = 1, ..., S) stored in its lower node m is extracted (S33). If the node s is not associated with the lower node m, y_s^m = 0.
Next, it is determined whether a base mathematical model Y_m exists (S34). For example, the base model of a node to which the item "height" is assigned can be a normal distribution.
If the base mathematical model Y_m exists, the parameters of the mathematical model Y_m are calculated from the information content y_s^m, the mathematical model Y_kN^m for the information content y_s^m of the lower node m of group k_N is obtained (S35), and the process proceeds to S40.
If the base mathematical model Y_m does not exist, the existence probability p_kN^m(z), in the information content y_s^m, of each element z that can be stored in the lower node m is calculated (S36).
Next, it is determined whether a plurality of elements z exist simultaneously in the information content y_s^m (S37).
If a plurality of elements z exist simultaneously in the information content y_s^m, the mathematical model Y_kN^m is calculated by taking the sum of z * p_kN^m(z) over those elements z (S38), and the process proceeds to S40.
If a plurality of elements z do not exist simultaneously in the information content y_s^m, the vector P_kN^m of the existence probabilities p_kN^m(z) over all elements z is stored in the mathematical model Y_kN^m (S39).
Next, m is set to m + 1 (S40).
Next, it is determined whether m ≤ M (S41). If m ≤ M, the process returns to S32; otherwise, the process ends.
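The branching of FIG. 15 for a single lower node m can be sketched in Python as follows. This is an interpretation under stated assumptions: only a normal distribution is supported as a base model, existence probabilities are taken over the nodes that are actually linked to the lower node, and the S38 branch is only flagged rather than evaluated because the source does not define the product z * p(z) for non-numeric elements. The function name and data layout are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

def estimate_content_model(contents, base_model=None):
    """Estimate the model of one lower node m from the contents y_s^m.

    contents: one entry per node s; 0 means "not linked", a number is a
    measurement, and a list holds the elements z stored in the lower node.
    """
    linked = [y for y in contents if y != 0]
    if not linked:
        return {"model": "empty"}
    if base_model == "normal":  # S34 -> S35: fit the base model parameters
        return {"model": "normal", "mu": mean(linked), "sigma": pstdev(linked)}
    # S36: existence probability of each element z among the linked contents
    counts = Counter(z for y in linked for z in set(y))
    probs = {z: n / len(linked) for z, n in counts.items()}
    # S37: do several elements occur together in a single content?
    multiple = any(len(set(y)) > 1 for y in linked)
    kind = "co-occurring" if multiple else "exclusive"  # S38 / S39
    return {"model": kind, "p": probs}

water_quality = estimate_content_model([["freshwater"], ["freshwater"]])
habitat_area = estimate_content_model([["East Asian rivers"], 0])
height_model = estimate_content_model([170.0, 180.0, 0], base_model="normal")
```

With these inputs, `water_quality["p"]` is {freshwater: 1.0} and `habitat_area["p"]` is {East Asian rivers: 1.0}, in line with the distribution models described for pattern P2.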
FIG. 16 is a flowchart showing an example of node division processing based on resetting of the abstraction level according to the embodiment.
In FIG. 16, the lower nodes u (u = 1, ..., U; U is a positive integer) that each have D or more data items (D is a positive integer) in a plurality of groups k_N with different k are extracted (S51). D is an arbitrarily set threshold. For example, when D = 1, the nodes "water quality" and "habitat area" are extracted in the example of FIG. 11(c).
Next, u is set to 1 (S52).
Next, the elements that can be stored in the lower node u (the elements for which p > 0) are compared between groups, and the node name of the node u for the target group is reset to the concept name with the highest abstraction level that describes the elements of the target group but does not include the elements of the other groups (S53).
Next, u is set to u + 1 (S54).
Next, it is determined whether u ≤ U (S55). If u ≤ U, the process returns to S52; otherwise, the process ends.
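The renaming rule of S53 can be sketched in Python as follows. The thesaurus, its abstraction levels and the function name are hypothetical; the sketch assumes that each concept name is annotated with the set of elements it covers and with a numeric abstraction level.

```python
# Hypothetical thesaurus: concept name -> (abstraction level, covered elements).
# A higher level means a more abstract concept.
THESAURUS = {
    "habitat area": (2, {"Pacific Ocean", "Indian Ocean", "East Asian rivers"}),
    "sea area": (1, {"Pacific Ocean", "Indian Ocean"}),
    "river area": (1, {"East Asian rivers"}),
    "Pacific Ocean": (0, {"Pacific Ocean"}),
}

def rename_for_group(target, others, thesaurus=THESAURUS):
    """Most abstract concept that covers `target` and no element of `others`."""
    candidates = [
        (level, name)
        for name, (level, covered) in thesaurus.items()
        if target <= covered and not (covered & others)
    ]
    return max(candidates)[1] if candidates else None

p1_elements = {"Pacific Ocean", "Indian Ocean"}   # elements with p > 0 in P1
p2_elements = {"East Asian rivers"}               # elements with p > 0 in P2
name_p1 = rename_for_group(p1_elements, p2_elements)
name_p2 = rename_for_group(p2_elements, p1_elements)
```

Under these assumptions, `name_p1` is "sea area" and `name_p2` is "river area", matching the renaming of "habitat area" described for FIGS. 12(a) and 12(b).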
FIG. 17 is a flowchart showing another example of node division processing based on resetting of the abstraction level according to the embodiment.
In FIG. 17, m is set to 1 (S61).
Next, for the lower node m of the node name N, the list L of node names that were assigned before abstraction is acquired (S62).
Next, k is set to 1 (S63).
Next, the node names in the list L are compared with the information content y_s^m stored in the lower node m in group k_N, and the node name of the lower node m is reset to the least abstract node name among those that encompass the information content y_s^m (S64).
Next, k is set to k + 1 (S65).
Next, it is determined whether k ≤ K (S66). If k ≤ K, the process returns to S63; otherwise, the process proceeds to S67.
Next, m is set to m + 1 (S67).
Next, it is determined whether m ≤ M (S68). If m ≤ M, the process returns to S61; otherwise, the process ends.
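The selection in S64 differs from the rule of FIG. 16 in that it looks only at the target group and picks the least abstract name. A minimal Python sketch is given below, assuming that each entry of the list L carries a numeric abstraction level and the set of elements it encompasses; these annotations, the example entries and the function name are hypothetical.

```python
def least_abstract_name(name_list, contents):
    """Pick from list L the least abstract name that encompasses all contents."""
    observed = {z for y in contents if y != 0 for z in y}
    covering = [
        (level, name)
        for name, level, covered in name_list
        if observed <= covered
    ]
    return min(covering)[1] if covering else None

# Hypothetical list L for the lower node "habitat area":
# (node name before abstraction, abstraction level, encompassed elements).
L = [
    ("habitat area", 2, {"Pacific Ocean", "Indian Ocean", "East Asian rivers"}),
    ("sea area", 1, {"Pacific Ocean", "Indian Ocean"}),
    ("river area", 1, {"East Asian rivers"}),
]

# Contents y_s^m per node s; 0 marks a node not linked to the lower node.
name_p1 = least_abstract_name(L, [["Pacific Ocean"], ["Indian Ocean"], 0])
name_p2 = least_abstract_name(L, [["East Asian rivers"], 0])
```

Because the candidates are restricted to names that already existed before abstraction, this variant never introduces a name that no source document used.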
FIG. 18 is a flowchart showing still another example of node division processing based on resetting of the abstraction level according to the embodiment.
In FIG. 18, m is set to 1 (S71).
Next, k is set to 1 (S72).
Next, the number X of elements contained in the data for which the information content y_s^m stored in the lower node m in group k_N is not 0 is calculated (S73). When a plurality of elements exist in an information content y_s^m, all of them are counted in X.
Next, it is determined whether a base mathematical model Y_o exists for a node o that is conceptually equivalent to or below the node m (S74).
If a base mathematical model Y_o exists, a threshold for determining whether a data set with X elements belongs to each mathematical model Y_o is acquired (S75).
Next, for each mathematical model Y_o, the probability that the information content y_s^m stored in the lower node m in group k_N belongs to it is calculated, the node name of the lower node m is reset to the node o with the lowest abstraction level among the mathematical models Y_o that fall below the threshold (S76), and the process proceeds to S79.
If no base mathematical model Y_o exists, a threshold on the number of element types of a target concept is acquired, which serves as the criterion for determining whether a data set with X elements belongs to that concept (S77).
Next, among the nodes that encompass the information content y_s^m stored in the lower node m in group k_N, the node name of the lower node m is reset to the node o that is the lowest among the nodes whose number of element types is below the threshold (S78).
Next, k is set to k + 1 (S79).
Next, it is determined whether k ≤ K (S80). If k ≤ K, the process returns to S72; otherwise, the process proceeds to S81.
Next, m is set to m + 1 (S81).
Next, it is determined whether m ≤ M (S82). If m ≤ M, the process returns to S71; otherwise, the process ends.
FIG. 19 is a block diagram showing a hardware configuration example of the information management device of FIG. 1.
In FIG. 19, the information management device 101 includes a processor 11, a communication control device 12, a communication interface 13, a main storage device 14, and an external storage device 15. The processor 11, the communication control device 12, the communication interface 13, the main storage device 14, and the external storage device 15 are connected to each other via an internal bus 16. The main storage device 14 and the external storage device 15 are accessible from the processor 11.
Further, an input device 20 and an output device 21 are provided outside the information management device 101. The input device 20 and the output device 21 are connected to the internal bus 16 via an input/output interface 17. The input device 20 is, for example, a keyboard, a mouse, a touch panel, a card reader, or a voice input device. The output device 21 is, for example, a screen display device (a liquid crystal monitor, an organic EL (Electro Luminescence) display, a graphics card, or the like), an audio output device (a speaker or the like), or a printing device.
The processor 11 is hardware that controls the operation of the entire information management device 101. The processor 11 may be a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit). The processor 11 may be a single-core processor or a multi-core processor. The processor 11 may include a hardware circuit (for example, an FPGA (Field-Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit)) that performs part or all of the processing. The processor 11 may include a neural network.
The main storage device 14 can be composed of semiconductor memory such as SRAM or DRAM. The main storage device 14 can store a program being executed by the processor 11 and can provide a work area in which the processor 11 executes the program.
The external storage device 15 is a storage device with a large storage capacity, for example a hard disk device or an SSD (Solid State Drive). The external storage device 15 can hold the executable files of various programs and the data used in executing those programs. An information management program 15A can be stored in the external storage device 15. The information management program 15A may be software that can be installed on the information management device 101, or may be embedded in the information management device 101 as firmware.
The communication control device 12 is hardware having a function of controlling communication with the outside. The communication control device 12 is connected to a network 19 via the communication interface 13. The network 19 may be a WAN (Wide Area Network) such as the Internet, a LAN (Local Area Network) such as WiFi or Ethernet (registered trademark), or a mixture of WAN and LAN.
The input/output interface 17 converts data input from the input device 20 into a data format that the processor 11 can process, and converts data output from the processor 11 into a data format that the output device 21 can process.
When the processor 11 reads the information management program 15A into the main storage device 14 and executes it, a predetermined node can be extracted from the hierarchical structure of nodes assigned to conceptualized information, and the predetermined node can be classified based on the information of the lower nodes associated with it. In doing so, the processor 11 can realize the functions of the item extraction unit 1, the node candidate generation unit 2, the node extraction unit 3, the node integration unit 4, the classification unit 5, the modeling unit 6, and the node division unit 7 of FIG. 1.
Execution of the information management program 15A may be shared among a plurality of processors or computers. Alternatively, the processor 11 may instruct a cloud computer or the like, via the network 19, to execute all or part of the information management program 15A and may receive the execution result.
The present invention is not limited to the embodiment described above and includes various modifications. For example, the embodiment has been described in detail in order to explain the present invention clearly, and the invention is not necessarily limited to a configuration having all of the described elements. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. For part of the configuration of each embodiment, other configurations can be added, deleted, or substituted. Some or all of the configurations, functions, processing units, processing means, and the like described above may be realized in hardware, for example by designing them as an integrated circuit.
1 item extraction unit, 2 node candidate generation unit, 3 node extraction unit, 4 node integration unit, 5 classification unit, 6 modeling unit, 7 node division unit, 8 thesaurus dictionary, 9 conceptual model

Claims (15)

  1.  An information management device comprising:
     an extraction unit that extracts a predetermined node from a hierarchical structure of nodes assigned to conceptualized information; and
     a classification unit that classifies the predetermined node extracted by the extraction unit, based on information of lower nodes associated with the predetermined node extracted by the extraction unit.
  2.  The information management device according to claim 1, wherein an item of a document is assigned to each of the nodes.
  3.  The information management device according to claim 1, wherein the predetermined node is a node whose lower-node hierarchy has one level or less.
  4.  The information management device according to claim 3, wherein the extraction unit regards the predetermined node classified into the same group by the classification unit and the lower nodes associated with the predetermined node as a single node.
  5.  The information management device according to claim 1, wherein the classification unit classifies the predetermined node based on a combination of the concepts of the lower nodes associated with the predetermined node.
  6.  The information management device according to claim 1, wherein
     the extraction unit extracts a first node and a second node as the predetermined nodes, and
     the classification unit classifies the first node into a group different from that of the second node when a concept of a lower node associated with the first node cannot be a concept of a lower node associated with the second node.
  7.  The information management device according to claim 1, further comprising a modeling unit that models how the lower nodes are associated, based on the information of the lower nodes associated with the predetermined node classified by the classification unit.
  8.  The information management device according to claim 7, wherein the modeling unit generates a pattern of how the lower nodes are associated with the predetermined nodes classified into the same group by the classification unit.
  9.  The information management device according to claim 1, further comprising a division unit that divides the concepts of the lower nodes associated with the predetermined node, based on the classification result of the predetermined node.
  10.  The information management device according to claim 9, wherein the division unit divides the concepts of the lower nodes associated with the predetermined nodes classified into different groups into concrete concepts specific to each group.
  11.  The information management device according to claim 1, further comprising an integration unit that integrates the abstraction levels of the concepts of the lower nodes associated with the predetermined node.
  12.  The information management device according to claim 11, wherein the integration unit integrates the concepts of the lower nodes based on the information content of the lower nodes.
  13.  The information management device according to claim 2, further comprising a node candidate generation unit that unifies the names of items of the same concept extracted from the document, based on morphological analysis and synonym analysis.
  14.  An information management method executed by a processor, wherein the processor:
     extracts a predetermined node from a hierarchical structure of nodes to which items are assigned; and
     classifies the predetermined node based on the items of lower nodes associated with the predetermined node.
  15.  The information management method according to claim 14, wherein
     the processor
     unifies the names of items of the same concept extracted from a document, based on morphological analysis and synonym analysis,
     integrates the abstraction levels of the items of the lower nodes associated with the predetermined node,
     classifies the predetermined node based on the items of the lower nodes whose abstraction levels have been integrated,
     estimates a model of how the lower nodes are associated, based on the information of the lower nodes associated with the predetermined node, and
     divides the items of the lower nodes associated with the predetermined node based on the model of how the lower nodes are associated.
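The extraction and classification steps of the method claim can be illustrated with a minimal sketch. This is not the applicant's implementation: the `Node` class, the choice of "all nodes at a given depth" as the predetermined nodes, the Jaccard similarity measure, the greedy grouping, and the 0.5 threshold are all assumptions made for illustration. The claims only require that the predetermined node be classified based on the items of its lower nodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in the hierarchy; `item` is the item assigned to it (hypothetical type)."""
    item: str
    children: list[Node] = field(default_factory=list)


def extract_nodes(root: Node, depth: int) -> list[Node]:
    """Extract the predetermined nodes. Assumption: all nodes at a given depth."""
    level = [root]
    for _ in range(depth):
        level = [child for node in level for child in node.children]
    return level


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two item sets; defined as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def classify(nodes: list[Node], threshold: float = 0.5) -> list[list[Node]]:
    """Group nodes by the items of their lower nodes.

    Greedy rule (an assumption): a node joins the first group whose first
    member has a lower-node item set with similarity >= threshold.
    """
    groups: list[list[Node]] = []
    for node in nodes:
        items = {child.item for child in node.children}
        for group in groups:
            representative = {child.item for child in group[0].children}
            if jaccard(items, representative) >= threshold:
                group.append(node)
                break
        else:
            groups.append([node])
    return groups


# Illustrative data only; the item names are invented for this example.
root = Node("products", [
    Node("pump A", [Node("flow rate"), Node("pressure"), Node("power")]),
    Node("pump B", [Node("flow rate"), Node("pressure"), Node("weight")]),
    Node("motor C", [Node("torque"), Node("speed"), Node("power")]),
])
groups = classify(extract_nodes(root, depth=1))
group_names = [[node.item for node in group] for group in groups]
```

With this data, "pump A" and "pump B" share two of four distinct lower-node items (similarity 0.5) and fall into one group, while "motor C" shares only "power" with "pump A" (similarity 0.2) and forms its own group. A real system would likely unify item names and abstraction levels first, as claim 15 describes, so that the item sets are comparable.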

PCT/JP2020/008353 2019-03-22 2020-02-28 Information management device and information management method WO2020195545A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-054851 2019-03-22
JP2019054851A JP7099976B2 (en) 2019-03-22 2019-03-22 Information management device and information management method

Publications (1)

Publication Number Publication Date
WO2020195545A1 (en) 2020-10-01

Family

ID=72559317

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/008353 WO2020195545A1 (en) 2019-03-22 2020-02-28 Information management device and information management method

Country Status (2)

Country Link
JP (1) JP7099976B2 (en)
WO (1) WO2020195545A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009136426A1 (en) * 2008-05-08 2009-11-12 三菱電機株式会社 Search query providing equipment
JP2010501947A (en) * 2006-08-31 2010-01-21 スウィーニー,ピーター System, method and computer program for consumer-defined information architecture
US20160062993A1 (en) * 2014-08-21 2016-03-03 Samsung Electronics Co., Ltd. Method and electronic device for classifying contents
JP2016139229A (en) * 2015-01-27 2016-08-04 日本放送協会 Device and program for generating personal profile, and content recommendation device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
AOKI, CHIZURU ET AL.: "A legal ontology refinement environment using a general ontology and a case ontology", THE 27TH HUMAN INTERFACE AND COGNITIVE MODEL WORKSHOP MATERIAL, 25 March 1996 (1996-03-25), pages 9 - 16 *
ICHISE, RYUTARO ET AL.: "Instance-based hierarchical knowledge source integration", THE 11TH SPECIAL INTEREST GROUP ON AI CHALLENGE, 12 March 2001 (2001-03-12), pages 61 - 66 *
YAMAMOTO, KOUHEI ET AL.: "A hierarchical topic model for expanding category hierarchies , The 6th Forum on Data Engineering and Information Management", THE 12TH DBSJ ANNUAL, 3 May 2014 (2014-05-03), pages 1 - 8, Retrieved from the Internet <URL:http://db-event.jpn.org/deim2014/final/proceedings/C4-6.pdf> *

Also Published As

Publication number Publication date
JP2020154991A (en) 2020-09-24
JP7099976B2 (en) 2022-07-12

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20779688; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20779688; Country of ref document: EP; Kind code of ref document: A1)