WO2015118619A1 - Document analysis system, document analysis method, and document analysis program - Google Patents

Document analysis system, document analysis method, and document analysis program

Publication number
WO2015118619A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
information
lawsuit
investigation
classification code
Prior art date
Application number
PCT/JP2014/052581
Other languages
French (fr)
Japanese (ja)
Inventor
守本 正宏
秀樹 武田
和巳 蓮子
彰晃 花谷
菜々子 吉田
Original Assignee
株式会社Ubic
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社Ubic
Priority to PCT/JP2014/052581 (WO2015118619A1)
Priority to TW104103850A (TW201539217A)
Publication of WO2015118619A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10: Services
    • G06Q50/18: Legal services
    • G06Q10/00: Administration; Management
    • G06Q10/10: Office automation; Time management
    • G06Q10/107: Computer-aided management of electronic mailing [e-mailing]

Definitions

  • The present invention relates to a document analysis system, a document analysis method, and a document analysis program.
  • Patent Document 1 discloses a forensic system that designates a specific person from among at least one user included in the user information, extracts only the digital document information accessed by that person based on access history information regarding that person, sets accompanying information indicating whether each document file in the extracted digital document information is related to a lawsuit, and outputs the document files related to the lawsuit based on the accompanying information.
  • Patent Document 2 discloses a forensic system that displays recorded digital information; sets, for each of a plurality of document files, user identification information indicating which of the users included in the user information the file relates to; records the set user identification information in a storage unit; designates at least one user; searches for the document files for which user identification information corresponding to the designated user has been set; sets accompanying information indicating whether each retrieved document file is related to the lawsuit; and outputs the document files related to the lawsuit based on the accompanying information.
  • Patent Document 3 discloses a forensic system that accepts the designation of at least one document file included in the digital document information, accepts the designation of the language into which the designated document file is to be translated, translates the designated document file into that language, extracts from the digital document information recorded in the recording unit a common document file showing the same content as the designated document file, generates translation-related information indicating that the extracted common document file has been translated by reusing the translation of the designated document file, and outputs the document files related to a lawsuit based on the translation-related information.
  • In a forensic system such as that of Patent Document 1, a huge amount of document information is collected from users of a plurality of computers and servers.
  • An object of the present invention is to provide a document analysis system, a document analysis method, and a document analysis program that facilitate the analysis of document information used in a lawsuit.
  • The document analysis system of the present invention acquires information recorded in a predetermined computer or server and analyzes document information that is included in the acquired information and is composed of a plurality of documents.
  • The system includes an investigation basic database that stores, for each phase classified according to the progress of a predetermined action that causes a lawsuit or fraud investigation, a generation process model describing how the predetermined action arises; that further stores information related to the lawsuit or fraud investigation for each category to which the lawsuit or fraud investigation belongs and for each generation process model; and that further stores time-series information indicating the temporal order of the phases.
  • The system further includes an analysis unit that analyzes the document information based on the information related to the lawsuit or fraud investigation, the generation process model, and the time-series information, and a calculation unit that calculates, from the result of the analysis, an index indicating the possibility that the predetermined action will occur.
  • The document analysis system may further include an investigation category input reception unit that receives input of the category of the lawsuit or fraud investigation, and an investigation type determination unit that determines the category to be investigated based on the category received by the investigation category input reception unit and extracts the types of necessary information from the investigation basic database.
  • The document analysis system may further include an information extraction unit that extracts keywords and/or sentences included in the document information as information related to the lawsuit or fraud investigation.
  • The document analysis system may further include a search unit that searches the plurality of documents for the keywords and/or sentences.
  • The document analysis system may further include an automatic classification code assigning unit that automatically assigns a classification code to each of the plurality of documents, and the keywords and/or sentences may be used for assigning the classification code.
  • The document analysis method of the present invention acquires information recorded in a predetermined computer or server and analyzes document information composed of a plurality of documents included in the acquired information.
  • The method refers to an investigation basic database that stores, for each phase classified according to the progress of a predetermined action that causes a lawsuit or fraud investigation, a generation process model describing how the predetermined action arises; that further stores information related to the lawsuit or fraud investigation for each category to which the lawsuit or fraud investigation belongs and for each generation process model; and that further stores time-series information indicating the temporal order of the phases.
  • The method includes an analysis step of analyzing the document information based on the information related to the lawsuit or fraud investigation, the generation process model, and the time-series information, and a calculation step of calculating, from the result of the analysis, an index indicating the possibility that the predetermined action will occur.
  • The document analysis program of the present invention is a program for acquiring information recorded in a predetermined computer or server and analyzing document information composed of a plurality of documents included in the acquired information.
  • The program causes a computer to refer to an investigation basic database that stores, for each phase classified according to the progress of a predetermined action that causes a lawsuit or fraud investigation, a generation process model describing how the predetermined action arises; that further stores information related to the lawsuit or fraud investigation for each category to which the lawsuit or fraud investigation belongs and for each generation process model; and that further stores time-series information indicating the temporal order of the phases.
  • The program causes the computer to realize an analysis function of analyzing the document information based on the information related to the lawsuit or fraud investigation, the generation process model, and the time-series information, and a calculation function of calculating, from the result of the analysis, an index indicating the possibility that the predetermined action will occur.
  • According to the document analysis system, the document analysis method, and the document analysis program of the present invention, it is possible to facilitate the analysis of document information used in a lawsuit.
  • FIG. 1 is a block diagram showing the main configuration of a document analysis system according to an embodiment of the present invention.
  • FIG. 2 is a table showing a list of possible phases in this embodiment.
  • FIG. 3(a) is a schematic diagram showing that the process in which the predetermined action occurs is modeled as the generation process model for each phase, and FIG. 3(b) is a schematic diagram showing that information related to the lawsuit or fraud investigation is stored for each category to which the lawsuit or fraud investigation belongs and for each generation process model.
  • FIG. 4 is a detailed configuration diagram of a document analysis system according to an embodiment of the present invention.
  • FIG. 5 is a chart showing the flow of processing of the document analysis method according to the embodiment of the present invention.
  • FIG. 6 is a chart showing the flow of detailed processing in the document analysis method according to the embodiment of the present invention.
  • A chart showing the flow of the investigation and classification processing according to the investigation type in the document analysis method according to the embodiment of the present invention.
  • A chart showing the flow of predictive coding according to the investigation type in the document analysis method according to the embodiment of the present invention.
  • A chart showing the flow of processing for each step in the embodiment.
  • A chart showing the processing flow of the keyword database in the embodiment.
  • A chart showing the processing flow of the related term database in this embodiment.
  • A chart showing the processing flow of the first automatic classification unit in this embodiment.
  • A graph showing the analysis result of the document analysis unit in this embodiment.
  • A chart showing the processing flow of the third automatic classification unit in one example of this embodiment.
  • FIG. 1 is a block diagram showing the main configuration of a document analysis system 1 according to an embodiment of the present invention.
  • The document analysis system 1 acquires information recorded in a predetermined computer or server and analyzes document information that is included in the acquired information and consists of a plurality of documents.
  • The document analysis system 1 includes an investigation category input reception unit 20, an investigation type determination unit 22, an information extraction unit 24, an investigation basic database 103, an analysis unit 26, a calculation unit 28, a search unit 30, and an automatic classification code assigning unit 32.
  • The investigation category input reception unit 20 receives the user's input of the category of a lawsuit or fraud investigation.
  • The category of the lawsuit or fraud investigation represents the nature of the case to which the lawsuit or fraud investigation relates. For example, it may be antitrust, patent, foreign bribery prohibition (FCPA), product liability (PL), information leakage, or fictitious claims.
  • The investigation category input reception unit 20 outputs the category to the investigation type determination unit 22.
  • The investigation type determination unit 22 determines the category to be investigated based on the category received by the investigation category input reception unit 20 and extracts the types of necessary information from the investigation basic database 103. For example, when the document information is any one of an e-mail, a presentation material, a spreadsheet, a meeting material, a contract, an organization chart, or a business plan, the investigation type determination unit 22 outputs that type (e-mail and so on) to the information extraction unit 24 as the type of necessary information.
  • The information extraction unit 24 extracts a plurality of documents from the document information. Specifically, from the information of the types input from the investigation type determination unit 22 (for example, e-mail, presentation materials, spreadsheet materials, meeting materials, contracts, organization charts, business plans, etc.), the information extraction unit 24 extracts the keywords and/or sentences included in that information as information related to lawsuits or fraud investigations, and stores the extracted results in the investigation basic database 103.
  • The investigation basic database 103 stores, for each phase classified according to the progress of a predetermined action that causes a lawsuit or fraud investigation, a generation process model describing how the predetermined action arises.
  • The lawsuit may be a lawsuit regarding, for example, antitrust, patent, foreign bribery prohibition (FCPA), product liability (PL), and the like.
  • The fraud investigation may be an investigation relating to information leakage, fictitious billing, and the like.
  • The predetermined action may be an action related to wrongdoing in areas such as antitrust, patents, foreign bribery prohibition, product liability, information leakage, and fictitious claims (for example, participating in a price adjustment meeting with competitors).
  • FIG. 2 is a table showing a list of possible phases in the present embodiment.
  • A phase is an index indicating a stage in the progress of the predetermined action; phases are classified according to that progress.
  • The “Relationship Building” phase is a precursor to the “Competition” phase and is the stage of building relationships with customers and competitors.
  • The “Preparation” phase is the stage in which information regarding competition is exchanged with competitors (which may be third parties).
  • The “Competition” phase is the stage of presenting a price to a customer, obtaining feedback, and communicating with the competitor regarding that feedback.
  • The generation process model is a model of the process by which the subject of the predetermined action (an individual or an organization composed of multiple persons) comes to perform that action, built from information related to the lawsuit or fraud investigation (for example, keywords extracted from document information).
  • Examples of the generation process model include a personality pattern model, an action pattern model, and a group pattern model.
  • FIG. 3(a) is a schematic diagram showing that the process in which the predetermined action occurs is modeled as the generation process model for each phase.
  • The investigation basic database 103 stores the generation process model for each phase.
  • For example, one generation process model is associated with the “Relationship Building” phase, and another generation process model is associated with the “Preparation” phase. That is, the process in which the predetermined action occurs is modeled as a generation process model for each phase.
  • The investigation basic database 103 further stores information related to the lawsuit or fraud investigation for each category to which the lawsuit or fraud investigation belongs and for each generation process model.
  • The information related to the lawsuit or fraud investigation may be a keyword, a combination of keywords, or meta information extracted from the document information by the information extraction unit 24.
  • The meta information is information indicating a predetermined attribute of the document information. For example, when the document information is an e-mail, the meta information may be the date and time at which the e-mail was sent or received.
  • FIG. 3(b) is a schematic diagram showing that information related to the lawsuit or fraud investigation is stored for each category to which the lawsuit or fraud investigation belongs and for each generation process model.
  • For example, for the category “antitrust” and one generation process model, the corresponding information related to the lawsuit or fraud investigation is stored in the investigation basic database 103.
  • The investigation basic database 103 further stores time-series information.
  • The time-series information is information indicating the temporal order of the phases.
  • For example, the time-series information may indicate a series of transitions in which the “Relationship Building” phase develops into the “Competition” phase by way of the “Preparation” phase.
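The three kinds of data held in the investigation basic database 103 (a generation process model per phase, related information per category and model, and the temporal order of the phases) can be pictured with a small in-memory structure. The following is only a sketch under assumptions: the class name, field names, model labels, and sample keywords are hypothetical and do not come from the patent, which does not specify a storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class InvestigationBasicDatabase:
    """Hypothetical in-memory stand-in for the investigation basic database 103."""

    # Time-series information: phases listed in temporal order.
    phase_order: list[str]
    # One generation process model label per phase (labels are placeholders).
    models_by_phase: dict[str, str] = field(default_factory=dict)
    # Related information (e.g. keywords), keyed by (category, model).
    related_info: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def follows_time_series(self, observed_phases: list[str]) -> bool:
        """Return True if the observed phases never move backwards in the stored order."""
        ranks = [self.phase_order.index(p) for p in observed_phases]
        return all(a <= b for a, b in zip(ranks, ranks[1:]))


db = InvestigationBasicDatabase(
    phase_order=["Relationship Building", "Preparation", "Competition"],
    models_by_phase={"Relationship Building": "model_a", "Preparation": "model_b"},
    # Sample keywords are illustrative assumptions only.
    related_info={("antitrust", "model_a"): {"meeting", "price"}},
)
```

Keeping the phase order as an explicit list makes the "Relationship Building, then Preparation, then Competition" transition checkable against the order in which phases are observed in the documents.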
  • The analysis unit 26 analyzes the document information based on the information related to the lawsuit or fraud investigation, the generation process model, and the time-series information. Specifically, the analysis unit 26 reads the information related to the lawsuit or fraud investigation, the generation process model, and the time-series information from the investigation basic database 103, and performs morphological analysis and keyword analysis on the data under investigation to extract actions corresponding to the predetermined action. The analysis unit 26 outputs the analysis result (the extracted predetermined actions) to the calculation unit 28.
  • The calculation unit 28 calculates, from the result of the analysis, an index (case index) indicating the possibility that the predetermined action will occur. Specifically, an increment of the index is set arbitrarily for each predetermined action that causes a lawsuit or fraud investigation, and the calculation unit 28 increases the index corresponding to an extracted predetermined action by that increment. For example, when a predetermined action belonging to the “Relationship Building” phase is extracted, the calculation unit 28 may increase the index corresponding to that action by one. In the example shown in FIG. 2, the increment of the index is set to “1” for every action, but the increment can be set arbitrarily. The upper limit value of the index may be set to 10, for example.
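The index update performed by the calculation unit 28 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a single running index, a per-action increment table that defaults to 1, and a cap at the example upper limit of 10. The function and parameter names are hypothetical.

```python
def update_case_index(
    current: int,
    extracted_actions: list[str],
    increments: dict[str, int],
    upper_limit: int = 10,
) -> int:
    """Raise the case index by each extracted action's increment, capped at upper_limit."""
    for action in extracted_actions:
        # Actions without an explicit entry use an increment of 1 (assumed default).
        current += increments.get(action, 1)
    return min(current, upper_limit)
```

For instance, two extracted actions with the default increment raise an index of 0 to 2, and any further increases stop once the index reaches the upper limit.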
  • The search unit 30 searches the document information for the keywords or related terms recorded in the databases. That is, the search unit 30 searches the plurality of documents for keywords (for example, words such as “infringement” and “lawsuit”) and/or sentences.
  • The automatic classification code assigning unit 32 automatically assigns a classification code to each of the plurality of documents. The keywords and/or sentences are used for assigning the classification code.
  • With the document analysis system 1, the risk level of a predetermined action (for example, an illegal action) that causes a lawsuit or fraud investigation can be grasped objectively by expressing the possibility of that action as an index.
  • The predetermined action can also be monitored by issuing reports according to the movement of the index. The document analysis system 1 can therefore facilitate the analysis of document information used in a lawsuit.
  • FIG. 4 shows a detailed configuration example of the document analysis system 1 according to the embodiment of the present invention.
  • The document analysis system 1 can include a data storage unit 100 that stores information and data.
  • The data storage unit 100 stores digital information acquired from a plurality of computers or servers in a digital information storage area 101 for use in the analysis of lawsuits or fraud investigations.
  • The data storage unit 100 includes, for example: the investigation basic database 103, which stores category attributes indicating which category a case belongs to (lawsuits such as antitrust, patent, FCPA, and PL, or fraud investigations such as information leakage and fictitious claims), company names, persons in charge, custodians, and the configuration of the investigation or classification input screen; a keyword database 104, which registers specific classification codes of documents included in the acquired digital information, keywords closely related to the specific classification codes, and keyword correspondence information indicating the correspondence between the specific classification codes and the keywords; a related term database 105, which registers predetermined classification codes and words that appear frequently in documents to which the predetermined classification codes are assigned; and a score calculation database 106, which registers the weighting of words contained in documents in order to calculate a score indicating the strength of the connection between a document and a classification code.
  • The investigation basic database 103 stores, for each phase classified according to the progress of a predetermined action that causes a lawsuit or fraud investigation, a generation process model describing how the predetermined action arises.
  • The investigation basic database 103 also stores time-series information indicating the temporal order of the phases.
  • The data storage unit 100 also stores a report creation database 107 for registering report formats determined according to the category, the custodian, and the contents of the classification work. As shown in FIG. 4, the data storage unit 100 may be installed in the document analysis system 1 or may be installed outside the document analysis system 1 as a separate storage device.
  • The document analysis system 1 includes a database management unit 109 that manages updates to the data contents of the investigation basic database 103, the keyword database 104, the related term database 105, the score calculation database 106, and the report creation database 107.
  • The database management unit 109 can be connected to an information storage device 902 via a dedicated connection line or the Internet line 901. Based on the data contents stored in the information storage device 902, the database management unit 109 can update the data contents of the investigation basic database 103, the keyword database 104, the related term database 105, the score calculation database 106, and the report creation database 107.
  • The document analysis system 1 includes the investigation category input reception unit 20, the investigation type determination unit 22, the information extraction unit 24, the analysis unit 26, the calculation unit 28, and the search unit 30.
  • The automatic classification code assigning unit 32 is realized as a first automatic classification unit 201, a second automatic classification unit 301, and a third automatic classification unit 401.
  • The document analysis system 1 can include a score calculation unit 116 that calculates a score indicating the strength of association between a document and a classification code.
  • It can also include a first automatic classification unit 201 that uses the search unit 30 to search for the keywords recorded in the keyword database 104, extracts documents including a keyword from the document information, and automatically assigns a specific classification code to the extracted documents based on the keyword correspondence information.
  • It can further include a second automatic classification unit 301 that extracts from the document information the documents including the related terms recorded in the related term database 105, calculates a score based on the evaluation values of the related terms included in each extracted document and the number of those related terms, and automatically assigns a predetermined classification code, based on the score and the related term correspondence information, to those documents including the related terms whose score exceeds a certain value.
  • The document analysis system 1 can include a document display unit 130 that displays on a screen a plurality of documents extracted from the document information.
  • It can also include a classification code reception/assignment unit 131 that receives, for a plurality of documents extracted from the document information and not yet assigned a classification code, the classification code given by the user based on relevance to the lawsuit, and assigns that classification code to the documents.
  • It can further include a third automatic classification unit 401 that analyzes the documents to which classification codes have been assigned by the classification code reception/assignment unit 131 and automatically assigns classification codes based on the result of that analysis.
  • The document analysis system 1 may include a language determination unit 120 that determines the language type of an extracted document, and a translation unit 122 that translates the extracted document either upon the user's specification or automatically.
  • The unit of language determination in the language determination unit 120 is set smaller than one sentence so that a single sentence mixing multiple languages can be handled. Furthermore, HTML headers and the like may be removed from the translation target.
  • In order to perform the analysis by the document analysis unit 118, the document analysis system 1 may include a trend information generation unit 124 that generates trend information for each document based on the types of words included in the document, their number of occurrences, and their evaluation values.
  • The document analysis system 1 may include a quality inspection unit 501 that compares the classification code received by the classification code reception/assignment unit 131 with the classification code assigned according to the trend information in the document analysis unit 118, and verifies the validity of the classification code received by the classification code reception/assignment unit 131.
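The comparison carried out by the quality inspection unit 501 amounts to checking user-assigned codes against system-assigned codes and flagging disagreements. A minimal sketch follows, assuming both sets of codes are held as dictionaries keyed by a document identifier. The function name and the dictionary representation are assumptions made for illustration.

```python
def inspect_quality(
    user_codes: dict[str, str], predicted_codes: dict[str, str]
) -> list[str]:
    """Return the IDs of documents whose user-assigned code differs from the predicted code."""
    return sorted(
        doc_id
        for doc_id, code in user_codes.items()
        # Only documents that have both a user code and a predicted code are compared.
        if doc_id in predicted_codes and predicted_codes[doc_id] != code
    )
```

The flagged documents could then be routed back to a reviewer, which is one plausible way to act on the validity check described above.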
  • The document analysis system 1 may include a learning unit 601 that learns the weighting of each keyword or related term based on the result of the document analysis processing.
  • The document analysis system 1 includes a report creation unit 701 that outputs, based on the result of the document analysis processing, an investigation report suited to the type of lawsuit case or fraud investigation.
  • Lawsuit cases include, for example, antitrust (cartel), patents, foreign bribery prohibition (FCPA), and product liability (PL).
  • Fraud investigations include, for example, information leakage and fictitious claims.
  • The document analysis system 1 can include, for example, a lawyer review reception unit 133 that receives a review by a chief attorney or a chief patent attorney in order to improve the quality of the classification investigation and the report.
  • “Classification code” refers to an identifier used when classifying documents; it indicates the degree of relevance to a lawsuit so that documents can be used easily in the lawsuit. For example, when document information is used as evidence in a lawsuit, the code may be assigned according to the type of evidence.
  • “Document” means data containing one or more words. Examples of documents include e-mail, presentation materials, spreadsheet materials, meeting materials, contracts, organization charts, business plans, and the like.
  • “Word” refers to the smallest group of characters that has meaning. For example, the sentence “document means data including one or more words” contains the words “document”, “one”, “more”, “word”, “include”, “data”, and “mean”.
  • “Keyword” refers to a group of character strings having a certain meaning in a certain language. For example, keywords selected from the sentence “classify a document” can be “document”, “classify”, and the like. In the embodiment, keywords such as “infringement”, “lawsuit”, and “patent publication XX” are selected with priority.
  • Keywords include morphemes.
  • “Keyword correspondence information” refers to information indicating the correspondence between a keyword and a specific classification code. For example, if the classification code “important”, which represents an important document in a lawsuit, is closely related to the keyword “infringer”, the keyword correspondence information may be information that links the classification code “important” with the keyword “infringer” and manages them together.
  • “Related term” refers to a word whose evaluation value is equal to or higher than a certain value, among the words that appear frequently in common across documents to which a predetermined classification code is assigned.
  • “Appearance frequency” refers to the proportion of the total number of words appearing in one document that is accounted for by occurrences of the related term.
  • “Evaluation value” refers to the amount of information that each word exhibits in a document.
  • The evaluation value may be calculated based on the amount of transmitted information.
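The appearance frequency defined above is a simple ratio and can be computed directly. The sketch below assumes the document has already been split into a list of words. The function name is hypothetical.

```python
def appearance_frequency(words: list[str], term: str) -> float:
    """Share of all word occurrences in one document that are the given term."""
    if not words:
        return 0.0  # An empty document has no occurrences by convention.
    return words.count(term) / len(words)
```

For a four-word document in which the term appears twice, the appearance frequency is 0.5.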
  • the “related term” may refer to the name of the technical field to which the product belongs, the country where the product is sold, the name of a similar product of the product, and the like.
  • “related terms” in the case of assigning the product name of the apparatus that performs the image encoding process as a classification code includes “encoding process”, “Japan”, “encoder”, and the like.
  • “Related term correspondence information” refers to information indicating correspondence between related terms and classification codes. For example, when the classification code “product A” which is the product name related to the lawsuit has a related term “image encoding” which is a function of the product A, the “related term correspondence information” is classified into the classification code “product A”. And the related term “image coding” may be associated with each other and managed.
  • “Score” refers to a document that quantitatively evaluates the strength of connection with a specific classification code. In each embodiment of the present invention, for example, the score is calculated from the words appearing in the document and the evaluation value possessed by each word using the following equation (1).
  • the document analysis system 1 may extract words that frequently appear in documents having a common classification code assigned by the user.
  • trend information, consisting of the types of the extracted words, the evaluation value of each word, and the number of appearances in each document, is analyzed for each document, and among the documents for which the classification code reception / giving unit 131 has not accepted a classification code, a common classification code may be assigned to a document having the same tendency as the analyzed trend information.
  • trend information refers to the degree of similarity between each document and a document to which a classification code is assigned, and is based on the type of word, the number of occurrences, and the word evaluation value included in each document.
  • for example, when a document is similar to a document assigned a predetermined classification code in its degree of relevance to that classification code, the two documents have the same trend information.
  • documents having the same evaluation value and the same number of occurrences may be documents having the same tendency.
  • FIG. 5 is a chart showing the flow of processing of the document analysis method (document analysis system control method) according to the embodiment of the present invention.
  • the analysis unit 26 reads the information related to the lawsuit or fraud investigation, the generation process model, and the time series information from the investigation basic database 103 (step 41; hereinafter “step” is abbreviated as “S”).
  • the analysis part 26 extracts the action applicable to the said predetermined
  • the calculation part 28 calculates the parameter
  • FIG. 6 is a detailed flowchart of the document analysis method according to the embodiment of the present invention. Note that the flow shown in FIG. 5 may be executed as a process independent of the flow shown in FIG. 6, or may be executed as a process included in any part of the flow shown in FIG. 6.
  • the use database such as the survey basic database and the document analysis database can be specified (S12).
  • the information storage device may be installed inside an organization that performs sorting or may be installed outside the organization. As a case where the information storage device is installed outside the organization, for example, there is a case where the information storage device is installed in an affiliated law firm or patent office.
  • the usage database such as the survey basic database and the document analysis database can be updated to the guideline database (S14).
  • the updated survey basic database is searched (S15), and the name of the company, the person in charge, and the custodian can be presented on the screen of the display device (S16).
  • the document analysis system can accept the user's correction input and specify the names of the actual person in charge and the custodian (S17).
  • digital document information can be extracted in order to perform document analysis work (S18).
  • in the updated document analysis database, the updated keyword database, related term database, and score calculation database can be searched (S19), and a classification code can be assigned to the extracted document information (S20).
  • the classification code by the reviewer can be received and the classification code can be given to the extracted document information (S21).
  • the database can be searched using the classification result as teacher data, and a classification code can be assigned to the extracted document information (S22).
  • the category is specified by the user's argument designation (S24), and the report creation database can be specified according to the specified category (S25).
  • the format of the report can be determined by the identified report creation database, and the report can be automatically output (S26).
  • FIG. 7 is a chart showing a flow of investigation and classification processing according to the investigation type in the document analysis method according to the embodiment of the present invention.
  • the survey type can be input (S31).
  • the user enters the category corresponding to the investigation and sorting work to be carried out, chosen from litigation cases, including antitrust, patent, overseas bribery prohibition (FCPA), and product liability (PL) cases, or from fraud investigations, including information leakage and fictitious claims.
  • the document analysis system can accept a user category input and specify a category to be investigated.
  • the type of survey and document analysis processing and the type of database to be used can be determined (S32).
  • information stock stored in a usage database such as a survey basic database or a document analysis database may be accessed (S33).
  • the survey basic database is accessed according to the specified category, and each keyword input screen corresponding to the specified category can be displayed (S34).
  • the survey basic database is accessed according to the specified category, and keywords or documents can be extracted according to the specified category (S36).
  • the extracted documents and information can be narrowed down by performing a keyword search in the document analysis database (S38).
  • FIG. 8 is a chart showing the flow of predictive coding according to the investigation type in the document analysis method according to the embodiment of the present invention.
  • the document analysis system can ask the user for input according to the type of survey, and can accept the user's input in response. For example, regarding cartels under antitrust law, user input can be requested and accepted for target products, parties (name and email address), related organizations (name and department), and time period. In addition, regarding related organizations, user input regarding competitor companies and customer companies can be requested and accepted (S51).
  • the registration process, the classification process, and the inspection process are performed in the first to fifth stages.
  • keywords and related terms are updated and registered in advance using the results of past classification processing (STEP 100).
  • the keyword and the related term are updated and registered together with the keyword correspondence information and the related term correspondence information which are correspondence information between the classification code and the keyword or the related term.
  • in the second stage, documents including the keywords updated and registered in the first stage are extracted from all document information, the updated keyword correspondence information recorded in the first stage is referred to, and a first classification process for assigning the classification code corresponding to the keyword is performed (STEP 200).
  • in the third stage, documents including the related terms updated and registered in the first stage are extracted from the document information that was not given a classification code in the second stage, the score of each document including the related terms is calculated, and a second classification process for assigning a classification code is performed (STEP 300).
  • in the fourth stage, a classification code given by the user is accepted for document information that has not been given a classification code by the third stage, and the accepted classification code is given to that document information.
  • the document information given the classification code received from the user is then analyzed, documents without a classification code are extracted based on the analysis result, and a third classification process for assigning classification codes to the extracted documents is performed. For example, words that frequently appear in documents with a common classification code assigned by the user are extracted, trend information consisting of the types of the extracted words, the evaluation value of each word, and the number of appearances is analyzed for each document, and a common classification code is assigned to documents having the same tendency as the trend information (STEP 400).
  • in the fifth stage, for each document to which the user gave a classification code in the fourth stage, the classification code to be given is determined based on the analyzed trend information, and the validity of the classification process is verified by comparing the determined classification code with the classification code given by the user (STEP 500). A learning process may also be performed based on the result of the document analysis process as needed.
  • the trend information used in the fourth and fifth stage processing refers to the degree of similarity between each document and the documents to which a classification code is assigned, and is based on the types of words included in each document, their numbers of occurrences, and the evaluation values of the words. For example, when a document is similar to a document assigned a predetermined classification code in its degree of relevance to that classification code, the two documents have the same trend information. In addition, even if the types of words included are different, documents containing words with the same evaluation values and the same numbers of occurrences may be treated as documents having the same tendency.
  • the keyword database 104 creates a management table for each classification code based on the result of classifying documents in past lawsuits, and specifies keywords corresponding to each classification code (STEP 111).
  • to specify the keywords, a method of analyzing the documents to which each classification code is assigned and using the number of occurrences and the evaluation value of each keyword in those documents, a method of manual selection by the user, or the like may be used.
  • the keyword correspondence information indicating that the keyword has a special relationship is created (STEP 112). Then, the identified keyword is registered in the keyword database 104. At this time, the identified keyword is associated with the keyword correspondence information and recorded in the management table of the classification code “important” in the keyword database 104 (STEP 113).
  • the related term database 105 creates a management table for each classification code based on the results of document classification in past lawsuits, and registers related terms corresponding to each classification code (STEP 121).
  • for example, “encoding process” and “product a” are registered as related terms of “product A”, and “decoding” and “product b” are registered as related terms of “product B”.
  • the related term correspondence information indicating which classification code each registered related term corresponds to is created (STEP 122) and recorded in each management table (STEP 123). At this time, the related term correspondence information also records the evaluation value of each related term and the score threshold required for determining a classification code.
  • the keyword and the keyword correspondence information, and the related term and the related term correspondence information are updated and registered (STEP 113, STEP 123).
  • <Second stage (STEP 200)> A detailed processing flow of the first automatic sorting unit 201 in the second stage will be described with reference to FIG.
  • the first automatic classification unit 201 performs a process of assigning the classification code “important” to the document.
  • the first automatic sorting unit 201 extracts documents including the keywords “infringement” and “patent attorney” registered in the keyword database 104 in the first stage (STEP 100) from the document information (STEP 211).
  • for each extracted document, the management table in which the keyword is recorded is looked up via the keyword correspondence information (STEP 212), and the classification code “important” is given (STEP 213).
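The keyword-based classification of STEP 211 to STEP 213 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `KEYWORD_TABLES` dictionary is a hypothetical in-memory stand-in for the management tables of the keyword database 104, and the function name, the substring matching, and the sample documents are all assumptions.

```python
# Hypothetical stand-in for the keyword database 104:
# one management table per classification code, listing its keywords.
KEYWORD_TABLES = {
    "important": ["infringement", "patent attorney"],
}


def first_classification(documents, keyword_tables):
    """Assign a classification code to each document containing a registered keyword.

    documents: dict mapping a document id to its text.
    Returns a dict mapping document id to the assigned code; documents
    without any registered keyword are left out (no code assigned).
    """
    assigned = {}
    for doc_id, text in documents.items():
        lowered = text.lower()
        for code, keywords in keyword_tables.items():
            # Simple case-insensitive substring match (an assumption).
            if any(keyword.lower() in lowered for keyword in keywords):
                assigned[doc_id] = code
                break  # this sketch gives each document at most one code
    return assigned


docs = {
    "d1": "Our patent attorney reviewed the claim chart.",
    "d2": "Lunch menu for Friday.",
    "d3": "Possible infringement by the new encoder.",
}
codes = first_classification(docs, KEYWORD_TABLES)
```

Documents left without a code (here `d2`) would be passed on to the later stages.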
  • the second automatic classification unit 301 performs a process of assigning the classification codes “product A” and “product B” to the document information that was not assigned a classification code in the second stage (STEP 200).
  • the second automatic classification unit 301 extracts documents including the related terms “encoding process”, “product a”, “decoding”, and “product b” recorded in the related term database 105 in the first stage (STEP 311). Based on the recorded appearance frequencies and evaluation values of the four related terms, the score is calculated by the score calculation unit 116 using equation (1) (STEP 312). The score represents the degree of association between each document and the classification codes “product A” and “product B”.
  • when the appearance frequencies of the related terms “encoding process” and “product a” and the evaluation value of the related term “encoding process” are high, and the score indicating the degree of association with the classification code “product A” exceeds the threshold value, the document is given the classification code “product A”.
  • the second automatic classification unit 301 recalculates the evaluation value of each related term according to equation (2), using the score calculated in STEP 432 in the fourth stage, and weights the evaluation value (STEP 315).
  • a classification code from the reviewer is accepted for a certain ratio of document information extracted from the document information to which no classification code has been given, and the accepted classification code is assigned to that document information.
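The threshold-based classification of the third stage can be illustrated with a short sketch. Equation (1) is not reproduced in this text, so the score below is an assumed stand-in (the sum of appearance frequency times evaluation value); the evaluation values, thresholds, and function names are hypothetical.

```python
# Hypothetical related term correspondence information:
# classification code -> (evaluation value per related term, score threshold).
RELATED_TERMS = {
    "product A": ({"encoding process": 2.0, "product a": 1.0}, 0.25),
    "product B": ({"decoding": 2.0, "product b": 1.0}, 0.25),
}


def appearance_frequency(text, term):
    """Occurrences of term divided by the total number of words in the document."""
    lowered = text.lower()
    words = lowered.split()
    if not words:
        return 0.0
    # Naive substring counting; a real system would tokenize properly.
    return lowered.count(term.lower()) / len(words)


def score(text, evaluation_values):
    """Assumed stand-in for equation (1): sum of frequency x evaluation value."""
    return sum(appearance_frequency(text, term) * value
               for term, value in evaluation_values.items())


def second_classification(text, related_terms):
    """Return the codes whose score exceeds the threshold recorded for that code."""
    return [code for code, (values, threshold) in related_terms.items()
            if score(text, values) > threshold]


sample = "the encoding process of product a uses the encoding process twice"
assigned_codes = second_classification(sample, RELATED_TERMS)
```

Keeping the threshold next to the evaluation values mirrors the description that the related term correspondence information records both.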
  • the document information given the classification code received from the reviewer is analyzed, and based on the analysis result, the classification code is given to the document information not given the classification code.
  • a process of assigning classification codes of “important”, “product A”, and “product B” is performed on the document information. The fourth stage is further described below.
  • the information extraction unit 24 first samples a document at random and displays it on the document display unit 130.
  • 20% of the document information to be processed is extracted at random and set as a classification target by the reviewer.
  • Sampling may be an extraction method in which documents are arranged in order of document creation date and time or in order of name, and 30% of documents are selected from the top.
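The two sampling approaches described above (random extraction, or taking the top portion after ordering) can be sketched as follows. The function names, the `seed` parameter, and the use of rounding to decide the sample size are assumptions made for illustration.

```python
import random


def sample_random(doc_ids, ratio=0.2, seed=None):
    """Pick the given ratio of documents uniformly at random, without repeats."""
    rng = random.Random(seed)
    count = round(len(doc_ids) * ratio)
    return rng.sample(list(doc_ids), count)


def sample_from_top(documents, ratio=0.3, key=None):
    """Sort documents (e.g. by creation date or by name) and take the top ratio."""
    ordered = sorted(documents, key=key)
    count = round(len(ordered) * ratio)
    return ordered[:count]


ids = [f"doc{i:02d}" for i in range(10)]
picked = sample_random(ids, ratio=0.2, seed=1)
top = sample_from_top(["c", "a", "b", "d", "e", "f", "g", "h", "i", "j"], ratio=0.3)
```

The selected documents are the ones shown to the reviewer; the remainder are candidates for automatic classification.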
  • the user browses the display screen 11 shown in FIG. 20 displayed on the document display unit 130, and selects a classification code to be assigned to each document.
  • the classification code reception / giving unit 131 receives the classification code selected by the user (STEP 411), and sorts based on the given classification code (STEP 412).
  • the document analysis unit 118 extracts words that frequently appear in the documents classified by classification code by the classification code reception / giving unit 131 (STEP 421).
  • the evaluation value of the extracted common word is analyzed by Expression (2) (STEP 422), and the appearance frequency of the common word in the document is analyzed (STEP 423).
  • FIG. 16 is a graph showing a result of analyzing words frequently appearing in the documents to which the classification code “important” is assigned in STEP 424.
  • the vertical axis R_hot shows, among all documents to which the user assigned the classification code “important”, the percentage of documents that include the words selected as being associated with the classification code “important”.
  • the horizontal axis indicates the ratio of documents including the words extracted in STEP 421 by the classification code receiving and assigning unit 131 among all the documents subjected to the classification process by the user.
  • the processing of STEP 421 to STEP 424 is also executed for the documents to which the classification codes “product A” and “product B” are assigned, and the trend information of the documents is analyzed.
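The analysis of STEP 421 to STEP 424 can be approximated with a small sketch: extract the words that appear in the most documents sharing a classification code, then compute the share of documents containing a given word, either among documents with that code (as on the vertical axis of FIG. 16) or among all reviewed documents (as on the horizontal axis). The function names and the whitespace tokenization are assumptions.

```python
from collections import Counter


def frequent_words(classified_docs, code, top_n=3):
    """Return the top_n words by document frequency among documents with code.

    classified_docs: list of (text, classification_code) pairs.
    """
    doc_freq = Counter()
    for text, doc_code in classified_docs:
        if doc_code == code:
            # Count each word once per document.
            doc_freq.update(set(text.lower().split()))
    return [word for word, _ in doc_freq.most_common(top_n)]


def containing_ratio(classified_docs, word, code=None):
    """Share of documents (optionally only those with code) that contain word."""
    pool = [text for text, doc_code in classified_docs
            if code is None or doc_code == code]
    if not pool:
        return 0.0
    hits = sum(1 for text in pool if word in text.lower().split())
    return hits / len(pool)


reviewed = [
    ("infringement claim filed", "important"),
    ("infringement notice sent", "important"),
    ("lunch menu", None),
]
top_word = frequent_words(reviewed, "important", top_n=1)
```

A word with a high ratio among coded documents but a low ratio overall is a stronger indicator of that classification code.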
  • the third automatic classification unit 401 processes the documents, among the document information to be processed in the fourth stage, for which the classification code reception / giving unit 131 did not accept a classification code in STEP 411. From these documents, it extracts documents having the same trend information as the documents, analyzed in STEP 424, to which the classification codes “important”, “product A”, and “product B” are assigned (STEP 431), and calculates the score of each extracted document using equation (1) based on the trend information (STEP 432).
  • an appropriate classification code is assigned to the document extracted in STEP 431 based on the trend information (STEP 433).
  • the third automatic sorting unit 401 further reflects the sorting result in each database using the score calculated in STEP 432 (STEP 434). Specifically, a process of lowering the evaluation values of keywords and related terms included in a document having a low score and increasing the evaluation values of keywords and related terms included in a document having a high score may be performed.
  • the third automatic classification unit 401 may perform a classification process on documents, among the document information to be processed in the fourth stage, to which the classification code reception / giving unit 131 did not give a classification code in STEP 411.
  • when no argument is given (STEP 441: None), the third automatic sorting unit 401 extracts, from those documents, documents having the same trend information as the documents, analyzed in STEP 424, to which the classification code “important” is assigned (STEP 442), and calculates the score of each extracted document using equation (1) based on the trend information (STEP 443). Further, an appropriate classification code is assigned to the documents extracted in STEP 442 based on the trend information (STEP 444).
  • the third automatic sorting unit 401 further reflects the sorting result in each database using the score calculated in STEP 443 (STEP 445). Specifically, the evaluation value of the keyword and the related term included in the document with a low score is lowered, while the evaluation value of the keyword and the related term included in the document with a high score is increased.
  • the data for score calculation may be collectively stored in the score calculation database 106.
  • <Fifth stage (STEP 500)> A detailed processing flow of the quality inspection unit 501 in the fifth stage will be described with reference to FIG.
  • the classification code reception / giving unit 131 determines, based on the trend information analyzed by the document analysis unit 118 in STEP 424, the classification code to be given to each document for which a classification code was received in STEP 411 (STEP 511).
  • the classification code received by the classification code reception / giving unit 131 is compared with the classification code determined in STEP 511 (STEP 512), and the validity of the classification code received in STEP 411 is verified (STEP 513).
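The comparison in STEP 512 and STEP 513 can be sketched as a simple agreement check. How the trend-based codes are determined is outside this sketch; `predicted_codes` is a hypothetical input, and the agreement rate is an assumed way of expressing validity, not a measure stated in the text.

```python
def verify_classification(reviewer_codes, predicted_codes):
    """Compare reviewer-assigned codes with codes determined from trend information.

    Both arguments map a document id to a classification code.
    Returns (agreement rate, sorted ids of documents where the codes differ).
    """
    shared = sorted(set(reviewer_codes) & set(predicted_codes))
    mismatches = [d for d in shared if reviewer_codes[d] != predicted_codes[d]]
    rate = 1.0 - len(mismatches) / len(shared) if shared else 0.0
    return rate, mismatches


reviewer = {"d1": "important", "d2": "product A", "d3": "product B", "d4": "important"}
predicted = {"d1": "important", "d2": "product A", "d3": "product A", "d4": "important"}
rate, mismatched = verify_classification(reviewer, predicted)
```

Documents listed as mismatches are candidates for a second look by the reviewer.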
  • the document analysis system 1 may include a learning unit 601.
  • the learning unit 601 learns the weighting of each keyword or related term based on the first to fourth processing results using Expression (2).
  • the learning result may be reflected in the keyword database 104, the related term database 105, or the score calculation database 106.
  • based on the result of the document analysis processing, the document analysis system 1 may include a report creation unit 701 for outputting an optimum investigation report according to the investigation type: a lawsuit case (for example, cartel, patent, FCPA, or PL) or a fraud investigation (for example, information leakage or fictitious billing).
  • the contents of the survey vary depending on the survey type. For example,
  • a method of analyzing a document that has already been given a classification code corresponding to similar search information and adjusting a range to which the classification code is assigned based on the analysis result is used.
  • other methods include a method of adjusting the range to which the classification code is assigned by clustering the search information corresponding to similar search information, and a method of performing predictive classification by learning the classification results.
  • a common classification code may be given to the reply document of the reply document of the original document.
  • the same or similar classification codes are given to similar search information by learning to integrate similar search information for the classification results.
  • the reliability of the analysis result varies depending on the number of documents to be analyzed.
  • a statistical method may be applied to the total number of documents to be classified in order to determine at what point, that is, after what percentage of all documents has been analyzed, the range to which the classification code is assigned should be adjusted based on the analysis results.
  • the range of documents to which the classification code is assigned may be adjusted by executing both the method of adjusting that range by clustering the search information corresponding to similar search information and the method of performing predictive classification by learning the classification results. Accordingly, in another example of the embodiment of the present invention, classification codes can be assigned quickly and accurately, and the burden associated with the classification work can be reduced.
  • a display screen control unit that controls a display screen that presents the type of information extracted by the survey type determination unit to the user may be provided.
  • an input receiving unit that receives a keyword and / or sentence input by a user corresponding to the type of information presented on the display screen control unit may be provided.
  • the document analysis program of the present invention is a document analysis program for acquiring information recorded in a predetermined computer or server and analyzing document information composed of a plurality of documents included in the acquired information.
  • the program causes a computer to realize a calculation function that refers to a survey basic database, which stores a generation process model by which a predetermined action causing a lawsuit or fraud investigation occurs, for each phase classified according to the progress of the predetermined action, further stores information related to the lawsuit or fraud investigation for each category to which the lawsuit or fraud investigation belongs and for each generation process model, and further stores time series information indicating the temporal order of the phases.
  • the calculation function analyzes the document information based on the information related to the lawsuit or fraud investigation, the generation process model, and the time series information, and calculates from the result of the analysis an index indicating the likelihood that the predetermined action occurs.
  • the calculation function can be realized by the calculation unit. Details are as described above.
  • the embodiment of the present invention automatically updates the database according to the category by accepting a user input of the category of a litigation case or fraud investigation case. As a result, the burden of office work for inputting the names of persons in charge, custodians, etc. is reduced.
  • the search word is adjusted by the database automatically updated according to the category, and a classification code is automatically assigned to the document information using the adjusted search word. This reduces the burden of sorting the document information used for litigation or fraud investigation cases.
  • the control block of the document analysis system 1 may be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit).
  • the document analysis system 1 includes a CPU that executes instructions of a program (control program), which is software that implements each function; a ROM (Read Only Memory) or storage device (referred to as a “recording medium”) in which the program and various data are recorded so as to be readable by the computer (or CPU); a RAM (Random Access Memory) into which the program is loaded; and the like.
  • the objective of this invention is achieved when a computer (or CPU) reads the said program from the said recording medium and runs it.
  • as the recording medium, a “non-transitory tangible medium” such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used.
  • the program may be supplied to the computer via an arbitrary transmission medium (such as a communication network or a broadcast wave) that can transmit the program.
  • the present invention can also be realized in the form of a data signal embedded in a carrier wave in which the program is embodied by electronic transmission.
  • a document analysis system comprising: a survey category determination unit that determines a survey category to be surveyed based on a category and extracts a necessary type of information from the survey basic database.
  • the document analysis system further includes a display screen control unit that controls a display screen for presenting a type of information extracted by the survey type determination unit to the user.
  • the document analysis system further includes an input reception unit that receives an input of a keyword and / or a sentence by a user corresponding to the type of information presented on the display screen control unit.
  • the document analysis system further includes an information extraction unit that extracts keywords and / or sentences corresponding to the type of information extracted by the survey type determination unit from the survey basic database.
  • the document analysis system further includes a search unit that searches the document for the keyword and / or the sentence.
  • the document analysis system further includes an automatic classification code assigning unit that automatically assigns a classification code to the document, and the keyword and / or the sentence are used for assigning the classification code.
  • An analysis method comprising: a survey category input receiving step for receiving an input of a category of the lawsuit or fraud investigation; and a survey category to be investigated based on the category received by the survey category input receiving step;
  • a document analysis method comprising: a survey type determination step for extracting a type of necessary information from a survey basic database that stores information related to litigation or fraud investigation.

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Operations Research (AREA)
  • Computer Hardware Design (AREA)
  • Technology Law (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention facilitates the analysis of document information used in litigation. This document analysis system is provided with: an investigation base database that stores an occurrence process model, by which predetermined actions that are causes of litigation or an investigation of impropriety occur, by phases categorized in accordance with the progress of the predetermined actions, further stores information pertaining to litigation or an investigation of impropriety by occurrence process model and category to which the litigation or investigation of impropriety belongs, and further stores time series information indicating a temporal order of phases; and a calculation unit that analyzes document information on the basis of the information pertaining to the litigation or investigation of impropriety, the occurrence process model, and the time series information, and calculates from the analysis results an indicator indicating the likelihood of the predetermined actions occurring.

Description

文書分析システム及び文書分析方法並びに文書分析プログラムDocument analysis system, document analysis method, and document analysis program
 本発明は、文書分析システム及び文書分析方法並びに文書分析プログラムに関する。 The present invention relates to a document analysis system, a document analysis method, and a document analysis program.
 従来、不正アクセスや機密情報漏洩などコンピュータに関する犯罪や法的紛争が生じた際に、原因究明や捜査に必要な機器やデータ、電子的記録を収集・分析し、その法的な証拠性を明らかにする手段や技術が提案されている。 Conventionally, when computer crimes and legal disputes such as unauthorized access and leakage of confidential information occur, the equipment, data, and electronic records necessary for investigation and investigation are collected and analyzed, and the legal evidence is revealed. Means and techniques to make it have been proposed.
 特に、米国民事訴訟ではeDiscovery(電子証拠開示)等が求められており、当該訴訟の原告および被告のいずれもが、関連するデジタル情報をすべて証拠として提出する責任を負う。そのため、コンピュータやサーバに記録されたデジタル情報を証拠として、提出しなければならない。 In particular, eDiscovery (electronic evidence disclosure), etc. is required in US civil lawsuits, and both the plaintiff and the defendant in the lawsuit are responsible for submitting all relevant digital information as evidence. Therefore, digital information recorded on a computer or server must be submitted as evidence.
 一方、ITの急速な発達と普及に伴い、今日のビジネスの世界ではほとんどの情報がコンピュータを用いて作成されているため、同一企業内であっても多くのデジタル情報が氾濫している。 On the other hand, with the rapid development and spread of IT, most information is created using computers in today's business world, so a lot of digital information is flooded even within the same company.
 そのため、法廷への証拠資料提出のための準備作業を行う過程において、当該訴訟に必ずしも関連しない機密なデジタル情報までも証拠資料として含めてしまうミスが生じやすい。また、当該訴訟に関連しない機密な文書情報を提出してしまうことが問題になっていた。 Therefore, in the process of preparing for submission of evidence to the court, it is easy to make mistakes that include confidential digital information not necessarily related to the lawsuit as evidence. Moreover, it has been a problem to submit confidential document information not related to the lawsuit.
 近年、フォレンジックシステムにおける文書情報に関する技術が、特許文献1乃至特許文献3に提案されている。特許文献1には、利用者情報に含まれる少なくとも1人以上の利用者から、特定の者を指定し、指定された特定の者に関するアクセス履歴情報に基づいて、特定の者がアクセスしたデジタル文書情報のみを抽出し、抽出されたデジタル文書情報の文書ファイルそれぞれが、訴訟に関連するものであるか否かを示す付帯情報を設定し、付帯情報に基づき、訴訟に関連する文書ファイルを出力するフォレンジックシステムについて開示されている。 In recent years, technologies relating to document information in a forensic system have been proposed in Patent Documents 1 to 3. Patent Document 1 discloses a digital document in which a specific person is designated from at least one or more users included in the user information and is accessed based on access history information regarding the specified specific person. Extracts only the information, sets the accompanying information indicating whether each extracted digital document information document file is related to a lawsuit, and outputs a document file related to the lawsuit based on the supplementary information A forensic system is disclosed.
Patent Document 2 discloses a forensic system that displays recorded digital information; sets, for each of a plurality of document files, user identification information indicating which of the users included in the user information the file relates to; records the set user identification information in a storage unit; designates at least one user; searches for document files for which user identification information corresponding to the designated user has been set; sets, via a display unit, accompanying information indicating whether each retrieved document file is related to a lawsuit; and outputs the document files related to the lawsuit based on the accompanying information.
Patent Document 3 discloses a forensic system that accepts designation of at least one document file included in digital document information; accepts designation of the language into which the designated document file is to be translated; translates the designated document file into the designated language; extracts, from the digital document information recorded in a recording unit, common document files having the same content as the designated document file; generates translation-related information indicating that the extracted common document files have been translated by reusing the translation of the translated document file; and outputs the document files related to a lawsuit based on the translation-related information.
Patent Document 1: JP 2011-209930 A
Patent Document 2: JP 2011-209931 A
Patent Document 3: JP 2012-32859 A
However, forensic systems such as those of Patent Documents 1 to 3 collect an enormous amount of document information from users of multiple computers and servers.
Sorting such an enormous volume of digitized document information according to whether it is appropriate as evidence for a lawsuit requires users called reviewers to check each item visually and sort the items one by one, which takes a great deal of labor and cost.
An object of the present invention is to provide a document analysis system, a document analysis method, and a document analysis program that facilitate the analysis of document information used in a lawsuit.
The document analysis system of the present invention acquires information recorded on a predetermined computer or server and analyzes document information that is included in the acquired information and consists of a plurality of documents. The system comprises: an investigation basic database that stores a generation process model, in which a predetermined action that causes a lawsuit or fraud investigation arises, for each phase into which the progress of the predetermined action is classified, that further stores information related to the lawsuit or fraud investigation for each category to which the lawsuit or fraud investigation belongs and for each generation process model, and that further stores time-series information indicating the temporal order of the phases; and a calculation unit that analyzes the document information based on the information related to the lawsuit or fraud investigation, the generation process model, and the time-series information, and calculates from the analysis result an index indicating the possibility that the predetermined action will occur.
The document analysis system may further comprise an investigation category input reception unit that accepts input of the category of the lawsuit or fraud investigation, and an investigation type determination unit that determines the category to be investigated based on the category accepted by the investigation category input reception unit and extracts the required types of information from the investigation basic database.
The document analysis system may further comprise an information extraction unit that extracts keywords and/or sentences contained in the document information from that document information as information related to the lawsuit or fraud investigation.
The document analysis system may further comprise a search unit that searches the plurality of documents for the keywords and/or sentences.
The document analysis system may further comprise an automatic classification code assigning unit that automatically assigns a classification code to each of the plurality of documents, and the keywords and/or sentences may be used in assigning the classification codes.
The document analysis method of the present invention acquires information recorded on a predetermined computer or server and analyzes document information that is included in the acquired information and consists of a plurality of documents. The method includes a calculation step of referring to an investigation basic database, which stores a generation process model, in which a predetermined action that causes a lawsuit or fraud investigation arises, for each phase into which the progress of the predetermined action is classified, further stores information related to the lawsuit or fraud investigation for each category to which the lawsuit or fraud investigation belongs and for each generation process model, and further stores time-series information indicating the temporal order of the phases; analyzing the document information based on the information related to the lawsuit or fraud investigation, the generation process model, and the time-series information; and calculating from the analysis result an index indicating the possibility that the predetermined action will occur.
The document analysis program of the present invention acquires information recorded on a predetermined computer or server and analyzes document information that is included in the acquired information and consists of a plurality of documents. The program causes a computer to realize a calculation function of referring to an investigation basic database, which stores a generation process model, in which a predetermined action that causes a lawsuit or fraud investigation arises, for each phase into which the progress of the predetermined action is classified, further stores information related to the lawsuit or fraud investigation for each category to which the lawsuit or fraud investigation belongs and for each generation process model, and further stores time-series information indicating the temporal order of the phases; analyzing the document information based on the information related to the lawsuit or fraud investigation, the generation process model, and the time-series information; and calculating from the analysis result an index indicating the possibility that the predetermined action will occur.
According to the document analysis system, document analysis method, and document analysis program of the present invention, the analysis of document information used in a lawsuit can be facilitated.
A block diagram showing the main configuration of a document analysis system according to an embodiment of the present invention.
A table listing the phases assumed in the present embodiment.
(a) is a schematic diagram showing that the process in which the predetermined action arises is modeled as the generation process model for each phase; (b) is a schematic diagram showing that information related to the lawsuit or fraud investigation is stored for each category to which the lawsuit or fraud investigation belongs and for each generation process model.
A detailed configuration diagram of the document analysis system according to the embodiment of the present invention.
A chart showing the processing flow of the document analysis method according to the embodiment of the present invention.
A chart showing the detailed processing flow of the document analysis method according to the embodiment of the present invention.
A chart showing the flow of investigation and classification processing according to investigation type in the document analysis method according to the embodiment of the present invention.
A chart showing the flow of predictive coding according to investigation type in the document analysis method according to the embodiment of the present invention.
A chart showing the processing flow for each stage in the embodiment.
A chart showing the processing flow of the keyword database in the embodiment.
A chart showing the processing flow of the related term database in the present embodiment.
A chart showing the processing flow of the first automatic classification unit in the present embodiment.
A chart showing the processing flow of the second automatic classification unit in the present embodiment.
A chart showing the processing flow of the classification code reception and assignment unit in the present embodiment.
A chart showing the processing flow of the document analysis unit in the present embodiment.
A graph showing analysis results from the document analysis unit in the present embodiment.
A chart showing the processing flow of the third automatic classification unit in one example of the present embodiment.
A chart showing the processing flow of the third automatic classification unit in another example of the present embodiment.
A chart showing the processing flow of the quality inspection unit in the present embodiment.
A document display screen in the present embodiment.
FIG. 1 is a block diagram showing the main configuration of a document analysis system 1 according to an embodiment of the present invention. The document analysis system 1 acquires information recorded on a predetermined computer or server and analyzes document information that is included in the acquired information and consists of a plurality of documents. As shown in FIG. 1, the document analysis system 1 includes an investigation category input reception unit 20, an investigation type determination unit 22, an information extraction unit 24, an investigation basic database 103, an analysis unit 26, a calculation unit 28, a search unit 30, and an automatic classification code assigning unit 32.
The investigation category input reception unit 20 accepts the user's input of the category of a lawsuit or fraud investigation. The category represents the nature of the case underlying the lawsuit or fraud investigation and may be, for example, antitrust, patent, foreign bribery prohibition (FCPA), product liability (PL), information leakage, or fictitious billing. When a category is input, the investigation category input reception unit 20 outputs it to the investigation type determination unit 22.
The investigation type determination unit 22 determines the category to be investigated based on the category accepted by the investigation category input reception unit 20 and extracts the required types of information from the investigation basic database 103. For example, when the document information is any of e-mail, presentation materials, spreadsheets, meeting materials, contracts, organization charts, or business plans, the investigation type determination unit 22 outputs e-mail to the information extraction unit 24 as the required type of information.
The information extraction unit 24 extracts a plurality of documents from the document information. Specifically, from the information input by the investigation type determination unit 22 (for example, e-mail, presentation materials, spreadsheets, meeting materials, contracts, organization charts, and business plans), the information extraction unit 24 extracts the keywords and/or sentences contained in that information as information related to the lawsuit or fraud investigation and stores the extraction result in the investigation basic database 103.
The investigation basic database 103 stores a generation process model, in which a predetermined action that causes a lawsuit or fraud investigation arises, for each phase into which the progress of the predetermined action is classified. The lawsuit may be, for example, a lawsuit concerning antitrust, patent, foreign bribery prohibition (FCPA), or product liability (PL). The fraud investigation may be an investigation concerning information leakage, fictitious billing, and the like. The predetermined action may be an action related to misconduct in areas such as antitrust, patent, foreign bribery prohibition, product liability, information leakage, or fictitious billing (for example, attending a price coordination meeting with competitors).
FIG. 2 is a table listing the phases assumed in the present embodiment. As described above, a phase is an indicator of each stage through which the predetermined action progresses (that is, it classifies the action according to its progress). For example, the "Relationship Building" phase is a prerequisite of the "Competition" phase and is the stage of building relationships with customers and competitors. The "Preparation" phase is the stage of exchanging information about competition with competitors (or with third parties). The "Competition" phase is the stage of presenting a price to a customer, obtaining feedback, and communicating with competitors about that feedback.
In the "Relationship Building" phase, the action "inquiry from a customer" (a predetermined action that causes a lawsuit or fraud investigation) typically occurs. In the "Preparation" phase, the action "obtaining a competitor's production status" (likewise a predetermined action that causes a lawsuit or fraud investigation) often occurs. Other typical actions that can cause a lawsuit or fraud investigation are likewise known and can be associated with each of the phases.
The generation process model is a model of the process by which a predetermined actor (an individual or an organization made up of multiple persons) arrives at the predetermined action, according to information related to the lawsuit or fraud investigation (for example, keywords extracted from document information). The generation process model includes, for example, a personality pattern model, a behavior pattern model, and a group pattern model.
FIG. 3(a) is a schematic diagram showing that the process in which the predetermined action arises is modeled as the generation process model for each phase. As described above, the investigation basic database 103 stores the generation process model for each phase. For example, one generation process model is associated with the "Relationship Building" phase, and another generation process model is associated with the "Preparation" phase. In other words, the process in which the predetermined action arises is modeled, phase by phase, as a generation process model.
The investigation basic database 103 further stores information related to the lawsuit or fraud investigation for each category to which the lawsuit or fraud investigation belongs and for each generation process model. This information may be a keyword, a combination of keywords, or meta information extracted from the document information by the information extraction unit 24. The meta information indicates a predetermined attribute of the document information; for example, when the document information is an e-mail, it may be the date and time at which the e-mail was sent or received.
FIG. 3(b) is a schematic diagram showing that information related to the lawsuit or fraud investigation is stored for each category to which the lawsuit or fraud investigation belongs and for each generation process model. As described above, the investigation basic database 103 stores this information per category and per generation process model. For example, information related to the lawsuit or fraud investigation is stored in the investigation basic database 103 for the combination of the category "antitrust" and one generation process model.
The investigation basic database 103 further stores time-series information, which indicates the temporal order of the phases. In the example shown in FIG. 2, the time-series information may indicate the sequence of transitions in which the "Relationship Building" phase develops, through the "Preparation" phase, into the "Competition" phase.
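The contents of database 103 described so far can be pictured with a small in-memory structure. This is only a sketch under stated assumptions: the class name, the field names, and the choice to key related information by (category, phase) are hypothetical, and the patent does not specify a storage layout. Only the three phase names and their order come from the text.

```python
from dataclasses import dataclass, field

# Phase names follow the examples given for FIG. 2; the list order plays the
# role of the time-series information (earlier phases come first).
PHASE_ORDER = ["Relationship Building", "Preparation", "Competition"]


@dataclass
class InvestigationBasicDatabase:
    # phase name -> generation process model (a plain label in this sketch)
    models: dict = field(default_factory=dict)
    # (category, phase name) -> related information such as keywords
    # (assumed key; one model per phase is assumed here)
    related_info: dict = field(default_factory=dict)
    phase_order: list = field(default_factory=lambda: list(PHASE_ORDER))

    def next_phase(self, phase):
        """Return the phase that follows `phase`, or None for the last one."""
        i = self.phase_order.index(phase)
        if i + 1 < len(self.phase_order):
            return self.phase_order[i + 1]
        return None


db = InvestigationBasicDatabase()
db.models["Preparation"] = "behavior pattern model"
db.related_info[("antitrust", "Preparation")] = ["production status"]
```

Keeping the phase order as an explicit list makes the transition from one phase to the next a simple lookup, which is all the time-series information is described as providing.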
The analysis unit 26 analyzes the document information based on the information related to the lawsuit or fraud investigation, the generation process model, and the time-series information. Specifically, the analysis unit 26 reads these three items from the investigation basic database 103 and performs morphological analysis and keyword analysis on the data under investigation, thereby extracting behavior that corresponds to the predetermined action. The analysis unit 26 outputs the analysis result (the extracted predetermined actions) to the calculation unit 28.
The calculation unit 28 calculates, from the analysis result, an index (case index) indicating the possibility that the predetermined action will occur. Specifically, an index increment is set as desired for each predetermined action that causes a lawsuit or fraud investigation, and the calculation unit 28 increases the index corresponding to an extracted predetermined action by that increment. For example, when a predetermined action belonging to the "Relationship Building" phase is extracted, the calculation unit 28 may increase the index corresponding to that action by 1. In the example shown in FIG. 2, the increment for every action is set to "1", but the increment can be set to any value. The upper limit of the index may be set to 10, for example.
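The index update described above can be sketched as follows. The function name, the dictionary of increments, and the use of a single running index are assumptions made for illustration; the text only states that each action has a freely chosen increment (1 in the FIG. 2 example) and that the index may be capped, for example at 10.

```python
INDEX_CAP = 10  # example upper limit mentioned in the text

# Hypothetical per-action increments; the text says each can be set freely
# and uses 1 for every action in its example.
INCREMENTS = {
    "inquiry from a customer": 1,
    "obtaining a competitor's production status": 1,
}


def update_case_index(current, extracted_actions,
                      increments=INCREMENTS, cap=INDEX_CAP):
    """Add each extracted action's increment to the index, never exceeding cap."""
    for action in extracted_actions:
        # Actions without a registered increment leave the index unchanged.
        current = min(cap, current + increments.get(action, 0))
    return current
```

For instance, starting from 0 and extracting both example actions would raise the index to 2, while starting from 9 would stop at the cap of 10.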
The search unit 30 searches the document information for keywords or related terms recorded in a database. That is, the search unit 30 searches the plurality of documents for keywords (for example, words such as "infringement" and "lawsuit") and/or sentences.
The automatic classification code assigning unit 32 automatically assigns a classification code to each of the plurality of documents. The keywords and/or sentences are used in assigning the classification codes.
Conventional forensic systems could not quantify the risk of a matter developing into a lawsuit (for example, the risk of information leaking). As a result, potential crises could not be grasped objectively.
By turning the risk that a predetermined action causing a lawsuit or fraud investigation (for example, misconduct) will occur into an index, the document analysis system 1 makes it possible to grasp the danger of that action objectively. The predetermined action can then be monitored, for example by issuing reports in response to movements of the index. The document analysis system 1 can therefore facilitate the analysis of document information used in a lawsuit.
Next, the details of the document analysis system of the present invention are described concretely with reference to the drawings. The example described below is only an example, and the invention is not limited to it.
FIG. 4 shows a detailed configuration example of the document analysis system 1 according to the embodiment of the present invention.
As shown in FIG. 4, the document analysis system 1 according to the present embodiment may have a data storage unit 100 that stores information and data. The data storage unit 100 stores digital information acquired from a plurality of computers or servers in a digital information storage area 101 for use in analyzing lawsuits or fraud investigations.
The data storage unit 100 stores: the investigation basic database 103, which holds a category attribute indicating which category a matter belongs to (for example, a lawsuit matter including antitrust, patent, FCPA, and PL, or a fraud investigation including information leakage and fictitious billing), the company name, the person in charge, the custodian, and the layout of the investigation or classification input screen; a keyword database 104, which registers a specific classification code for documents included in the acquired digital information, keywords closely related to that specific classification code, and keyword correspondence information indicating the correspondence between the specific classification code and the keywords; a related term database 105, which registers a predetermined classification code, related terms consisting of words that appear frequently in documents assigned that predetermined classification code, and related term correspondence information indicating the correspondence between the predetermined classification code and the related terms; and a score calculation database 106, which registers the weights of words contained in a document for calculating a score indicating the strength of the association between the document and a classification code.
As described above, the investigation basic database 103 stores a generation process model, in which a predetermined action that causes a lawsuit or fraud investigation arises, for each phase into which the progress of the predetermined action is classified. The investigation basic database 103 also stores the time-series information indicating the temporal order of the phases.
The data storage unit 100 further stores a report creation database 107, which registers report formats determined according to the category, the custodian, and the content of the classification work. As shown in FIG. 4, the data storage unit 100 may be installed inside the document analysis system 1, or it may be installed outside the document analysis system 1 as a separate storage device.
The document analysis system 1 according to the embodiment of the present invention includes a database management unit 109 that manages updates to the data in the investigation basic database 103, the keyword database 104, the related term database 105, the score calculation database 106, and the report creation database 107.
The database management unit 109 can be connected to an information storage device 902 via a dedicated connection line or an Internet line 901. Based on the data stored in the information storage device 902, the database management unit 109 can update the data in the investigation basic database 103, the keyword database 104, the related term database 105, the score calculation database 106, and the report creation database 107.
As described above, the document analysis system 1 according to the embodiment of the present invention includes the investigation category input reception unit 20, the investigation type determination unit 22, the information extraction unit 24, the analysis unit 26, the calculation unit 28, and the search unit 30. The automatic classification code assigning unit 32 is realized as a first automatic classification unit 201, a second automatic classification unit 301, and a third automatic classification unit 401.
The document analysis system 1 according to the embodiment of the present invention may have: a score calculation unit 116 that calculates a score indicating the strength of the association between a document and a classification code; a first automatic classification unit 201 that uses the search unit 30 to search for the keywords recorded in the keyword database 104, extracts documents containing those keywords from the document information, and automatically assigns the specific classification code to the extracted documents based on the keyword correspondence information; and a second automatic classification unit 301 that extracts, from the document information, documents containing the related terms recorded in the related term database 105, calculates a score based on the evaluation values of the related terms contained in each extracted document and the number of those related terms, and automatically assigns the predetermined classification code, based on the score and the related term correspondence information, to those documents containing related terms whose score exceeds a certain value.
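The two automatic classification passes described above can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function names, the substring matching, the code labels, and in particular the scoring formula (evaluation value multiplied by the number of occurrences, summed over terms), which is only one plausible reading of "based on the evaluation values of the related terms and the number of those related terms".

```python
def first_auto_classify(documents, keyword_to_code):
    """Assign a code to every document that contains a registered keyword.

    documents: {doc_id: text}; keyword_to_code plays the role of the
    keyword correspondence information.
    """
    result = {}
    for doc_id, text in documents.items():
        for keyword, code in keyword_to_code.items():
            if keyword in text:
                result[doc_id] = code
                break  # first matching keyword decides the code in this sketch
    return result


def second_auto_classify(documents, term_weights, code, threshold):
    """Score documents by related terms and code those above the threshold.

    term_weights: {related_term: evaluation_value} (assumed representation).
    """
    result = {}
    for doc_id, text in documents.items():
        # Assumed scoring: evaluation value times occurrence count, summed.
        score = sum(w * text.count(term) for term, w in term_weights.items())
        if score > threshold:
            result[doc_id] = (code, score)
    return result


docs = {
    "d1": "possible infringement of the patent",
    "d2": "lunch menu",
    "d3": "price price meeting with competitor",
}
keyword_hits = first_auto_classify(docs, {"infringement": "HOT"})
term_hits = second_auto_classify(
    docs, {"price": 1.0, "competitor": 2.0}, "RELEVANT", threshold=3.0)
```

The first pass is a direct keyword lookup, whereas the second pass only assigns a code once the accumulated evidence from weaker related terms passes the threshold.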
 更に、実施形態に係る文書分析システム1は、文書情報から抽出された複数の文書を画面上に表示する文書表示部130と、文書情報から抽出された分別符号が付与されていない複数の文書に対して、ユーザが訴訟との関連性に基づいて付与した分別符号を受け付け、分別符号を付与する分別符号受付付与部131と、分別符号受付付与部131により分別符号を付与された文書を解析する文書解析部118と、文書情報から抽出された複数の文書に対して、分別符号受付付与部131により分別符号を付与された文書を文書解析部118により解析した解析結果に基づいて、分別符号を自動的に付与する第3自動分別部401を備えることができる。 Furthermore, the document analysis system 1 according to the embodiment includes a document display unit 130 that displays a plurality of documents extracted from document information on a screen, and a plurality of documents that are not assigned a classification code extracted from document information. On the other hand, the classification code assigned by the user based on the relevance to the lawsuit is received, and the classification code reception / giving unit 131 for assigning the classification code and the document to which the classification code is given by the classification code reception / giving unit 131 are analyzed. Based on the analysis result of the document analysis unit 118 analyzing the document to which the classification code is given by the classification code reception and grant unit 131 for the plurality of documents extracted from the document information and the document analysis unit 118, the classification code is obtained. A third automatic sorting unit 401 that automatically applies can be provided.
 また、本発明の実施形態に係る文書分析システム1は、抽出した文書の言語の種類を判定する言語判定部120と、ユーザの指定を受け付けて、又は、自動的に、抽出した文書を翻訳する翻訳部122とを備えても良い。1文多言語の複合言語にも対応できるように、言語判定部120における言語の区切りを1文より小さくする。更に、HTMLのヘッダ等を翻訳の対象から除く処理を行うようにしても良い。 Further, the document analysis system 1 according to the embodiment of the present invention translates the extracted document automatically by accepting the language determination unit 120 that determines the language type of the extracted document and the user's specification. A translation unit 122 may be provided. The language delimiter in the language determination unit 120 is set to be smaller than one sentence so as to be able to cope with a single sentence multilingual compound language. Furthermore, a process of removing an HTML header or the like from a translation target may be performed.
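The sub-sentence language segmentation described above can be sketched as follows. This is a minimal illustration only: the function names `script_of` and `split_by_script` are hypothetical, and detecting language from Unicode ranges (Japanese scripts versus ASCII letters) is an assumption made for the sketch, not the method used by the language determination unit 120.

```python
def script_of(ch):
    """Classify one character; returns None for neutral characters (spaces, digits, punctuation)."""
    code = ord(ch)
    # Assumption: hiragana/katakana and common CJK ideographs are treated as Japanese.
    if 0x3040 <= code <= 0x30FF or 0x4E00 <= code <= 0x9FFF:
        return "ja"
    if ch.isascii() and ch.isalpha():
        return "en"
    return None


def split_by_script(sentence):
    """Split one sentence into (language, text) segments smaller than the sentence."""
    segments = []
    current, buffer = None, ""
    for ch in sentence:
        script = script_of(ch)
        # A change of script closes the current segment.
        if script is not None and current is not None and script != current:
            segments.append((current, buffer.strip()))
            buffer = ""
        if script is not None:
            current = script
        buffer += ch
    if current is not None and buffer.strip():
        segments.append((current, buffer.strip()))
    return segments


segments = split_by_script("契約書を send してください")
```

Each segment could then be passed to a translator separately, so that a sentence mixing Japanese and English is not treated as a single language.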
The document analysis system 1 according to the embodiment of the present invention may also include a trend information generation unit 124. So that the document analysis unit 118 can perform its analysis, this unit generates trend information representing the degree to which each document resembles the documents to which a classification code has been assigned, based on the types of words each document contains, their numbers of occurrences, and their evaluation values.
The document analysis system 1 according to the embodiment of the present invention may also include a quality inspection unit 501 that compares the classification code received by the classification code reception and assignment unit 131 with the classification code assigned by the document analysis unit 118 on the basis of the trend information, and thereby verifies the validity of the received classification code.
Furthermore, the document analysis system 1 according to the embodiment of the present invention may include a learning unit 601 that learns the weighting of each keyword or related term based on the results of the document analysis processing.
The document analysis system 1 according to the embodiment of the present invention may include a report creation unit 701 that, based on the results of the document analysis processing, outputs an investigation report suited to the investigation type, that is, to the lawsuit case or fraud investigation concerned. Lawsuit cases include, for example, antitrust (cartel), patent, foreign bribery prohibition (FCPA), and product liability (PL) cases. Fraud investigations include, for example, information leakage and fictitious billing.
To improve the quality of the classification, investigation, and reporting, the document analysis system 1 according to the embodiment of the present invention may include an attorney review reception unit 133 that receives, for example, a review by a lead attorney or lead patent attorney.
To facilitate understanding of the document analysis system 1 according to the embodiment of the present invention, terms specific to the embodiment are defined below.
A "classification code" is an identifier used when classifying documents. It indicates the degree of relevance to a lawsuit so that the documents can easily be used in that lawsuit. For example, when document information is used as evidence in a lawsuit, a classification code may be assigned according to the type of evidence.
A "document" is data containing one or more words. Examples of documents include e-mails, presentation materials, spreadsheets, meeting materials, contracts, organization charts, and business plans.
A "word" is the smallest unit of character string that carries meaning. For example, the sentence "A document is data containing one or more words." contains the words "document", "one", "more", "words", "containing", "data", and "is".
A "keyword" is a character string that carries a certain meaning in a given language. For example, keywords selected from the sentence "classify documents" may be "documents" and "classify". In the embodiment, keywords such as "infringement", "lawsuit", and "Patent Publication No. XX" are selected with priority.
In the present embodiment, keywords include morphemes.
"Keyword correspondence information" represents the correspondence between a keyword and a specific classification code. For example, if the classification code "important", which denotes a document that is important in a lawsuit, is closely related to the keyword "infringer", the keyword correspondence information may be information that links the classification code "important" to the keyword "infringer" and manages them together.
A "related term" is a word whose evaluation value is at or above a certain value, chosen from among the words that appear frequently across the documents to which a given classification code has been assigned. The appearance frequency is, for example, the proportion of occurrences of the related term among the total number of words appearing in one document.
The "evaluation value" is the amount of information that each word contributes within a document. The evaluation value may be calculated on the basis of the amount of transmitted information. For example, when a particular product name is assigned as a classification code, the related terms may be the name of the technical field to which the product belongs, the countries where the product is sold, the names of similar products, and so on. Specifically, when the product name of a device that performs image encoding is assigned as a classification code, the related terms include "encoding process", "Japan", and "encoder".
"Related term correspondence information" represents the correspondence between a related term and a classification code. For example, if the classification code "Product A", which is the name of a product involved in a lawsuit, has the related term "image encoding", which is a function of Product A, the related term correspondence information may be information that links the classification code "Product A" to the related term "image encoding" and manages them together.
A "score" is a quantitative evaluation of how strongly a given document is associated with a specific classification code. In each embodiment of the present invention, the score is calculated, for example, by the following equation (1) from the words appearing in the document and the evaluation value of each word.
[Equation (1): Figure JPOXMLDOC01-appb-M000001]
The document analysis system 1 according to the embodiment of the present invention may also extract words that appear frequently in documents to which the user has assigned the same classification code. It may then analyze, for each document, trend information consisting of the types of the extracted words contained in that document, the evaluation value of each word, and the number of occurrences, and assign the same classification code to those documents that have not received a classification code through the classification code reception and assignment unit 131 and that show the same trend as the analyzed trend information.
Here, "trend information" represents the degree to which each document resembles the documents to which a classification code has been assigned. It is expressed as a degree of relevance to a given classification code, based on the types of words each document contains, their numbers of occurrences, and their evaluation values. For example, when a document resembles a document assigned a given classification code in its degree of relevance to that code, the two documents are said to have the same trend information. Documents may also be treated as having the same trend when they contain different words but contain words of the same evaluation value with the same number of occurrences.
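The last case above (different words, same evaluation values, same numbers of occurrences) can be sketched as a comparison of word-independent signatures. This is a minimal sketch under stated assumptions: the function name `trend_signature`, the exact-match comparison, and all evaluation values below are hypothetical and chosen only for illustration.

```python
from collections import Counter


def trend_signature(word_counts, evaluation_values):
    """Summarize a document as a multiset of (evaluation value, occurrence count) pairs.

    The words themselves are discarded, so two documents with different words
    but identical evaluation values and counts receive the same signature.
    """
    return Counter(
        (evaluation_values[word], count)
        for word, count in word_counts.items()
        if word in evaluation_values
    )


# Hypothetical evaluation values.
evaluation_values = {"encoder": 0.8, "decoder": 0.8, "Japan": 0.3, "Tokyo": 0.3}

doc_coded = {"encoder": 3, "Japan": 1}     # document already assigned a code
doc_uncoded = {"decoder": 3, "Tokyo": 1}   # different words, same values and counts
doc_other = {"encoder": 1}

same_trend = trend_signature(doc_coded, evaluation_values) == trend_signature(doc_uncoded, evaluation_values)
different_trend = trend_signature(doc_coded, evaluation_values) != trend_signature(doc_other, evaluation_values)
```

A practical system would more likely use a similarity measure with a tolerance rather than exact equality; exact matching is used here only to keep the example short.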
Next, the document analysis method of the present invention will be described.
FIG. 5 is a chart showing the processing flow of the document analysis method (the control method of the document analysis system) according to the embodiment of the present invention.
First, the analysis unit 26 reads the information related to the lawsuit or fraud investigation, the generation process model, and the time-series information from the investigation basic database 103 (step 41; hereinafter "step" is abbreviated as "S"). Next, the analysis unit 26 performs morphological analysis and keyword analysis on the data under investigation (S42) and thereby extracts behavior corresponding to the predetermined act (S43). The calculation unit 28 then calculates, from the result of the analysis, an index (case index) indicating the likelihood that the predetermined act will occur (S44, calculation step).
Next, the details of the document analysis method of the present invention will be described with reference to the drawings. The example described below is only one example, and the invention is not limited to it.
FIG. 6 is a detailed flowchart of the document analysis method according to the embodiment of the present invention. The flow shown in FIG. 5 may be executed as a process independent of the flow shown in FIG. 6, or as a process embedded at any point in the flow shown in FIG. 6.
In response to what is shown on the display screen of the display unit, the system receives an argument specified by the user and can identify the corresponding category from among lawsuit cases, including, for example, antitrust, patent, FCPA, and PL cases, and fraud investigations, including information leakage and fictitious billing (S11).
According to the identified category, the databases to be used, such as the investigation basic database and the document analysis database, can be identified (S12).
To confirm whether the databases in use are up to date, the system can access an information storage device that stores the latest databases. The information storage device may be installed inside the organization that performs the classification or outside it. An information storage device installed outside the organization may be located, for example, at an affiliated law firm or patent firm.
When the information storage device is accessed, authentication by ID and password can be performed to maintain security (S13).
After authentication, access to the information storage device is permitted, and the databases in use, such as the investigation basic database and the document analysis database, can be updated to the latest versions (S14).
The updated investigation basic database is searched (S15), and the company name and the names of the person in charge and the custodian can be presented on the screen of the display device (S16).
If the names of the person in charge and the custodian shown on the screen differ from the actual names, the user corrects them on the screen of the display device. The document analysis system can accept the user's correction and identify the actual names of the person in charge and the custodian (S17).
Next, digital document information can be extracted in order to carry out the document analysis work (S18).
The updated document analysis database, namely the updated keyword database, related term database, and score calculation database, is searched (S19), and classification codes can be assigned to the extracted document information (S20).
Classification codes assigned by a reviewer can also be received and assigned to the extracted document information (S21).
Using the classification results as teacher data, the databases can be searched and classification codes can be assigned to the extracted document information (S22).
A review by the lead attorney or patent attorney can be received (S23). This improves the quality of the investigation.
The category is identified from the argument specified by the user (S24), and the report creation database can be identified according to the identified category (S25). The format of the report is determined by the identified report creation database, and the report can be output automatically (S26).
FIG. 7 is a chart showing the flow of investigation and classification processing according to the investigation type in the document analysis method according to the embodiment of the present invention.
First, the investigation type can be input (S31). That is, in response to what is shown on the display screen, the user enters the category corresponding to the investigation and classification work to be performed, chosen from among lawsuit cases, including, for example, antitrust, patent, foreign bribery prohibition (FCPA), and product liability (PL) cases, and fraud investigations, including information leakage and fictitious billing. The document analysis system accepts the category entered by the user and can identify the category to be investigated.
According to the identified category, the type of investigation and document analysis processing and the type of database to be used can be determined (S32).
According to the identified category, the stock of information stored in the databases in use, such as the investigation basic database and the document analysis database, may be accessed (S33).
The investigation basic database is accessed according to the identified category, and the keyword input screens corresponding to that category can be displayed (S34).
The investigation basic database is accessed according to the identified category, and the text input screens corresponding to that category can be displayed (S35).
The investigation basic database is accessed according to the identified category, and keywords or documents can be extracted according to that category (S36).
By executing the above processing, weighting can be added to the teacher data for automatic classification code assignment (predictive coding) (S37).
The extracted documents and information can be narrowed down by a keyword search of the document analysis database (S38).
FIG. 8 is a chart showing the flow of predictive coding according to the investigation type in the document analysis method according to the embodiment of the present invention.
In the document analysis method according to the embodiment of the present invention, the document analysis system first prompts the user for input according to the type of investigation and accepts the user's input. For example, for a cartel case under antitrust law, the system can prompt for and accept the target products, the persons involved (names and e-mail addresses), the organizations involved (names and departments), and the time period. For the organizations involved, the system can additionally prompt for and accept input on competitor companies and customer companies (S51).
Next, the input keywords can be used to weight the assignment of classification codes (S52). Predictive coding can then be performed (S53).
In the embodiment of the present invention, as one example, registration processing, classification processing, and inspection processing are performed in a first to a fifth stage, following the flowchart shown in FIG. 9.
In the first stage, keywords and related terms are updated and registered in advance using the results of past classification processing (STEP 100). The keywords and related terms are registered together with the keyword correspondence information and the related term correspondence information, which link each keyword or related term to a classification code.
In the second stage, a first classification process is performed (STEP 200): documents containing a keyword registered in the first stage are extracted from the entire document information, and when such a document is found, the updated keyword correspondence information recorded in the first stage is consulted and the classification code corresponding to the keyword is assigned.
In the third stage, a second classification process is performed (STEP 300): documents containing a related term registered in the first stage are extracted from the document information that received no classification code in the second stage, and the score of each such document is calculated. Classification codes are then assigned with reference to the calculated score and the related term correspondence information registered in the first stage.
In the fourth stage, for the document information that has received no classification code by the end of the third stage, the classification codes assigned by the user are received and applied to that document information. A third classification process is then performed: the document information that received classification codes from the user is analyzed, documents without a classification code are extracted based on the analysis result, and classification codes are assigned to the extracted documents. For example, words that appear frequently in the documents to which the user assigned the same classification code are extracted; trend information consisting of the types of extracted words contained in each document, the evaluation value of each word, and the number of occurrences is analyzed for each document; and the same classification code is assigned to documents showing the same trend as that trend information (STEP 400).
In the fifth stage, for each document to which the user assigned a classification code in the fourth stage, the classification code that should be assigned is determined from the analyzed trend information. The determined classification code is compared with the code assigned by the user, and the validity of the classification is verified (STEP 500). If necessary, learning processing may also be performed based on the results of the document analysis processing.
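The comparison performed in the fifth stage can be sketched as follows. This is an illustrative sketch only: the function name `find_disagreements` and the document identifiers are hypothetical, and the sketch assumes that each document carries a single classification code.

```python
def find_disagreements(user_codes, predicted_codes):
    """Return {doc_id: (user_code, predicted_code)} for documents where the two codes differ.

    user_codes:      codes the user assigned in the fourth stage
    predicted_codes: codes determined from the analyzed trend information
    """
    return {
        doc_id: (user_codes[doc_id], predicted_codes[doc_id])
        for doc_id in user_codes
        if doc_id in predicted_codes and user_codes[doc_id] != predicted_codes[doc_id]
    }


# Hypothetical data for illustration.
user_codes = {"doc-1": "important", "doc-2": "Product A", "doc-3": "important"}
predicted_codes = {"doc-1": "important", "doc-2": "Product B", "doc-3": "important"}

to_recheck = find_disagreements(user_codes, predicted_codes)
```

Documents returned by such a check are candidates for a second look by a reviewer; whether the user's code or the predicted code is correct is not decided by the comparison itself.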
The trend information used in the fourth and fifth stages represents the degree to which each document resembles the documents to which a classification code has been assigned, and is based on the types of words each document contains, their numbers of occurrences, and their evaluation values. For example, when a document resembles a document assigned a given classification code in its degree of relevance to that code, the two documents are said to have the same trend information. Documents may also be treated as having the same trend when they contain different words but contain words of the same evaluation value with the same number of occurrences.
The detailed processing flow of each of the first to fifth stages is described below.
<First stage (STEP 100)>
The detailed processing flow of the keyword database 104 in the first stage will be described with reference to FIG. 10.
Based on the results of classifying documents in past lawsuits, the keyword database 104 creates a management table for each classification code and identifies the keywords corresponding to each classification code (STEP 111). In the embodiment of the present invention, this identification is performed by analyzing the documents to which each classification code has been assigned and using the number of occurrences and the evaluation value of each keyword in those documents. Alternatively, a method based on the amount of transmitted information carried by each keyword, manual selection by the user, or the like may be used.
In the embodiment of the present invention, for example, when the keywords "infringement" and "patent attorney" are identified as keywords for the classification code "important", keyword correspondence information is created indicating that "infringement" and "patent attorney" are keywords closely related to the classification code "important" (STEP 112). The identified keywords are then registered in the keyword database 104. At this time, the identified keywords and the keyword correspondence information are associated with each other and recorded in the management table for the classification code "important" in the keyword database 104 (STEP 113).
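The registration in STEP 111 to STEP 113 can be pictured as building one table of keywords per classification code. The following is a minimal sketch under assumptions: the function name `build_keyword_table` and the in-memory dictionary layout are hypothetical and stand in for the management tables of the keyword database 104, whose actual structure is not specified here.

```python
from collections import defaultdict


def build_keyword_table(correspondences):
    """Group (classification code, keyword) pairs into one keyword set per code."""
    tables = defaultdict(set)
    for code, keyword in correspondences:
        tables[code].add(keyword)
    return dict(tables)


# Keyword correspondence information from the example in the text.
keyword_table = build_keyword_table([
    ("important", "infringement"),
    ("important", "patent attorney"),
])
```

Keeping the table keyed by classification code mirrors the description of one management table per code and makes the later lookup from keyword hit to code straightforward.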
Next, the detailed processing flow of the related term database 105 will be described with reference to FIG. 11. Based on the results of classifying documents in past lawsuits, the related term database 105 creates a management table for each classification code and registers the related terms corresponding to each classification code (STEP 121). In the embodiment of the present invention, for example, "encoding process" and "product a" are registered as related terms of "Product A", and "decoding" and "product b" are registered as related terms of "Product B".
Related term correspondence information indicating which classification code each registered related term corresponds to is created (STEP 122) and recorded in each management table (STEP 123). At this time, the related term correspondence information also records the evaluation value of each related term and the threshold score required to determine the classification code.
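One way to picture a management table that holds related terms, their evaluation values, and the score threshold is shown below. This is a sketch under assumptions: the class name `RelatedTermTable`, its field names, and all numeric values are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass, field


@dataclass
class RelatedTermTable:
    """Illustrative management table for one classification code."""
    code: str
    threshold: float  # score a document must exceed to receive this code
    evaluation_values: dict = field(default_factory=dict)  # related term -> evaluation value

    def register(self, term: str, evaluation_value: float) -> None:
        """Record a related term together with its evaluation value."""
        self.evaluation_values[term] = evaluation_value


# Hypothetical values for illustration only.
product_a = RelatedTermTable(code="Product A", threshold=1.0)
product_a.register("encoding process", 0.8)
product_a.register("product a", 0.5)
```

Storing the threshold next to the evaluation values keeps everything needed for the later score comparison in one record per classification code.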
Before the actual classification work is performed, the keywords and keyword correspondence information, and the related terms and related term correspondence information, are updated to the latest versions and registered (STEP 113, STEP 123).
<Second stage (STEP 200)>
The detailed processing flow of the first automatic classification unit 201 in the second stage will be described with reference to FIG. 12. In the embodiment of the present invention, the first automatic classification unit 201 assigns the classification code "important" to documents in the second stage.
The first automatic classification unit 201 extracts, from the document information, documents containing the keywords "infringement" and "patent attorney" that were registered in the keyword database 104 in the first stage (STEP 100) (STEP 211). For each extracted document, it consults the keyword correspondence information to find the management table in which the keyword is recorded (STEP 212) and assigns the classification code "important" (STEP 213).
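STEP 211 to STEP 213 can be sketched as a keyword lookup over the document set. The sketch below rests on several assumptions that are not in the source: the function name `first_classification` is hypothetical, matching is done by plain case-insensitive substring search rather than morphological analysis, and a document is extracted when it contains any one of the registered keywords.

```python
def first_classification(documents, keyword_table):
    """Return {doc_id: set of classification codes} for documents containing a registered keyword.

    documents:     {doc_id: text}
    keyword_table: {classification code: set of lowercase keywords}
    """
    assigned = {}
    for doc_id, text in documents.items():
        lowered = text.lower()
        for code, keywords in keyword_table.items():
            # Look up which table the hit belongs to, then assign that table's code.
            if any(keyword in lowered for keyword in keywords):
                assigned.setdefault(doc_id, set()).add(code)
    return assigned


# Hypothetical documents for illustration.
keyword_table = {"important": {"infringement", "patent attorney"}}
documents = {
    "mail-1": "Our patent attorney reviewed the infringement claim.",
    "mail-2": "Lunch menu for Friday.",
}

assigned = first_classification(documents, keyword_table)
unclassified = [doc_id for doc_id in documents if doc_id not in assigned]
```

The `unclassified` list corresponds to the documents that are passed on to the third stage.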
 <第3段階(STEP300)>
 第3段階における第2自動分別部301の詳細な処理フローを、図13を用いて説明する。
<Third stage (STEP 300)>
A detailed processing flow of the second automatic sorting unit 301 in the third stage will be described with reference to FIG.
 本発明の実施形態において、第2自動分別部301では、第2段階(STEP200)で分別符号を付与しなかった文書情報に対して、「製品A」及び「製品B」という分別符号を付与する処理を行う。 In the embodiment of the present invention, the second automatic classification unit 301 assigns the classification codes “product A” and “product B” to the document information that has not been assigned the classification code in the second stage (STEP 200). Process.
 第2自動分別部301は、該文書情報から、第1段階で関連用語データベース105に記録した関連用語「符号化処理」、「製品a」、「復号化」及び「製品b」を含む文書を抽出する(STEP311)。該抽出した文書に対して、記録した4つの関連用語の出現頻度、評価値に基づいて、式(1)を用いて、スコア算出部116によりスコアを算出する(STEP312)。該スコアは各文書と分別符号「製品A」及び「製品B」との関連度を表している。 From the document information, the second automatic classification unit 301 records a document including related terms “encoding process”, “product a”, “decoding”, and “product b” recorded in the related term database 105 in the first stage. Extract (STEP 311). Based on the recorded appearance frequency and evaluation value of the four related terms, the score is calculated by the score calculation unit 116 using the expression (1) (STEP 312). The score represents the degree of association between each document and the classification codes “product A” and “product B”.
When the score exceeds a threshold, the related term correspondence information is referred to (STEP 313), and the appropriate classification code is assigned (STEP 314).
For example, suppose that in a certain document the related terms “encoding process” and “product a” appear frequently and the related term “encoding process” has a high evaluation value. If the score indicating the degree of association with the classification code “product A” then exceeds the threshold, the document is assigned the classification code “product A”.
If, in addition, the related term “product b” also appears frequently in the document and the score indicating the degree of association with the classification code “product B” exceeds the threshold, the document is assigned “product B” together with “product A”. Conversely, if the related term “product b” appears rarely and the score for the classification code “product B” does not exceed the threshold, the document is assigned only the classification code “product A”.
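The thresholding behaviour in this example can be illustrated with a short sketch. Expression (1) is not reproduced in this passage, so the score below is an assumed stand-in (the sum of appearance frequency times evaluation value per classification code). The term table, evaluation values, and threshold are likewise hypothetical.

```python
# Hypothetical related-term table: term -> (classification code, evaluation value).
RELATED_TERMS = {
    "encoding process": ("product A", 2.0),
    "product a": ("product A", 1.0),
    "decoding": ("product B", 2.0),
    "product b": ("product B", 1.0),
}


def score_document(text, related_terms=RELATED_TERMS):
    """Return {code: score}, with score assumed to be sum(frequency * evaluation value)."""
    lowered = text.lower()
    scores = {}
    for term, (code, value) in related_terms.items():
        frequency = lowered.count(term)  # appearance frequency of the related term
        scores[code] = scores.get(code, 0.0) + frequency * value
    return scores


def assign_codes(text, threshold=3.0):
    """Assign every classification code whose score exceeds the threshold."""
    return sorted(code for code, score in score_document(text).items()
                  if score > threshold)
```

Because each code is tested against the threshold independently, one document can receive “product A” alone or both “product A” and “product B”, as in the example above.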
The second automatic classification unit 301 recalculates the evaluation values of the related terms according to Expression (2) below, using the scores calculated in STEP 432 of the fourth stage, and thereby weights the evaluation values (STEP 315).
Figure JPOXMLDOC01-appb-M000002
For example, when at least a certain number of documents occur in which “decoding” appears very frequently but the score is lower than a certain value, the evaluation value of the related term “decoding” is lowered and recorded again in the related term correspondence information.
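The adjustment described in this example might look like the following sketch. Expression (2) itself is available here only as an image reference, so this code does not implement it. It only mimics the stated behaviour with an assumed rule, and every threshold and the decay factor are illustrative assumptions.

```python
def reweight_term(value, observations, min_frequency=10, max_score=1.0,
                  min_documents=3, decay=0.5):
    """Lower a term's evaluation value when it is frequent in many low-scoring documents.

    observations: list of (frequency of the term in a document, that document's score).
    All thresholds and the decay factor are illustrative assumptions.
    """
    mismatches = sum(1 for frequency, score in observations
                     if frequency >= min_frequency and score <= max_score)
    # Lower the value only when enough mismatching documents have accumulated.
    return value * decay if mismatches >= min_documents else value
```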
<Fourth stage (STEP 400)>
In the fourth stage, as shown in FIG. 14, a fixed proportion of the document information is extracted from the document information that was not assigned a classification code in the processing up to the third stage. The system accepts classification codes from a reviewer for the extracted document information and assigns the accepted codes to it. Next, as shown in FIG. 15, the document information that was assigned classification codes by the reviewer is analyzed, and based on the analysis result, classification codes are assigned to the document information that has not yet been assigned one. In the embodiment of the present invention, the fourth stage assigns, for example, the classification codes “important”, “product A”, and “product B” to the document information. The fourth stage is described further below.
The detailed processing flow of the classification code reception/assignment unit 131 in the fourth stage is described with reference to FIG. 14. First, the information extraction unit 24 randomly samples documents from the document information to be processed in the fourth stage and displays them on the document display unit 130. In the embodiment of the present invention, 20% of the documents to be processed are extracted at random and presented to the reviewer for classification. Alternatively, the documents may be sorted by creation date and time or by name, and the top 30% selected.
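Both sampling strategies mentioned above can be sketched briefly. The function names and the `seed` parameter are assumptions made for illustration; only the 20% and 30% proportions come from the text.

```python
import random


def sample_for_review(doc_ids, fraction=0.2, seed=None):
    """Randomly pick a fraction of the documents for classification by a reviewer."""
    rng = random.Random(seed)  # seed is only for reproducible illustration
    count = round(len(doc_ids) * fraction)
    return rng.sample(list(doc_ids), count)


def top_fraction(documents, key, fraction=0.3):
    """Alternative: sort by a key (creation time, name, ...) and take the top part."""
    ordered = sorted(documents, key=key)
    return ordered[:round(len(ordered) * fraction)]
```

Random sampling avoids the ordering bias that the sorted alternative can introduce, for instance when older documents differ systematically from newer ones.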
The user views the display screen 11 shown in FIG. 20, which is displayed on the document display unit 130, and selects the classification code to assign to each document. The classification code reception/assignment unit 131 accepts the classification code selected by the user (STEP 411) and classifies the documents based on the assigned classification codes (STEP 412).
Next, the detailed processing flow of the document analysis unit 118 is described with reference to FIG. 15. The document analysis unit 118 extracts words that appear frequently in common across the documents that the classification code reception/assignment unit 131 classified under each classification code (STEP 421). It analyzes the evaluation values of the extracted common words using Expression (2) (STEP 422) and analyzes the appearance frequencies of those common words in the documents (STEP 423).
Furthermore, based on the results of the analysis in STEP 422 and STEP 423, it analyzes the trend information of the documents assigned the classification code “important” (STEP 424).
FIG. 16 is a graph of the result of analyzing, in STEP 424, the words that appear frequently in common across the documents assigned the classification code “important”.
In FIG. 16, the vertical axis R_hot indicates, among all documents to which the user assigned the classification code “important”, the proportion of documents that contain a word selected as a word associated with the classification code “important”. The horizontal axis (R_all) indicates, among all documents that the user classified, the proportion of documents that contain the word extracted in STEP 421 by the classification code reception/assignment unit 131.
In the embodiment of the present invention, the classification code reception/assignment unit 131 extracts words plotted above the straight line R_hot = R_all as the common words for the classification code “important”.
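The selection rule based on the line R_hot = R_all can be made concrete with a small sketch. It assumes each reviewed document is represented as a set of words plus the set of codes the reviewer assigned. That representation and the function name are assumptions; only the comparison of the two ratios follows the text.

```python
def common_words(reviewed, code="important"):
    """Return words whose share among documents with `code` exceeds their overall share.

    reviewed: list of (set of words in the document, set of assigned codes).
    """
    coded = [words for words, codes in reviewed if code in codes]
    if not coded:
        return set()
    vocabulary = set().union(*(words for words, _ in reviewed))
    selected = set()
    for word in vocabulary:
        # R_hot: share of documents carrying `code` that contain the word.
        r_hot = sum(word in words for words in coded) / len(coded)
        # R_all: share of all reviewed documents that contain the word.
        r_all = sum(word in words for words, _ in reviewed) / len(reviewed)
        if r_hot > r_all:  # the word is plotted above the line R_hot = R_all
            selected.add(word)
    return selected
```

A word that is equally common inside and outside the “important” documents lies on the line and is not selected, so generic vocabulary is filtered out.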
The processing of STEP 421 to STEP 424 is also performed on the documents assigned the classification codes “product A” and “product B”, and the trend information of those documents is analyzed.
Next, the detailed processing flow of the third automatic classification unit 401 is described with reference to FIG. 17. The third automatic classification unit 401 processes those documents, among the document information to be processed in the fourth stage, for which the classification code reception/assignment unit 131 did not accept a classification code in STEP 411. From these documents, it extracts the documents that have the same trend information as the documents assigned the classification codes “important”, “product A”, and “product B”, as analyzed in STEP 424 (STEP 431). For each extracted document, it calculates a score using Expression (1) based on the trend information (STEP 432). It also assigns an appropriate classification code to each document extracted in STEP 431 based on the trend information (STEP 433).
The third automatic classification unit 401 further reflects the classification results in each database using the scores calculated in STEP 432 (STEP 434). Specifically, it may lower the evaluation values of keywords and related terms contained in low-scoring documents and raise the evaluation values of keywords and related terms contained in high-scoring documents.
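One way to picture the feedback in STEP 434 is the sketch below. The text only says that evaluation values are lowered for low-scoring documents and raised for high-scoring ones. The thresholds `low` and `high`, the fixed `step`, and the floor at zero are all assumptions.

```python
def feed_back_scores(evaluation_values, scored_documents,
                     low=1.0, high=5.0, step=0.1):
    """Return updated evaluation values; the input mapping is left unchanged.

    scored_documents: list of (set of terms found in the document, score).
    """
    updated = dict(evaluation_values)
    for terms, score in scored_documents:
        if score >= high:
            delta = step       # high-scoring document: raise its terms
        elif score <= low:
            delta = -step      # low-scoring document: lower its terms
        else:
            continue           # mid-range scores leave the values untouched
        for term in terms:
            if term in updated:
                updated[term] = max(0.0, updated[term] + delta)
    return updated
```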
An example of the detailed processing flow of the third automatic classification unit 401 is further described with reference to FIG. 18. The third automatic classification unit 401 may classify those documents, among the document information to be processed in the fourth stage, for which the classification code reception/assignment unit 131 did not accept a classification code in STEP 411. When no argument is given (STEP 441: none), the third automatic classification unit 401 extracts from these documents the documents that have the same trend information as the documents assigned the classification code “important”, as analyzed in STEP 424 (STEP 442). For each extracted document, it calculates a score using Expression (1) based on the trend information (STEP 443). It also assigns an appropriate classification code to each document extracted in STEP 442 based on the trend information (STEP 444).
The third automatic classification unit 401 further reflects the classification results in each database using the scores calculated in STEP 443 (STEP 445). Specifically, it lowers the evaluation values of keywords and related terms contained in low-scoring documents and raises the evaluation values of keywords and related terms contained in high-scoring documents.
As described above, scores are calculated in both the second automatic classification unit 301 and the third automatic classification unit 401. When the number of score calculations becomes large, the data for score calculation may be stored collectively in the score calculation database 106.
<Fifth stage (STEP 500)>
The detailed processing flow of the quality inspection unit 501 in the fifth stage is described with reference to FIG. 19. For the documents for which the classification code reception/assignment unit 131 accepted classification codes in STEP 411, the quality inspection unit 501 determines the classification code that should be assigned, based on the trend information analyzed by the document analysis unit 118 in STEP 424 (STEP 511).
The classification code accepted by the classification code reception/assignment unit 131 is compared with the classification code determined in STEP 511 (STEP 512), and the validity of the classification code accepted in STEP 411 is verified (STEP 513).
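The comparison in STEP 512 and STEP 513 can be sketched as a simple agreement check. The return values (an agreement rate and a list of mismatching documents) are an assumed way of expressing “validity”, since the text does not specify what the verification outputs.

```python
def verify_reviewer_codes(reviewer_codes, predicted_codes):
    """Compare reviewer-assigned codes with codes derived from trend information.

    Both arguments map doc_id -> classification code. Returns the agreement
    rate and the doc_ids whose codes differ (candidates for re-review).
    """
    shared = sorted(reviewer_codes.keys() & predicted_codes.keys())
    mismatched = [d for d in shared if reviewer_codes[d] != predicted_codes[d]]
    agreement = 1.0 - len(mismatched) / len(shared) if shared else 1.0
    return agreement, mismatched
```

For instance, a low agreement rate for one reviewer could prompt a second look at that reviewer's documents, although the text leaves the follow-up action open.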
The document analysis system 1 according to the embodiment of the present invention may include a learning unit 601. The learning unit 601 learns the weighting of each keyword or related term using Expression (2), based on the first to fourth processing results. The learning results may be reflected in the keyword database 104, the related term database 105, or the score calculation database 106.
The document analysis system 1 according to the embodiment of the present invention may include a report creation unit 701 that, based on the results of the document analysis processing, outputs an investigation report best suited to the investigation type: a lawsuit (for example, a cartel, patent, FCPA, or PL case) or a fraud investigation (for example, information leakage or fictitious billing).
The content to be investigated differs depending on the investigation type.
For example, in a cartel case, the key points are:
1. When and how did the competitor's personnel engage in cartel-related communication (price coordination)?
2. Who is involved, and from which organization?
In a patent infringement case, the key points are:
1. Is the content the same as the technology that is the subject of the infringement?
2. Who infringed or did not infringe, when, and with what intent (or without intent)?
Other examples of the embodiment of the present invention are described below.
In another example of the embodiment of the present invention, documents that have already been assigned classification codes are analyzed with respect to similar search information, and the range over which classification codes are assigned is adjusted based on the analysis result.
There are two methods for adjusting the range over which classification codes are assigned for similar search information: clustering the similar search information, and learning from classification results to perform predictive classification. In the clustering method, for example, attention is paid to commonality of metadata, and a common classification code may be assigned to an original document, a reply to the original document, and a reply to that reply. In the predictive classification method, the system learns to integrate similar search information in the classification results, so that the same or similar classification codes are assigned to similar search information.
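The metadata-based case (an original document, its reply, and the reply to that reply sharing one code) can be illustrated as follows. The `parents` mapping is a hypothetical representation of reply metadata. The text does not say how the thread relationship is stored.

```python
def propagate_thread_codes(parents, root_codes):
    """Give every reply the classification code of its thread's original document.

    parents: doc_id -> parent doc_id (None for an original document).
    root_codes: doc_id of an original document -> classification code.
    """
    def root_of(doc_id):
        seen = set()  # guards against accidental cycles in the metadata
        while parents.get(doc_id) is not None and doc_id not in seen:
            seen.add(doc_id)
            doc_id = parents[doc_id]
        return doc_id

    return {doc_id: root_codes[root_of(doc_id)]
            for doc_id in parents if root_of(doc_id) in root_codes}
```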
In another example of the embodiment of the present invention, the reliability of the analysis result varies with the number of documents analyzed. A statistical method may be applied to the total number of documents to be classified in order to determine at what point, and for what proportion of all documents, the range over which classification codes are assigned is adjusted based on the analysis result.
In another example of the embodiment of the present invention, both methods may be executed to adjust the range of documents to which classification codes are assigned: clustering the similar search information, and learning from classification results to perform predictive classification. This makes it possible to assign classification codes quickly and accurately and reduces the burden of the classification work.
Another example of the embodiment of the present invention may include a display screen control unit that controls a display screen presenting to the user the types of information extracted by the investigation type determination unit.
Another example of the embodiment of the present invention may include an input reception unit that accepts keywords and/or sentences entered by the user in correspondence with the types of information presented by the display screen control unit.
The document analysis program of the present invention acquires information recorded in a predetermined computer or server and analyzes document information, composed of a plurality of documents, included in the acquired information. The program causes a computer to realize a calculation function that refers to an investigation basic database. The investigation basic database stores, for each phase into which a predetermined act is classified according to its progress, a generation process model in which the predetermined act that causes a lawsuit or fraud investigation occurs. It further stores information related to the lawsuit or fraud investigation for each category to which the lawsuit or fraud investigation belongs and for each generation process model, and further stores time-series information indicating the temporal order of the phases. By referring to this database, the calculation function analyzes the document information based on the information related to the lawsuit or fraud investigation, the generation process model, and the time-series information, and calculates, from the result of the analysis, an index indicating the possibility that the predetermined act will occur.
The calculation function can be realized by the calculation unit described above. The details are as described above.
The embodiment of the present invention accepts user input of the category of a lawsuit or fraud investigation case and automatically updates the database according to the category. This reduces the clerical burden of entering the names of persons in charge, custodians, and the like. In addition, search words are adjusted using the database automatically updated according to the category, and classification codes are automatically assigned to the document information using the adjusted search words. This reduces the burden of classifying the document information used in lawsuits or fraud investigation cases.
That is, the present invention facilitates the analysis of document information used in lawsuits.
The control blocks of the document analysis system 1 may be realized by logic circuits (hardware) formed in an integrated circuit (IC chip) or the like, or by software using a CPU (Central Processing Unit). In the latter case, the document analysis system 1 includes a CPU that executes the instructions of a program (control program), which is software that realizes each function; a ROM (Read Only Memory) or storage device (referred to as a “recording medium”) on which the program and various data are recorded so as to be readable by a computer (or CPU); a RAM (Random Access Memory) into which the program is loaded; and the like. The object of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it. As the recording medium, a “non-transitory tangible medium” such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used. The program may also be supplied to the computer via any transmission medium capable of transmitting it (such as a communication network or a broadcast wave). The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.
The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.
A document analysis system that acquires digital information recorded in a plurality of computers or servers, analyzes document information composed of a plurality of documents included in the acquired digital information, and facilitates its use in a lawsuit or fraud investigation, the system comprising: an investigation basic database that stores information related to the lawsuit or fraud investigation; an investigation category input reception unit that accepts input of the category of the lawsuit or fraud investigation; and an investigation type determination unit that determines the investigation category to be investigated based on the category accepted by the investigation category input reception unit, and extracts the types of necessary information from the investigation basic database.
The document analysis system described above, further comprising a display screen control unit that controls a display screen presenting to the user the types of information extracted by the investigation type determination unit.
The document analysis system described above, further comprising an input reception unit that accepts keywords and/or sentences entered by the user in correspondence with the types of information presented by the display screen control unit.
The document analysis system described above, further comprising an information extraction unit that extracts, from the investigation basic database, keywords and/or sentences corresponding to the types of information extracted by the investigation type determination unit.
The document analysis system described above, further comprising a search unit that searches the documents for the keywords and/or sentences.
The document analysis system described above, further comprising an automatic classification code assignment unit that automatically assigns classification codes to the documents, wherein the keywords and/or sentences are used for assigning the classification codes.
A document analysis method that acquires digital information recorded in a plurality of computers or servers, analyzes document information composed of a plurality of documents included in the acquired digital information, and facilitates its use in a lawsuit or fraud investigation, the method comprising: an investigation category input reception step of accepting input of the category of the lawsuit or fraud investigation; and an investigation type determination step of determining the investigation category to be investigated based on the category accepted in the investigation category input reception step, and extracting the types of necessary information from an investigation basic database that stores information related to the lawsuit or fraud investigation.
A document analysis program that acquires digital information recorded in a plurality of computers or servers, analyzes document information composed of a plurality of documents included in the acquired digital information, and facilitates its use in a lawsuit or fraud investigation, the program causing a computer to realize: an investigation category input reception function of accepting input of the category of the lawsuit or fraud investigation; and an investigation type determination function of determining the investigation category to be investigated based on the category accepted by the investigation category input reception function, and extracting the types of necessary information from an investigation basic database that stores information related to the lawsuit or fraud investigation.
DESCRIPTION OF SYMBOLS
   1  Document analysis system
 201  First automatic classification unit
 301  Second automatic classification unit
 401  Third automatic classification unit
 501  Quality inspection unit
 601  Learning unit
 701  Report creation unit
 100  Data storage unit
 101  Digital information storage area
 103  Investigation basic database
 104  Keyword database
 105  Related term database
 106  Score calculation database
 107  Report creation database
 109  Database management unit
 116  Score calculation unit
 118  Document analysis unit
 120  Language determination unit
 122  Translation unit
 124  Trend information generation unit
 130  Document display unit
 131  Classification code reception/assignment unit
 133  Lawyer review reception unit
  11  Document display screen
  20  Investigation category input reception unit
  22  Investigation type determination unit
  24  Information extraction unit
  26  Analysis unit
  28  Calculation unit
  30  Search unit
  32  Automatic classification code assignment unit

Claims (7)

  1.  A document analysis system that acquires information recorded in a predetermined computer or server and analyzes document information, composed of a plurality of documents, included in the acquired information, the system comprising:
     an investigation basic database that stores, for each phase into which a predetermined act is classified according to its progress, a generation process model in which the predetermined act that causes a lawsuit or fraud investigation occurs, that further stores information related to the lawsuit or fraud investigation for each category to which the lawsuit or fraud investigation belongs and for each generation process model, and that further stores time-series information indicating the temporal order of the phases; and
     a calculation unit that analyzes the document information based on the information related to the lawsuit or fraud investigation, the generation process model, and the time-series information, and calculates, from the result of the analysis, an index indicating the possibility that the predetermined act will occur.
  2.  The document analysis system according to claim 1, further comprising:
     an investigation category input reception unit that accepts input of the category of the lawsuit or fraud investigation; and
     an investigation type determination unit that determines the category to be investigated based on the category accepted by the investigation category input reception unit, and extracts the types of necessary information from the investigation basic database.
  3.  The document analysis system according to claim 1 or 2, further comprising an information extraction unit that extracts keywords and/or sentences included in the document information from the document information as the information related to the lawsuit or fraud investigation.
  4.  The document analysis system according to claim 3, further comprising a search unit that searches the plurality of documents for the keywords and/or sentences.
  5.  The document analysis system according to claim 3 or 4, further comprising an automatic classification code assignment unit that automatically assigns a classification code to each of the plurality of documents,
     wherein the keywords and/or sentences are used for assigning the classification codes.
  6.  A document analysis method for acquiring information recorded in a predetermined computer or server and analyzing document information that is included in the acquired information and is composed of a plurality of documents, the method comprising:
    a calculation step of referring to an investigation basic database that stores, for each phase classified according to the progress of a predetermined act that causes a lawsuit or fraud investigation, a generation process model describing how the predetermined act arises, that further stores information related to the lawsuit or fraud investigation for each category to which the lawsuit or fraud investigation belongs and for each generation process model, and that further stores time-series information indicating the temporal order of the phases, thereby analyzing the document information based on the information related to the lawsuit or fraud investigation, the generation process model, and the time-series information, and calculating, from the result of the analysis, an index indicating the possibility that the predetermined act will occur.
  7.  A document analysis program for acquiring information recorded in a predetermined computer or server and analyzing document information that is included in the acquired information and is composed of a plurality of documents, the program causing a computer to realize:
    a calculation function of referring to an investigation basic database that stores, for each phase classified according to the progress of a predetermined act that causes a lawsuit or fraud investigation, a generation process model describing how the predetermined act arises, that further stores information related to the lawsuit or fraud investigation for each category to which the lawsuit or fraud investigation belongs and for each generation process model, and that further stores time-series information indicating the temporal order of the phases, thereby analyzing the document information based on the information related to the lawsuit or fraud investigation, the generation process model, and the time-series information, and calculating, from the result of the analysis, an index indicating the possibility that the predetermined act will occur.
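
The following is a minimal Python sketch of how the investigation basic database and the index calculation recited in the claims above might fit together. It is an illustration under stated assumptions, not the claimed implementation: the names `Phase`, `InvestigationDatabase`, `phase_score`, and `risk_index`, the keyword weights, and the scoring rule (keyword hits weighted by phase order, with a phase counted only once its predecessor has been observed) are all hypothetical and do not appear in the claims.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Phase:
    """One phase of a generation process model (all fields are illustrative)."""
    name: str
    order: int                  # time-series information: position in the temporal sequence
    keywords: dict[str, float]  # hypothetical keyword -> weight table for this phase


@dataclass
class InvestigationDatabase:
    """Hypothetical stand-in for the investigation basic database."""
    # category of lawsuit or fraud investigation -> phases of its generation process model
    models: dict[str, list[Phase]] = field(default_factory=dict)

    def phases_for(self, category: str) -> list[Phase]:
        # Return the phases of the given category in temporal order.
        return sorted(self.models[category], key=lambda p: p.order)


def phase_score(text: str, phase: Phase) -> float:
    """Sum the weights of the phase keywords that occur in the text."""
    lowered = text.lower()
    return sum(w for kw, w in phase.keywords.items() if kw in lowered)


def risk_index(documents: list[tuple[int, str]],
               db: InvestigationDatabase,
               category: str) -> float:
    """Compute an assumed index from (timestamp, text) documents.

    Documents are read in chronological order. A phase contributes only if
    every earlier phase has already been observed, so hits that contradict
    the stored temporal order are ignored. Later phases weigh more.
    """
    phases = db.phases_for(category)
    highest = -1   # position of the latest phase observed so far
    score = 0.0
    for _, text in sorted(documents, key=lambda d: d[0]):
        for i, phase in enumerate(phases):
            if i > highest + 1:
                break  # predecessor phase not yet observed
            s = phase_score(text, phase)
            if s > 0:
                score += s * (i + 1)
                highest = max(highest, i)
    return score
```

In this sketch a caller would populate `InvestigationDatabase.models` with one phase list per investigation category and pass timestamped document texts to `risk_index`. The order-sensitive rule is one simple way to use the time-series information; a weighting learned from reviewed documents would be an equally plausible choice.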
PCT/JP2014/052581 2014-02-04 2014-02-04 Document analysis system, document analysis method, and document analysis program WO2015118619A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2014/052581 WO2015118619A1 (en) 2014-02-04 2014-02-04 Document analysis system, document analysis method, and document analysis program
TW104103850A TW201539217A (en) 2014-02-04 2015-02-04 A document analysis system, document analysis method and document analysis program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2014/052581 WO2015118619A1 (en) 2014-02-04 2014-02-04 Document analysis system, document analysis method, and document analysis program

Publications (1)

Publication Number Publication Date
WO2015118619A1 (en) 2015-08-13

Family

ID=53777454

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/052581 WO2015118619A1 (en) 2014-02-04 2014-02-04 Document analysis system, document analysis method, and document analysis program

Country Status (2)

Country Link
TW (1) TW201539217A (en)
WO (1) WO2015118619A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598988A (en) * 2019-08-14 2019-12-20 中国平安财产保险股份有限公司 Statistical data processing method, device and storage medium
WO2022184034A1 (en) * 2021-03-01 2022-09-09 北京字跳网络技术有限公司 Document processing method and apparatus, device, and medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI742549B (en) * 2020-03-02 2021-10-11 如如研創股份有限公司 A report generating method and system under multi-dimension template technology

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011081491A (en) * 2009-10-05 2011-04-21 Nec Biglobe Ltd Time series analysis device, time series analysis method and program
JP2012038135A (en) * 2010-08-09 2012-02-23 Hitachi Solutions Ltd Device for determination of trend transition or method for the same
JP2013109642A (en) * 2011-11-22 2013-06-06 Nomura Research Institute Ltd Document management device
JP2013182338A (en) * 2012-02-29 2013-09-12 Ubic:Kk Document classification system and document classification method and document classification program
JP2013214152A (en) * 2012-03-30 2013-10-17 Ubic:Kk Document classification system, document classification method and document classification program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GIRGENTI, RICHARD H., MANAGING THE RISK OF FRAUD AND MISCONDUCT, 13 June 2012 (2012-06-13), pages 260-262, 305-308 *

Also Published As

Publication number Publication date
TW201539217A (en) 2015-10-16

Similar Documents

Publication Publication Date Title
JP5627820B1 (en) Document analysis system, document analysis method, and document analysis program
JP5530476B2 (en) Document sorting system, document sorting method, and document sorting program
JP5596213B1 (en) Document analysis system, document analysis method, and document analysis program
JP5627750B1 (en) Document analysis system, document analysis method, and document analysis program
JP5723067B1 (en) Data analysis system, data analysis method, and data analysis program
JP5683749B1 (en) Document analysis system, document analysis method, and document analysis program
JP5622969B1 (en) Document analysis system, document analysis method, and document analysis program
WO2015118619A1 (en) Document analysis system, document analysis method, and document analysis program
JP6124936B2 (en) Data analysis system, data analysis method, and data analysis program
JP5669904B1 (en) Document search system, document search method, and document search program for providing prior information
WO2015025978A1 (en) Text classification system, text classification method, and text classification program
JP5815911B1 (en) Document analysis system, document analysis system control method, and document analysis system control program
JP5851007B2 (en) Document analysis system, document analysis method, and document analysis program
JP5829768B2 (en) E-mail analysis system, e-mail analysis method, and e-mail analysis program
JP2015056185A (en) Document analyzing system, document analysis method, and document analysis program
JP5990562B2 (en) Document search system, document search method, and document search program for providing prior information
WO2015145524A1 (en) Document analysis system, document analysis method, and document analysis program
JP5850973B2 (en) Document sorting system, document sorting method, and document sorting program
JP5745676B1 (en) Document analysis system, document analysis method, and document analysis program

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 14881448; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 14881448; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: JP)