KR101627550B1

KR101627550B1 - Machine learning-based intelligent public record disclosure management system

Info

Publication number: KR101627550B1
Application number: KR1020150086510A
Authority: KR
Inventors: 방재현
Original assignee: 주식회사 스토리안트
Priority date: 2015-06-18
Filing date: 2015-06-18
Publication date: 2016-06-07

Abstract

The present invention relates to an intelligent record disclosure management system based on machine learning. More specifically, the intelligent record disclosure management system based on machine learning uses at least one of basic learning data including a record disclosure list with completed disclosure distinction, detailed reference information of non-disclosure target information of each public institution, and electronic file content information obtained from a record generation management system to generate disclosure distinction pattern data. The intelligent record disclosure management system analyzes and compares the target data, which is to be disclosed and distinguished, with the disclosure distinction pattern data to automatically make a recommendation to disclose the data, partially disclose the data, or keep the data private.

Description

The present invention relates to a machine learning-based intelligent record disclosure management system. More particularly, it relates to a system that generates disclosure classification pattern data using at least one of basic learning data including a record disclosure list whose disclosure classification has been completed, the detailed criteria for non-disclosure information of each public institution, and electronic file content information, compares the target data to be classified with the disclosure classification pattern data, and automatically recommends one of disclosure, partial disclosure, and non-disclosure.

Tens of thousands to hundreds of millions of documents exist on the Internet, and their number is growing exponentially with the spread of blogs and mini homepages. These documents hold a great deal of information, and search and analysis systems are used in various ways to access it.

Most of the retrieval and analysis systems for accessing the information in the document increase the accessibility by dividing the documents into categories.

For example, a portal search system that provides Internet news classifies documents into categories such as politics, society, economy, and entertainment, thereby improving accessibility. Initially, this classification was done manually by people.

However, as the amount of information increases, there is a growing need for a document classifier capable of automatically classifying a large number of documents.

In particular, an administrative agency related to the present invention generates a large amount of administrative documents and stores the generated administrative documents in a document storage location.

In order to efficiently store a large amount of administrative documents, attribute information of each administrative document is stored and managed on the system.

At present, part of the metadata of an administrative document, for example its disclosure classification for public access, is set at the discretion of the person in charge.

When one person handles a large volume of administrative documents, errors can arise in setting the disclosure classification.

For example, if an administrative document that should be kept non-public under the agency's detailed non-disclosure criteria is set to a different classification, confusion and inefficiency in the administrative process may result.

Accordingly, there is a need for a method for effectively managing the disclosure of administrative documents.

More specifically, public record disclosure means that a public institution makes information or records available for viewing, copying, or distribution in accordance with the Records Management Act or the Information Disclosure Act.

Records produced and held by public institutions are disclosed in principle. Where disclosable and non-disclosable information are mixed in a record, the disclosable part should be disclosed with the non-disclosable information excluded.

In addition, public archives should periodically reclassify the non-public records they hold and disclose them once the reason for non-disclosure no longer applies.

In this way, records produced and held by public institutions are to be disclosed actively. However, each public institution must set disclosure, partial disclosure, or non-disclosure in consideration of the nature of its business, within the scope of Article 9 (1).

However, manually processing record information that is produced and managed in large volumes has its limits and requires a large budget and considerable resources.

Therefore, there is a need for a system that uses machine learning, with the existing list of completed disclosure classifications, the detailed criteria for non-disclosure information, and electronic file content information as learning data, to classify disclosure automatically through comparative analysis with the data to be classified.

Korean Patent Publication No. 10-2012-0059935 (2012.06.11)

A first object of the present invention, devised to solve the above problems, is to use machine learning with the existing list of completed disclosure classifications, the detailed criteria for non-disclosure information, and electronic file content information as learning data, and to automatically recommend a disclosure classification through comparative analysis with the data to be classified.

A second object of the present invention is, when the recommended result is partial disclosure or non-disclosure, to extract the matching legal provision information from the per-institution legal provision information database and to present the reason for the partial disclosure or non-disclosure.

To achieve the above objects, the present invention provides the following solution.

The system comprises a disclosure classification management means (100) that obtains basic learning data, including a record disclosure list whose disclosure classification has been completed, from the record disclosure list database (400); obtains the detailed criteria for non-disclosure information from the per-institution non-disclosure information detailed criteria database (500); generates disclosure classification pattern data using at least one of these and the electronic file content information obtained from the record production management system; includes a learning data database (130) that stores the disclosure classification pattern data; obtains the target data to be classified from the request terminal (300); compares it with the disclosure classification pattern data to automatically recommend one of disclosure, partial disclosure, and non-disclosure to the request terminal; and at the same time automatically adds the result to the learning data database;

a record disclosure list database (400) that stores basic learning data including a record disclosure list whose disclosure classification has been completed;

a per-institution non-disclosure information detailed criteria database (500) that stores the detailed criteria for non-disclosure information of each public institution; and

a request terminal (300) that provides the target data to be classified to the disclosure classification management means and obtains whichever of the disclosure, partial disclosure, and non-disclosure recommendations the disclosure classification management means provides.

With this configuration, the present invention solves the above problems.

The present invention has the following effects.

By using machine learning with the existing list of completed disclosure classifications, the detailed criteria for non-disclosure information, and electronic file content information as learning data, the system automatically recommends a disclosure classification through comparative analysis with the data to be classified. This can markedly improve the quality of disclosure services and greatly reduce the budget for disclosure reclassification projects that have been carried out manually through outsourcing.

In addition, when the recommendation result is partial disclosure or non-disclosure, the matching legal provision information is extracted from the per-institution legal provision information database, so that the reason for the partial disclosure or non-disclosure can be presented.

FIG. 1 is an overall configuration diagram of a machine learning-based intelligent document disclosure management system according to an embodiment of the present invention.
FIG. 2 is a block diagram of an open classification management means of a machine learning-based intelligent document disclosure management system according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating disclosure classification pattern data of the intelligent document disclosure management system based on a machine learning according to an embodiment of the present invention, and FIG. 4 is a conceptual diagram of a learning layer.

The machine learning-based intelligent document disclosure management system according to an embodiment of the present invention includes:

a disclosure classification management means (100) that generates disclosure classification pattern data using at least one of basic learning data including a record disclosure list whose disclosure classification has been completed, the detailed criteria for non-disclosure information of each public institution, and electronic file content information obtained from the record production management system, compares target data to be classified with the disclosure classification pattern data, and automatically recommends one of disclosure, partial disclosure, and non-disclosure.

The machine learning-based intelligent document disclosure management system according to another embodiment includes:

a disclosure classification management means (100) that obtains basic learning data, including a record disclosure list whose disclosure classification has been completed, from the record disclosure list database (400); obtains the detailed criteria for non-disclosure information of each public institution from the per-institution non-disclosure information detailed criteria database (500); generates disclosure classification pattern data using at least one of these and the electronic file content information obtained from the record production management system (200); includes a learning data database (130) that stores the disclosure classification pattern data; obtains the target data to be classified from the request terminal (300); compares it with the disclosure classification pattern data to automatically recommend one of disclosure, partial disclosure, and non-disclosure to the request terminal; and at the same time automatically adds the result to the learning data database;

a record disclosure list database (400) that stores basic learning data including a record disclosure list whose disclosure classification has been completed; and

a request terminal (300) that provides the target data to be classified to the disclosure classification management means and obtains whichever of the disclosure, partial disclosure, and non-disclosure recommendations the disclosure classification management means provides.

The system of the present invention further comprises weight setting means that extracts the disclosure classification pattern data stored in the learning data database, assigns weights to it, and provides the weights to the disclosure classification management means (100).

The disclosure classification management means (100) includes: an electronic file content information acquisition unit (110) that obtains electronic file content information in association with the record production management system (200); a disclosure classification pattern data generation unit (120) that obtains basic learning data, including a record disclosure list whose disclosure classification has been completed, from the record disclosure list database (400), obtains the detailed criteria for non-disclosure information from the per-institution non-disclosure information detailed criteria database (500), and generates disclosure classification pattern data using at least one of these and the electronic file content information obtained from the record production management system; a disclosure classification recommendation unit (140) that obtains target data to be classified from the request terminal (300), compares it with the disclosure classification pattern data stored in the learning data database, and automatically recommends one of disclosure, partial disclosure, and non-disclosure to the request terminal; and an automatic additional learning unit (150) that obtains the recommended information and automatically adds it to the learning data database.

Here, the disclosure classification pattern data includes at least one data field among unit task name, process name, record title, recommended disclosure classification, retention period, and reason for non-disclosure.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the drawings.

It should be understood, however, that the scope of the present invention is not limited to these embodiments, and all of the technical ideas that fall within the scope of the present invention are within the scope of the present invention.

FIG. 1 is an overall configuration diagram of a machine learning-based intelligent document disclosure management system according to an embodiment of the present invention.

As shown in FIG. 1, the system includes a disclosure classification management means (100), a record production management system (200), a request terminal (300), a record disclosure list database (400), and a per-institution non-disclosure information detailed criteria database (500).

Public record disclosure means that a public institution makes information or records available for viewing, provides copies or reproductions, or provides information through an information and communication network, in accordance with the Records Management Act and the Information Disclosure Act.

The disclosure classification management means (100) obtains basic learning data, including a record disclosure list whose disclosure classification has been completed, from the record disclosure list database (400), obtains the detailed criteria for non-disclosure information from the per-institution non-disclosure information detailed criteria database (500), and generates disclosure classification pattern data using at least one of these and the electronic file content information obtained from the record production management system (200).

In other words, the system generates disclosure classification pattern data using at least one of the basic learning data including the record disclosure list, the detailed criteria for non-disclosure information of each public institution, and the electronic file content information.

At this time, the generated classification pattern data is updated and stored in the learning data database 130.

Accordingly, when target data to be classified is obtained from the request terminal (300), the system compares it with the disclosure classification pattern data, automatically recommends one of disclosure, partial disclosure, and non-disclosure to the request terminal, and at the same time automatically adds the result to the learning data database.

The record disclosure list database (400) stores basic learning data including a record disclosure list whose disclosure classification has been completed. The per-institution non-disclosure information detailed criteria database (500) stores the detailed criteria for non-disclosure information of each public institution.

The request terminal (300) provides the target data to be classified to the disclosure classification management means and obtains whichever of the disclosure, partial disclosure, and non-disclosure recommendations the disclosure classification management means provides.

FIG. 2 is a block diagram of an open classification management means of a machine learning-based intelligent document disclosure management system according to an embodiment of the present invention.

As shown in FIG. 2, the disclosure classification management means (100) includes an electronic file content information acquisition unit (110), a disclosure classification pattern data generation unit (120), a learning data database (130), a disclosure classification recommendation unit (140), and an automatic additional learning unit (150).

The electronic file content information acquisition unit 110 is configured to acquire electronic file content information in association with the recorded material production management system 200.

That is, the record production systems currently used by public institutions are the electronic approval system and the business management system, and the present system is applied from the stage at which records are produced and managed.

The disclosure classification pattern data generation unit (120) obtains basic learning data, including a record disclosure list whose disclosure classification has been completed, from the record disclosure list database (400), obtains the detailed criteria for non-disclosure information of each institution from the per-institution non-disclosure information detailed criteria database (500), and generates disclosure classification pattern data using these together with the electronic file content information obtained from the record production management system (200).

The disclosure classification recommendation unit (140) obtains target data to be classified from the request terminal (300), compares it with the disclosure classification pattern data stored in the learning data database, and recommends one of disclosure, partial disclosure, and non-disclosure to the request terminal.

At this time, the automatic additional learning progress unit 150 acquires the recommended information and automatically updates the learning data database.
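The recommend-then-update loop described for units 140 and 150 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the in-memory list standing in for the learning data database, the whitespace tokenizer, the token-overlap comparison, and all function names are hypothetical, since the source does not specify how the comparison is carried out.

```python
# Hypothetical in-memory stand-in for the learning data database (130).
# Each entry pairs a record title with its completed disclosure classification.
learning_db = [
    ("annual budget execution report", "public"),
    ("employee personal information ledger", "non-public"),
    ("contract with bidder evaluation scores", "partial"),
]


def tokens(text):
    """Rough whitespace tokenizer; a real system would use morphological analysis."""
    return set(text.lower().split())


def recommend(title, db):
    """Return the label of the stored pattern sharing the most tokens with the title."""
    target = tokens(title)
    # Assumption: fall back to "public" when nothing is stored, because the
    # text says records are disclosed in principle.
    best_label, best_overlap = "public", -1
    for stored_title, label in db:
        overlap = len(target & tokens(stored_title))
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label


def recommend_and_learn(title, db):
    """Recommend a classification, then add the result to the learning data."""
    label = recommend(title, db)
    db.append((title, label))  # automatic additional learning (unit 150)
    return label
```

Calling `recommend_and_learn("personal information of applicants", learning_db)` returns a label and grows the stored data by one entry, which mirrors the simultaneous recommendation and update the text describes.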

That is, disclosure classification patterns are generated automatically from the existing record disclosure list, the detailed criteria for non-disclosure information, and the contents of electronic files (for example, draft documents and attachments), and the generated patterns are learned and applied.

The disclosure classification management means (100) of the present invention also includes a per-institution legal provision information database that stores legal provision information related to information disclosure by each public institution; a legal provision extraction unit that, when the recommendation result is partial disclosure or non-disclosure, extracts the matching legal provision information from the per-institution legal provision information database; and a reason display processing unit that obtains the extracted legal provision information and generates reason display information for the partial disclosure or non-disclosure.

In other words, the per-institution legal provision information database stores legal provision information related to information disclosure by each public institution, and when the recommendation result is partial disclosure or non-disclosure, the legal provision extraction unit extracts the matching legal provision information from that database.

At this time, as shown in FIG. 4, the reason display processing unit acquires the extracted legal provision information and generates the reason display information about the partial disclosure or the non-disclosure.
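A rough sketch of the legal provision extraction and reason display steps is given below. Everything specific in it is an assumption made for illustration: the dictionary layout, the keyword-overlap matching rule, the institution key, and the provision IDs and texts are invented placeholders rather than real statutory language or the patent's actual matching logic.

```python
# Hypothetical stand-in for the per-institution legal provision database.
# The IDs, keywords, and texts are invented placeholders, not real statutes.
PROVISIONS = {
    "institution-a": [
        {"id": "EX-1", "keywords": {"personal", "resident"},
         "text": "Example provision on personal information"},
        {"id": "EX-2", "keywords": {"bid", "contract"},
         "text": "Example provision on ongoing contract matters"},
    ],
}


def extract_provision(institution, title, recommendation):
    """Return the best-matching provision, or None when no reason is needed."""
    if recommendation == "public":
        return None  # reasons are produced only for partial or non-disclosure
    words = set(title.lower().split())
    best, best_hits = None, 0
    for provision in PROVISIONS.get(institution, []):
        hits = len(words & provision["keywords"])
        if hits > best_hits:
            best, best_hits = provision, hits
    return best


def reason_display(institution, title, recommendation):
    """Build the reason text shown alongside a recommendation."""
    provision = extract_provision(institution, title, recommendation)
    if provision is None:
        return ""
    return f"{recommendation}: {provision['id']} ({provision['text']})"
```

Keeping extraction and display as separate functions follows the split between the legal provision extraction unit and the reason display processing unit in the text.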

In the case of FIG. 4, a public classification data field is included among the registration list items, and is displayed in at least one of public, partial, and non-public.

In addition, the open classification management means 100 may further comprise a learning period setting unit for setting the information gathering period of the open classification pattern data generation unit 120. That is, the learning period is automatically set by setting the Gathering period.

As shown in FIG. 3, the disclosure classification pattern data generation unit (120) sets the record disclosure list whose disclosure classification has been completed as the primary learning layer and performs primary learning, sets the detailed criteria for non-disclosure information of each public institution as the secondary learning layer and performs secondary learning, and sets the electronic file content information as the tertiary learning layer and performs tertiary learning, generating the disclosure classification pattern data through these sequential steps.
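The three learning layers can be pictured with the sketch below. It assumes, purely for illustration, that "learning" means accumulating how often each token appears under each disclosure label; the source does not name a learning algorithm, and the function names and the (text, label) input format are hypothetical.

```python
from collections import Counter, defaultdict


def learn_layer(patterns, examples):
    """Add one layer's (text, label) examples to the token-to-label counts."""
    for text, label in examples:
        for token in text.lower().split():
            patterns[token][label] += 1
    return patterns


def build_patterns(disclosure_list, criteria, file_contents):
    """Apply the layers in the described order: list, then criteria, then files."""
    patterns = defaultdict(Counter)
    for layer in (disclosure_list, criteria, file_contents):
        learn_layer(patterns, layer)
    return patterns
```

In this simplified counting model the final counts do not depend on the order of the layers; the loop only mirrors the primary, secondary, and tertiary sequence described in the text. A learner whose result does depend on order would slot into `learn_layer` in the same way.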

In particular, as shown in FIG. 4, the disclosure classification pattern data includes at least one data field among unit task name, process name, record title, recommended disclosure classification, retention period, and reason for non-disclosure.

The number of data field items for disclosure classification is preferably 38, and the learning data to be compared includes the existing list of completed disclosure classifications, the detailed criteria for non-disclosure information, and electronic record content information.
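The named data fields can be pictured as a simple record type. The sketch below is an assumption for illustration only: the class and attribute names are hypothetical, the field types are guesses, and only the six fields the text names are modeled, although the text states that the full set preferably has 38 items.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Disclosure(Enum):
    """The three possible recommendations named in the text."""
    PUBLIC = "public"
    PARTIAL = "partial"
    NON_PUBLIC = "non-public"


@dataclass(frozen=True)
class PatternRecord:
    """One row of disclosure classification pattern data (hypothetical layout)."""
    unit_task_name: str
    process_name: str
    record_title: str
    recommended_disclosure: Disclosure
    retention_period: Optional[str] = None       # e.g. "5 years"; format assumed
    non_disclosure_reason: Optional[str] = None  # empty for public records


example = PatternRecord(
    unit_task_name="Personnel management",
    process_name="Recruitment",
    record_title="Interview score sheet",
    recommended_disclosure=Disclosure.NON_PUBLIC,
    retention_period="5 years",
    non_disclosure_reason="personal information",
)
```

The last two fields are optional here because the text only requires that at least one of the listed fields be present.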

The learning data described in the system of the present invention is generally subjected to natural language processing. Here, natural language processing means mechanically analyzing the linguistic phenomena of human utterances and converting them into a form that a computer can understand.

For example, natural language processing can be performed through morphological analysis, part-of-speech tagging, phrase-level analysis, and syntactic analysis.

Further, the system of the present invention operates on the basis of machine learning, which here means that machine learning is performed on learning data that has undergone natural language processing.

That is, the natural language processed learning data is generalized, or trained on.

For example, a series of learning data may be machine-learned so that the disclosure classification and the reason for non-disclosure are learned from the disclosure classification and title information of administrative documents.

To this end, the system of the present invention can constitute a machine learning unit, and its operation principle is a general technique, and thus a detailed description thereof will be omitted.

In addition, the system of the present invention generates and stores pattern data (sample data) as the result of learning by the machine learning unit. The sample data may include the result of natural language processing of the learning data.

Then, when data to be classified is obtained, it is subjected to natural language processing and then compared and analyzed.

The system compares the matching rate between the natural language processing result of the new data and the sample data. For example, the natural language processing result of information in the new data, such as its title information, can be compared with the sample data to obtain a matching rate.

The more similar the patterns of the new data and the sample data are, the higher the matching rate is judged to be.
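One way to make the matching rate concrete is a set-overlap score. The sketch below uses Jaccard similarity over tokens, which is an assumption chosen for illustration; the source does not name a similarity measure, and the function names and the (label, tokens) sample format are hypothetical.

```python
def matching_rate(new_tokens, sample_tokens):
    """Jaccard similarity between two token collections, in the range 0.0 to 1.0."""
    a, b = set(new_tokens), set(sample_tokens)
    if not a and not b:
        return 0.0  # nothing to compare, so treat as no match
    return len(a & b) / len(a | b)


def best_match(new_tokens, samples):
    """Return (label, rate) for the sample with the highest matching rate."""
    return max(
        ((label, matching_rate(new_tokens, toks)) for label, toks in samples),
        key=lambda pair: pair[1],
    )
```

With this measure, identical token sets score 1.0 and disjoint sets score 0.0, so the score rises as the patterns of the new data and the sample data become more similar.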

The system of the present invention may further include weight setting means that extracts the disclosure classification pattern data stored in the learning data database, assigns weights to it, and provides the weights to the disclosure classification management means (100).

The weight can be calculated with a general weighting function or obtained from the degree of proximity to a specific word. The objects to be weighted may be the existing record disclosure list, the per-institution detailed criteria for non-disclosure information, and the electronic file contents, which together make up the learning data.

The purpose of weighting a term is to express its relative value as an index term according to its importance as a subject element of the concepts the document deals with.

That is, by assigning a value (weight) within a certain range to each term that represents a concept, the system can indicate that the same index term differs in importance from document to document.
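As one example of a general weighting function, the sketch below computes plain TF-IDF weights. TF-IDF is named here as an assumption; the source says only that a general weighting function or word proximity may be used. The function name and the whitespace tokenization are likewise hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        weights.append({
            term: (count / len(toks)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["personal personal budget", "personal report", "budget report summary"]
weights = tf_idf(docs)
```

Here the term `personal` receives a larger weight in the first document than in the second, which illustrates the point that the same index term can differ in importance from document to document.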

As a result, through the configuration and operation described above, the system uses machine learning with the existing list of completed disclosure classifications, the detailed criteria for non-disclosure information, and electronic file content information as learning data, and automatically recommends a disclosure classification through comparative analysis with the data to be classified. This improves the quality of disclosure services and can significantly reduce the budget for disclosure reclassification projects that have been carried out manually through outsourcing.

Meanwhile, the method according to various embodiments of the present invention may be stored in a computer-readable recording medium. The computer-readable recording medium may be a ROM, a RAM, CDROMs, magnetic tapes, floppy disks, optical data storage devices, and the like, as well as carrier waves (e.g., transmission over the Internet).

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed embodiments, and that those skilled in the art may make various modifications without departing from the spirit and scope of the present invention.

100: Disclosure classification management means
200: Record production management system
300: Request terminal
400: Record disclosure list database
500: Per-institution non-disclosure information detailed criteria database

Claims

A machine learning-based intelligent document disclosure management system, comprising:
a disclosure classification management means (100) that obtains basic learning data, including a record disclosure list whose disclosure classification has been completed, from the record disclosure list database (400); obtains the detailed criteria for non-disclosure information of each public institution from the per-institution non-disclosure information detailed criteria database (500); generates disclosure classification pattern data using at least one of these and the electronic file content information obtained from the record production management system (200); includes a learning data database (130) that stores the disclosure classification pattern data; obtains the target data to be classified from the request terminal (300); compares it with the disclosure classification pattern data to automatically recommend one of disclosure, partial disclosure, and non-disclosure to the request terminal; and at the same time automatically adds the result to the learning data database;
a record disclosure list database (400) that stores basic learning data including a record disclosure list whose disclosure classification has been completed;
a per-institution non-disclosure information detailed criteria database (500) that stores the detailed criteria for non-disclosure information of each public institution; and
a request terminal (300) that provides the target data to be classified to the disclosure classification management means and obtains whichever of the disclosure, partial disclosure, and non-disclosure recommendations the disclosure classification management means provides.

delete

The system according to claim 1,
further comprising weight setting means that extracts the disclosure classification pattern data stored in the learning data database, assigns weights to it, and provides the weights to the disclosure classification management means (100).

The system according to claim 1,
wherein the disclosure classification management means (100) comprises:
an electronic file content information acquisition unit (110) that obtains electronic file content information in association with the record production management system (200);
a disclosure classification pattern data generation unit (120) that obtains basic learning data, including a record disclosure list whose disclosure classification has been completed, from the record disclosure list database (400), obtains the detailed criteria for non-disclosure information of each public institution from the per-institution non-disclosure information detailed criteria database (500), and generates disclosure classification pattern data using at least one of these and the electronic file content information obtained from the record production management system;
a disclosure classification recommendation unit (140) that obtains the target data to be classified from the request terminal (300), compares it with the disclosure classification pattern data stored in the learning data database, and automatically recommends one of disclosure, partial disclosure, and non-disclosure to the request terminal; and
an automatic additional learning unit (150) that obtains the recommended information and automatically adds it to the learning data database.

The system according to claim 1,
wherein the disclosure classification pattern data includes at least one data field among unit task name, process name, record title, recommended disclosure classification, retention period, and reason for non-disclosure.