KR101772199B1

KR101772199B1 - System for knowledge verification based on crowdsourcing

Info

Publication number: KR101772199B1
Application number: KR1020150152389A
Authority: KR
Inventors: 윤대일
Original assignee: (주)유미테크
Priority date: 2015-10-30
Filing date: 2015-10-30
Publication date: 2017-09-01
Also published as: KR20170050617A

Abstract

The present invention relates to a crowdsourcing-based knowledge verification system that determines the reliability of knowledge data through crowdsourcing and builds a knowledge database accordingly.
Specifically, the system comprises: an evaluation set generation unit that extracts knowledge data from an uncertain knowledge database, which stores uncertain knowledge data collected through a knowledge collection unit, and from a correct answer knowledge database, which stores verified correct answer knowledge data, and generates an evaluation set combining the extracted knowledge data; an evaluation set transmission unit that transmits the evaluation set to an evaluation set providing server; a completion set collection unit that collects a completion set from the evaluation set providing server; a completion set determination unit that determines the reliability of the completion set; and a data determination unit that transmits the completion set to the correct answer knowledge database when the reliability of the completion set received from the completion set determination unit satisfies a set reference value.
Thus, the present invention provides a system in which participants evaluate the evaluation set through the evaluation set providing server, and the evaluated completion set is assessed so that verified data is stored in the correct answer knowledge database.

Description

{SYSTEM FOR KNOWLEDGE VERIFICATION BASED ON CROWDSOURCING}

The present invention relates to a crowdsourcing-based knowledge verification system that determines the reliability of knowledge data through crowdsourcing and builds a knowledge database accordingly.

Specifically, the system comprises: an evaluation set generation unit that extracts knowledge data from an uncertain knowledge database, which stores uncertain knowledge data collected through a knowledge collection unit, and from a correct answer knowledge database, which stores verified correct answer knowledge data, and generates an evaluation set combining the extracted knowledge data; an evaluation set transmission unit that transmits the evaluation set to an evaluation set providing server; a completion set collection unit that collects a completion set from the evaluation set providing server; a completion set determination unit that determines the reliability of the completion set; and a data determination unit that transmits the completion set to the correct answer knowledge database when the reliability of the completion set received from the completion set determination unit satisfies a set reference value.

Thus, the present invention provides a system in which participants evaluate the evaluation set through the evaluation set providing server, and the evaluated completion set is assessed so that verified data is stored in the correct answer knowledge database.

Generally, a knowledge service creates knowledge data through natural language processing of various documents and information on the Internet, and builds a knowledge database by storing that knowledge data. Users receive the knowledge services they need on the basis of this knowledge database.

Recently, knowledge in various fields has grown explosively, and the amount of knowledge data accumulated in knowledge databases keeps increasing. In particular, as users of knowledge services seek diverse knowledge and wish to share non-professional knowledge such as their own experience, wisdom, and know-how, users generate knowledge data themselves, and websites built on such user-generated knowledge data are attracting attention.

However, since it is difficult to judge the reliability of knowledge data generated in this way, users cannot confirm whether the knowledge is correct. To address this problem, Korean Patent Application Laid-Open No. 10-2010-0066642 discloses a system and method for providing knowledge information through questions and answers (hereinafter "prior art 1").

In prior art 1, questions and answers are automatically registered on a bulletin board, each question is automatically sent to an expert in the corresponding category by e-mail, SMS, IM, or the like, and the answer is automatically sent back to the questioner by e-mail, SMS, or the like. Prior art 1 thus proposes a question-and-answer knowledge information providing system and method that elicits professional and practical answers and, in particular, turns excellent questions and answers into a knowledge information database.

The above technique can judge the accuracy and reliability of knowledge (questions) through experts, but it is difficult for experts to judge a vast amount of knowledge. Moreover, even if an expert sends a wrong answer, the questioner may be satisfied with it, so there is a concern that the wrong answer is stored as data in the system.

Therefore, it is necessary to incorporate crowdsourcing so that a vast amount of knowledge can be judged.

Crowdsourcing is a compound of "crowd" and "outsourcing" and means engaging the public in part of a company's activities, such as production and services. Through crowdsourcing, many users can judge the reliability of knowledge data, and the reliability of that knowledge data can thereby be verified.

Meanwhile, a technique in which reliability is determined through user participation as described above and the resulting knowledge data is accumulated in a knowledge database is disclosed in Korean Registered Patent No. 10-0756382 (hereinafter "prior art 2").

Prior art 2 comprises: recording and maintaining user-created content in a database; receiving a modification request for the user-created content from a user or an operator; modifying the user-created content in response to the modification request and recording the modified user-created content in the database; and providing the content modified at a user's request to the user who entered that request, while providing the content modified at the operator's request to all users.

However, in prior art 2 the content is modified only on the basis of another person's check and judgment, after which the creator may modify it again, and the accuracy and reliability of the modified user-created content still cannot be determined. The problem therefore remains unsolved.

Therefore, it is necessary to develop a system that can grasp the accuracy and reliability of the knowledge generated by the user, and can verify the knowledge generated in the system without expert judgment.

Furthermore, it is necessary to develop a technology that can verify knowledge through crowdsourcing and store the verified knowledge in a knowledge database.

Patent Document 1: Korean Patent Application Laid-Open No. 10-2010-0066642 (June 18, 2010)
Patent Document 2: Korean Registered Patent No. 10-0756382 (September 10, 2007)

It is an object of the present invention to provide a crowdsourcing-based knowledge verification system that can determine and verify the reliability of unverified knowledge data.

Another object of the present invention is to provide a crowdsourcing-based knowledge verification system that uses crowdsourcing to judge a vast amount of knowledge data quickly and build it into a knowledge database.

According to an aspect of the present invention, there is provided a crowdsourcing-based knowledge verification system comprising:

an uncertain knowledge database that stores uncertain knowledge data collected through a knowledge collection unit; a correct answer knowledge database that stores verified correct answer knowledge data; an evaluation set generation unit that generates an evaluation set in which the uncertain knowledge data and the correct answer knowledge data are combined; an evaluation set transmission unit that transmits the evaluation set to an evaluation set providing server; a completion set collection unit that collects a completion set from the evaluation set providing server; a completion set determination unit that determines the reliability of the completion set; and a data determination unit that transmits the completion set to the correct answer knowledge database when the reliability of the completion set received from the completion set determination unit satisfies a set reference value.

The crowdsourcing-based knowledge verification system according to the present invention can determine and verify the reliability of unverified knowledge data.

Further, by using crowdsourcing, the present invention can judge knowledge data and build a knowledge database quickly.

Further, compared with relying on an expert group, the present invention can determine the reliability of knowledge data effectively and achieves high accuracy in the reliability calculation.

FIG. 1 shows the configuration of a crowdsourcing-based knowledge verification system according to the present invention.
FIG. 2 shows knowledge data according to an exemplary embodiment of the present invention.
FIG. 3 shows an evaluation set and a completion set according to an exemplary embodiment of the present invention.
FIG. 4 shows a first embodiment of the completion set determination unit of the present invention.
FIG. 5 shows a second embodiment of the completion set determination unit of the present invention.
FIG. 6 shows an example of difficulty determination by the difficulty determination module according to an exemplary embodiment of the present invention.

The terms and words used in the present specification and claims should not be construed as limited to their ordinary or dictionary meanings. On the principle that an inventor may properly define the concept of a term in order to describe the invention in the best possible way, they should be construed with meanings and concepts consistent with the technical idea of the present invention.

Therefore, the embodiments described in the present specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent all of its technical ideas. Various equivalents and variations that could replace them are possible.

Before describing the present invention with reference to the accompanying drawings, it should be noted that matters not needed to reveal the gist of the present invention, that is, known configurations that a person skilled in the art could readily add, are neither shown nor specifically described.

The crowdsourcing-based knowledge verification system is illustrated in FIGS. 1 to 6 of the accompanying drawings.

FIG. 1 illustrates the configuration of the crowdsourcing-based knowledge verification system of the present invention, FIG. 2 illustrates knowledge data according to an exemplary embodiment, FIG. 3 illustrates an evaluation set and a completion set according to an exemplary embodiment, FIG. 4 shows a first embodiment of the completion set determination unit, FIG. 5 shows a second embodiment of the completion set determination unit, and FIG. 6 shows an example of difficulty determination by the difficulty determination module according to an exemplary embodiment.

The knowledge database is divided into an uncertain knowledge database 10 and a correct answer knowledge database 20.

The uncertain knowledge database 10 stores uncertain knowledge data 111, that is, knowledge data that has been stored in the knowledge database but has not yet been verified (see FIG. 2(a)).

The uncertain knowledge database 10 collects knowledge data through the knowledge collection unit 11. The knowledge collection unit 11 is connected to a website or the like that uses a wiki system and collects unverified uncertain knowledge data 111.

A wiki is a website that allows users to easily add, edit, and delete content through a simple markup language using a web browser on a device with Internet access.

The correct answer knowledge database 20 is the counterpart of the uncertain knowledge database 10. Whereas the uncertain knowledge database 10 stores uncertain knowledge data 111 whose accuracy and reliability have not been verified, the correct answer knowledge database 20 stores correct answer knowledge data 112 whose accuracy and reliability have already been verified and confirmed.

As shown in FIG. 2(b) of the accompanying drawings, the correct answer knowledge data 112 is knowledge data with the highest reliability and accuracy, such as invariant laws or historically verified facts, as well as knowledge data that is not expected to be corrected or updated later.

The evaluation set providing server 30 provides the evaluation set 113, described later, to a plurality of participants, collects the completion set 131 evaluated by those participants, and transmits it to the system.

The crowdsourcing-based knowledge verification system 100 of the present invention includes an evaluation set generation unit 110, an evaluation set transmission unit 120, a completion set collection unit 130, a completion set determination unit 140, and a data determination unit 150.

The evaluation set generation unit 110 extracts N pieces of uncertain knowledge data 111 from the uncertain knowledge database 10 and M pieces of correct answer knowledge data 112 from the correct answer knowledge database 20, where N and M are determined by the administrator, and generates an evaluation set 113 in which the N pieces of uncertain knowledge data 111 and the M pieces of correct answer knowledge data 112 are combined. The generated evaluation set 113 is passed to the evaluation set transmission unit 120. The uncertain knowledge data 111 and correct answer knowledge data 112 to be extracted may be selected arbitrarily.
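The generation step above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `KnowledgeItem` type, the `generate_evaluation_set` function, and the in-memory lists standing in for the two databases are all hypothetical names, and shuffling the combined set (so that participants cannot tell which items are already verified) is an assumption that the text does not state.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeItem:
    item_id: str
    statement: str
    verified: bool  # True if drawn from the correct answer knowledge database


def generate_evaluation_set(uncertain_db, correct_db, n, m, rng=None):
    """Pick N uncertain and M correct answer items at random and combine them."""
    if rng is None:
        rng = random.Random()
    picked = rng.sample(uncertain_db, n) + rng.sample(correct_db, m)
    # Assumption: mix the order so verified items are not identifiable by position.
    rng.shuffle(picked)
    return picked


if __name__ == "__main__":
    uncertain = [KnowledgeItem(f"u{i}", f"claim {i}", False) for i in range(10)]
    correct = [KnowledgeItem(f"c{i}", f"fact {i}", True) for i in range(10)]
    for item in generate_evaluation_set(uncertain, correct, n=3, m=2):
        print(item.item_id, item.statement)
```

The resulting list always holds (M + N) items, which matches the size of the evaluation set described for FIG. 3(a).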

The evaluation set transmission unit 120 transmits the evaluation set 113 generated by the evaluation set generation unit 110 to the evaluation set providing server 30 so that the participants can evaluate it. As shown in FIG. 3(a) of the accompanying drawings, the evaluation set 113 contains (M + N) pieces of knowledge data together with a selection window 113a for judging each piece, and the participant uses the selection window 113a to judge whether each of the (M + N) pieces of knowledge data is true or false.

The selection window 113a in FIG. 3 is represented by two circles (○, ●), only one of which can be selected; the filled circle (●) indicates the selected option.

Once all (M + N) pieces of knowledge data have been evaluated, the evaluation set 113 evaluated by the participants on the evaluation set providing server 30 is stored as a completion set 131. The completion set 131 is then sent to and collected by the completion set collection unit 130.

As shown in FIG. 3(b), the completion set 131 shows, through the selection result display field 131a, whether the participant judged each of the (M + N) pieces of knowledge data to be true or false.

The completion set 131 collected by the completion set collection unit 130 is passed to the completion set determination unit 140, which determines its reliability.

The completion set determination unit 140 determines the reliability of the completion set 131. It includes a completion set separation module 141, a first reliability determination module 142, and a second reliability determination module 143 (see FIG. 4).

The completion set separation module 141 separates the (M + N) pieces of knowledge data included in the completion set 131 into first data and second data.

The first data is the knowledge data extracted from the correct answer knowledge database 20 and consists of the M pieces of correct answer knowledge data 112.

The second data is the knowledge data extracted from the uncertain knowledge database 10 and consists of the N pieces of uncertain knowledge data 111.

The completion set 131 is divided into the first data and the second data because the reliability of the first data can be determined by means of the correct answer knowledge data 112, whose reliability is already at its maximum.

The first reliability determination module 142 is used to determine the reliability of the first data, which uses the result evaluated as true or false by the participant.

Since the first data is the correct answer knowledge data 112, if the participant evaluates it as 'true', it processes it as a correct answer. If the participant evaluates it as 'false', it treats it as an incorrect answer.

Accordingly, the reliability of the first data can be evaluated through the CA and IA values of the first data, as follows.

$$M = CA + IA \tag{1}$$

Here, M is the number of first data, CA (Correct Answer) is the number of knowledge data processed as correct answers, and IA (Incorrect Answer) is the number of knowledge data processed as incorrect answers.

Therefore, the reliability of the first data is as follows.

$$S = \frac{CA}{M} \tag{2}$$

Here, S denotes the reliability of the first data.
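The separation and first-reliability steps can be sketched as follows, assuming Equations (1) and (2) read M = CA + IA and S = CA / M, which is the reading suggested by the definitions of M, CA, IA, and S. The tuple layout and the function names `split_completion_set` and `first_data_reliability` are hypothetical and chosen only for illustration.

```python
def split_completion_set(completion_set):
    """Separate a completion set into first data and second data.

    completion_set: list of (item_id, from_correct_db, judged_true) tuples,
    where from_correct_db marks items drawn from the correct answer database.
    """
    first = [row for row in completion_set if row[1]]
    second = [row for row in completion_set if not row[1]]
    return first, second


def first_data_reliability(first_data):
    """Return S = CA / M for one participant's first data (assumed Equation 2)."""
    m = len(first_data)
    if m == 0:
        raise ValueError("the completion set contains no first data")
    # A 'true' judgment on a correct answer item counts as a correct answer.
    ca = sum(1 for _, _, judged_true in first_data if judged_true)
    ia = m - ca  # assumed Equation (1): M = CA + IA
    return ca / (ca + ia)


if __name__ == "__main__":
    completion = [
        ("c1", True, True), ("c2", True, True), ("c3", True, False),
        ("c4", True, True), ("u1", False, True), ("u2", False, False),
    ]
    first, second = split_completion_set(completion)
    print(first_data_reliability(first))
```

In this example the participant marks three of the four correct answer items as true, so CA = 3, IA = 1, and the reliability is 3 / 4.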

The second reliability determination module 143 determines the reliability of the second data by applying to the second data the reliability value of the first data obtained through the first reliability determination module 142.

This is because the first data evaluated by a participant indicates how capable that participant is of verifying knowledge, so the reliability obtained from Equation (2) is presumed to hold equally for the second data.

Therefore, the reliability value of the second data, that is, of the uncertain knowledge data 111, can be determined through the second reliability determination module 143.

The same uncertain knowledge data 111 may be evaluated multiple times. In that case the reliabilities produced by the multiple participants are summed, and their average is set as the reliability of that uncertain knowledge data 111.
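The averaging over participants can be sketched as below. The function name `aggregate_second_data_reliability` and the pair-based input format are hypothetical. The sketch averages only the per-participant reliabilities, as the paragraph states; how an individual "true" or "false" judgment on the uncertain item should enter the result is not specified in the text and is not modeled here.

```python
from collections import defaultdict
from statistics import mean


def aggregate_second_data_reliability(evaluations):
    """Average the reliabilities assigned to each uncertain item.

    evaluations: iterable of (item_id, participant_reliability) pairs, one pair
    per participant who evaluated the item.
    """
    per_item = defaultdict(list)
    for item_id, reliability in evaluations:
        per_item[item_id].append(reliability)
    return {item_id: mean(values) for item_id, values in per_item.items()}


if __name__ == "__main__":
    evaluations = [("u1", 0.5), ("u1", 1.0), ("u2", 0.75)]
    print(aggregate_second_data_reliability(evaluations))
```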

However, even if the reliability of the first data calculated through Equation (2) is presumed to equal the reliability of the second data, there is a concern that the reliability may be misjudged because of differences between the first data and the second data.

For example, suppose the evaluation set generation unit 110 combines M pieces of correct answer knowledge data 112 of low difficulty with N pieces of uncertain knowledge data 111 that are not generally known and are of high difficulty. The easy correct answer knowledge data 112 will show a high correct-answer rate and hence high reliability, whereas the difficult uncertain knowledge data 111 is likely to show a high error rate, so its actual reliability may be low.

Accordingly, instead of determining the reliability of the second data through the first and second reliability determination modules 142 and 143 using Equation (2), the completion set determination unit 140 may include a difficulty determination module 144 that determines the difficulty of the first data and a third reliability determination module 145 that determines the reliability of the second data according to the determined difficulty (see FIG. 5).

The difficulty determination module 144 determines the difficulty of the first data, judging the difficulty to be lower the higher the CA value, that is, the more often the first data is answered correctly. The difficulty (W) is greater than or equal to 0 and less than or equal to 1, and the closer it is to 0, the easier the knowledge. The difficulty of the i-th knowledge data arbitrarily selected from the first data is denoted $W_i$.

The sum of the difficulties of the first data equals the sum of the difficulties of the first data processed as correct answers plus the sum of the difficulties of the first data processed as incorrect answers, that is, $\sum W_{M} = \sum W_{CA} + \sum W_{IA}$. Therefore, the reliability value for the second data is as follows.

$$S = \frac{\sum W_{CA}}{\sum W_{M}} \tag{3}$$

Here, S denotes the reliability value of the second data, $\sum W_{CA}$ represents the sum of the difficulties of the first data processed as correct answers, and $\sum W_{M}$ represents the sum of the difficulties of all the first data.

Therefore, an accurate reliability value can be calculated by evaluating the reliability in consideration of the degree of difficulty.
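A minimal sketch of the difficulty-weighted calculation follows. It assumes that Equation (3) is the ratio of the summed difficulty of the correctly answered first data to the summed difficulty of all first data, which is the reading suggested by the symbol definitions above. The function name `weighted_reliability` and the pair-based input are hypothetical.

```python
def weighted_reliability(first_data):
    """Difficulty-weighted reliability applied to the second data.

    first_data: list of (difficulty, answered_correctly) pairs, with each
    difficulty W_i in the range 0..1.
    """
    total = sum(w for w, _ in first_data)
    if total == 0:
        raise ValueError("the summed difficulty of the first data is zero")
    correct = sum(w for w, answered_correctly in first_data if answered_correctly)
    return correct / total


if __name__ == "__main__":
    # Two easy items answered correctly, one harder item answered incorrectly.
    answers = [(0.25, True), (0.25, True), (0.5, False)]
    print(weighted_reliability(answers))
```

With these inputs the unweighted reliability CA / M would be 2 / 3, while the weighted value is 0.5, because the single missed item carries as much difficulty as the two easy items combined.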

However, the case where the number of evaluations of the arbitrarily selected i-th knowledge is smaller than Z must be distinguished from the case where it is not. Z is "the number of selections serving as the criterion for ensuring the validity of the uncertain knowledge data" and can be set arbitrarily by the administrator. In other words, the fewer times a piece of knowledge data has been evaluated by participants, the less accurate its reliability.

Therefore, if the number of evaluations of the i-th knowledge (hereinafter C) is less than Z, the difficulty of the knowledge cannot be evaluated, and the intermediate value 0.5 is used as the initial difficulty value. The difficulty values are given by Equations (4) and (5).

Here, W represents the difficulty, A the number of times the specific knowledge was judged true, and B the number of times it was judged false.
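The threshold rule can be sketched as below. Only the first branch (W = 0.5 when C is less than Z) is stated in the text. The second branch is an assumption made for illustration: it takes C = A + B and W = B / (A + B), the fraction of "false" judgments on a correct answer item, which satisfies the stated properties that W lies between 0 and 1 and approaches 0 for easy knowledge. The actual Equations (4) and (5) may differ. The function name `difficulty` is hypothetical.

```python
def difficulty(a, b, z):
    """Difficulty W of one correct answer item.

    a: number of times the item was judged true
    b: number of times the item was judged false
    z: minimum number of evaluations required before W is computed
    """
    c = a + b  # Assumption: the evaluation count C is A + B.
    if c < z or c == 0:
        return 0.5  # Stated in the text: too few evaluations, use the midpoint.
    # Assumption, not taken from the text: share of 'false' judgments.
    return b / c


if __name__ == "__main__":
    print(difficulty(a=1, b=1, z=5))  # fewer than Z evaluations
    print(difficulty(a=9, b=1, z=5))  # mostly judged true, so close to 0
```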

The data determination unit 150 transmits the second data to the correct answer knowledge database 20 when the reliability of the second data received from the completion set determination unit 140 satisfies a predetermined reference value.

Through this process, the correct answer knowledge data 112 and the correct answer knowledge database 20 can be expanded.

The reference value includes a first reference value based on the reliability and a second reference value based on the number of participants in the evaluation, and serves as the basis for determining whether the second data can be stored in the correct answer knowledge database 20.

The first and second reference values may be set in advance by the administrator: the first reference value is a reference reliability value determined by the administrator, and the second reference value is a number of evaluation participants determined by the administrator.

The reliability value of second data moved to the correct answer knowledge database 20 lies in the range '(first reference value) < (reliability value of second data) <= 1', and the second data can be moved to the correct answer knowledge database 20 only when the number of participants who evaluated it is greater than the second reference value.

On the other hand, if the reference value is not satisfied, it is transmitted to the evaluation set generation unit 110 again and is generated as a new evaluation set 113. This allows the second data to be judged by many participants.
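The decision made by the data determination unit can be sketched as a single predicate. The function name `decide` and the string results `"store"` and `"re-evaluate"` are hypothetical labels; the two conditions follow the range and participant-count rules stated above.

```python
def decide(reliability, participants, first_reference, second_reference):
    """Decide what happens to one piece of second data.

    Returns "store" when (first reference) < reliability <= 1 and the number
    of participants exceeds the second reference; otherwise "re-evaluate",
    meaning the item goes back into a new evaluation set.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if first_reference < reliability <= 1.0 and participants > second_reference:
        return "store"
    return "re-evaluate"


if __name__ == "__main__":
    print(decide(0.92, participants=40, first_reference=0.9, second_reference=30))
    print(decide(0.92, participants=12, first_reference=0.9, second_reference=30))
```

Both comparisons are strict on the reference side, so an item whose reliability exactly equals the first reference value, or whose participant count exactly equals the second reference value, is sent back for further evaluation.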

According to the present invention, the reliability of unverified knowledge data can be determined and verified through the configuration and embodiments described above, and knowledge data can be judged and a knowledge database built quickly by using crowdsourcing.

FIGS. 1 to 6 describe only the main points of the present invention. Since various designs are possible within its technical scope, it is self-evident that the present invention is not limited to the configurations of FIGS. 1 to 6.

10: Uncertain knowledge database
11: Knowledge collection unit
20: Correct answer knowledge database
30: Evaluation set providing server
100: Crowdsourcing-based knowledge verification system
110: Evaluation set generation unit
111: Uncertain knowledge data
112: Correct answer knowledge data
113: Evaluation set
113a: Selection window
120: Evaluation set transmission unit
130: Completion set collection unit
131: Completion set
131a: Selection result display field
140: Completion set determination unit
141: Completion set separation module
142: First reliability determination module
143: Second reliability determination module
144: Difficulty determination module
145: Third reliability determination module
150: Data determination unit

Claims

An evaluation set generation unit that extracts knowledge data from an uncertain knowledge database, in which uncertain knowledge data collected through a knowledge collection unit is stored, and from a correct answer knowledge database, in which verified correct answer knowledge data is stored, and generates an evaluation set in which the knowledge data is combined;
An evaluation set transmission unit that transmits the evaluation set to an evaluation set providing server;
A completion set collection unit that collects a completion set from the evaluation set providing server;
A completion set determination unit that determines the reliability of the completion set; and
A data determination unit that transmits the completion set to the correct answer knowledge database when the reliability of the completion set received from the completion set determination unit satisfies a set reference value, and transmits the completion set to the evaluation set transmission unit when the reference value is not satisfied,
Wherein the evaluation set providing server
Provides the evaluation set to participants and sends the completion set evaluated by the participants to the completion set collection unit,
Wherein the completion set determination unit includes
A completion set separation module that separates the completion set into first data and second data,
Wherein the first data is extracted from the correct answer knowledge database,
Wherein the second data is extracted from the uncertain knowledge database,
Wherein the completion set determination unit further includes:
A first reliability determination module that determines the reliability of the first data;
A difficulty determination module that determines the difficulty of the first data; and
A third reliability determination module that determines the reliability of the second data using the difficulty of the first data,
Wherein the reference value,
As a criterion for judging whether the second data can be stored in the correct answer knowledge database (20), includes
A first reference value based on the reliability and a second reference value based on the number of participants who evaluated the evaluation set, and
Wherein the data determination unit transmits the second data of the completion set to the correct answer knowledge database (20) only when the reliability of the second data lies in the range '(first reference value) < (reliability of second data) <= 1' and the number of participants who performed the evaluation is larger than the second reference value.
(The mathematical formulas used in the above system are as follows.

$$M = CA + IA$$

(Where M is the number of first data, CA is the number of knowledge data processed as correct answers, and IA is the number of knowledge data processed as incorrect answers)

$$S = \frac{CA}{M}$$

(S is the reliability of the first data)

$$S = \frac{\sum W_{CA}}{\sum W_{M}}$$

(S is the reliability of the second data, $\sum W_{CA}$ is the sum of the difficulties of the first data processed as correct answers, and $\sum W_{M}$ is the sum of the difficulties of all the first data)

(Where W is the difficulty, A is the number of times the specific knowledge was judged true, B is the number of times the specific knowledge was judged false, and Z is the number of selections serving as the criterion for ensuring the validity of the uncertain knowledge data))

delete