CN102681992A - Method and system for data hierarchy - Google Patents

Method and system for data hierarchy

Info

Publication number
CN102681992A
Authority
CN
China
Prior art keywords
data
question
intellectual
answer
answer data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011100537198A
Other languages
Chinese (zh)
Inventor
薛晔伟
杨月奎
高晓娜
李晓艳
焦峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shiji Guangsu Information Technology Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN2011100537198A
Publication of CN102681992A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention is applicable to the field of data hierarchy and provides a method and a system for data hierarchy. The method includes the following steps: acquiring feature information of question answering data; transmitting the acquired feature information of the question answering data to a preset classifier; judging whether the question answering data belong to knowledge data or non-knowledge data according to a pre-trained data model in the classifier; and finally outputting the judgment result for the question answering data. Because the question answering data are classified into knowledge data and non-knowledge data, knowledge answering data and non-knowledge answering data can be clearly indicated in actual searching, which effectively helps users judge the confidence level of search results.

Description

A data hierarchy method and system
Technical field
The invention belongs to the field of data hierarchy, and in particular relates to a data hierarchy method and system.
Background
A question and answer community is an Internet product, similar to Soso Wenwen and Baidu Zhidao, in which users participate by asking and answering questions, and which organizes users and data according to these question and answer relationships.
At present, methods for judging the overall quality of question and answer data, that is, methods of layering, are all based on simple rules such as the length of the answer text, the user's reputation, and the proportion of non-Chinese characters. Such methods neither comprehensively measure the credibility of the answer data for a given question (that is, whether the answer data is knowledge data or non-knowledge data) nor give a clear definition of "high-quality" answer data. As a result, in practical search applications the accuracy of the search results is very poor: the results do not clearly indicate whether an answer is knowledge answer data or non-knowledge answer data, and users can generally only filter the results themselves, relying on common sense, to find the answers they consider to be knowledge answer data.
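For illustration only, the kind of simple rule-based judgment described above can be sketched as follows. The three signals (answer length, user reputation, proportion of non-Chinese characters) come from the text; the function names and all threshold values are hypothetical, and "non-Chinese" is assumed here to mean any character outside the basic CJK Unified Ideographs block.

```python
def non_chinese_ratio(text: str) -> float:
    """Share of characters outside the basic CJK Unified Ideographs block."""
    if not text:
        return 0.0
    non_cjk = sum(1 for ch in text if not ("\u4e00" <= ch <= "\u9fff"))
    return non_cjk / len(text)


def rule_based_is_high_quality(
    answer_text: str,
    user_reputation: float,
    min_length: int = 20,          # hypothetical threshold
    min_reputation: float = 50.0,  # hypothetical threshold
    max_non_chinese: float = 0.5,  # hypothetical threshold
) -> bool:
    """Apply three fixed rules; no trained model is involved."""
    return (
        len(answer_text) >= min_length
        and user_reputation >= min_reputation
        and non_chinese_ratio(answer_text) <= max_non_chinese
    )
```

Such a rule set says nothing about whether the answer is knowledge data, which is the gap the invention addresses.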
Summary of the invention
The present invention provides a data hierarchy method and system, intended to solve the following problem in the prior art: the credibility of the answer data for a given question is not measured comprehensively, so the accuracy of search results is poor, neither knowledge answer data nor non-knowledge answer data is clearly indicated, and users must judge the accuracy of search results themselves.
The present invention is implemented as a data hierarchy method comprising the following steps:
acquiring feature information of question and answer data;
transmitting the acquired feature information of the question and answer data to a preset classifier;
judging, according to a data model trained in advance in the classifier, whether the question and answer data is knowledge data or non-knowledge data;
outputting the judgment result for the question and answer data.
Another object of the present invention is to provide a data hierarchy system comprising:
a feature information acquisition module, used to acquire feature information of question and answer data;
a transmission module, used to transmit the acquired feature information of the question and answer data to a preset classifier;
a judging module, used to judge, according to a data model trained in advance in the classifier, whether the question and answer data is knowledge data or non-knowledge data;
an output module, used to output the judgment result for the question and answer data.
In the present invention, the question and answer data is layered into knowledge data and non-knowledge data, so that in practical search applications knowledge answer data and non-knowledge answer data can be clearly indicated, which effectively helps users judge the credibility of search results.
Description of drawings
Fig. 1 is a flow diagram of the data hierarchy method provided by the first embodiment of the invention.
Fig. 2 is a flow diagram of the data hierarchy method provided by the second embodiment of the invention.
Fig. 3 is a structural diagram of the data hierarchy system provided by an embodiment of the invention.
Detailed description
To make the object, technical solution and beneficial effects of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
In the embodiments of the invention, the question and answer data is layered into knowledge data and non-knowledge data. This solves the problem in the prior art that the credibility of the answer data for a given question is not measured comprehensively, so that in practical search applications the accuracy of search results is very poor, neither knowledge answer data nor non-knowledge answer data is clearly indicated, and users must judge the accuracy of search results themselves.
Referring to Fig. 1, the flow of the data hierarchy method provided by the first embodiment of the invention comprises the following steps:
Step S101: acquire feature information of question and answer data;
Step S102: transmit the acquired feature information of the question and answer data to a preset classifier;
Step S103: according to the data model trained in advance in the classifier, judge whether the question and answer data is knowledge data or non-knowledge data;
Step S104: output the judgment result for the question and answer data.
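By way of a non-limiting sketch, steps S101 to S104 can be pictured as a small pipeline. The feature names, the `toy_classifier` stand-in and its thresholds are hypothetical; the patent does not specify the classifier's algorithm or any particular feature set at this point.

```python
from typing import Callable, Dict

Features = Dict[str, float]


def extract_features(qa: dict) -> Features:
    """S101: acquire feature information (hypothetical features)."""
    return {
        "answer_length": float(len(qa.get("answer", ""))),
        "upvotes": float(qa.get("upvotes", 0)),
    }


def toy_classifier(features: Features) -> bool:
    """Stand-in for the pre-trained data model; thresholds are invented."""
    return features["answer_length"] >= 30 and features["upvotes"] >= 1


def classify_qa(qa: dict, classifier: Callable[[Features], bool]) -> str:
    features = extract_features(qa)      # S101: acquire features
    is_knowledge = classifier(features)  # S102 + S103: transmit and judge
    return "knowledge" if is_knowledge else "non-knowledge"  # S104: output
```

For example, `classify_qa({"answer": "x" * 40, "upvotes": 3}, toy_classifier)` yields `"knowledge"` under these assumed thresholds.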
Referring to Fig. 2, the flow of the data hierarchy method provided by the second embodiment of the invention comprises the following steps:
Step S201: define the features of question and answer data;
Step S202: establish a data model that associates the features with knowledge data and non-knowledge data, and generate the classifier;
Step S203: acquire feature information of question and answer data;
Step S204: transmit the acquired feature information of the question and answer data to the preset classifier;
Step S205: according to the data model trained in advance in the classifier, judge whether the question and answer data is knowledge data or non-knowledge data;
Step S206: output the judgment result for the question and answer data.
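The training step S202 and the judging step S205 can be illustrated with a deliberately simple model. The sketch below uses a nearest-centroid classifier, which is an assumption made purely for illustration: the patent does not name the learning algorithm, and the two-dimensional feature vectors are invented example data.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def train_centroids(samples: Sequence[Tuple[Vector, bool]]) -> Dict[bool, Vector]:
    """S202: associate features with each class by averaging labeled vectors."""
    groups: Dict[bool, List[Vector]] = {True: [], False: []}
    for vector, is_knowledge in samples:
        groups[is_knowledge].append(vector)
    model: Dict[bool, Vector] = {}
    for label, vectors in groups.items():
        if not vectors:
            raise ValueError("need at least one example of each class")
        model[label] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return model


def predict(model: Dict[bool, Vector], vector: Vector) -> bool:
    """S205: True (knowledge) if the vector is at least as close to that centroid."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return sq_dist(vector, model[True]) <= sq_dist(vector, model[False])


# Invented training data: (feature vector, is_knowledge)
training = [
    ([0.9, 0.8], True), ([0.8, 0.9], True),
    ([0.1, 0.2], False), ([0.2, 0.1], False),
]
model = train_centroids(training)
```

Any supervised classifier trained on labeled question and answer data could take the place of this model; the split between a one-time training stage and repeated judging is the point being shown.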
To give users a clearer view of the credibility of the question and answer data in search results, in one embodiment of the invention the method further comprises the following step:
According to the judgment result, label the question and answer data to identify whether it is knowledge data or non-knowledge data.
To let users find highly credible question and answer data more quickly and easily, in another preferred embodiment of the invention the method further comprises the following step:
When question and answer data is searched, the search engine, according to the labels of the question and answer data, preferentially places question and answer data labeled as knowledge data at the front of the search results.
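The labeling and ranking steps can be sketched as follows. The field names (`label`, `upvotes`, `id`) and the judging function passed in are hypothetical; the sketch assumes that results labeled as knowledge data are moved to the front while the original relevance order is kept within each group.

```python
from typing import Callable, Dict, List


def label_results(results: List[Dict], is_knowledge: Callable[[Dict], bool]) -> List[Dict]:
    """Return copies of the results with a 'label' field attached."""
    return [
        {**item, "label": "knowledge" if is_knowledge(item) else "non-knowledge"}
        for item in results
    ]


def rank_results(labeled: List[Dict]) -> List[Dict]:
    """Stable sort: knowledge items first, original order kept within each group."""
    return sorted(labeled, key=lambda item: item["label"] != "knowledge")
```

Because Python's `sorted` is stable, the search engine's own ordering is preserved among items that share a label.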
The knowledge labeling rules for question and answer data, and the features defined for question and answer data, are described in detail below.
As for the standard by which question and answer data is layered, the embodiment of the invention gives a definition: whether the question and answer data is "knowledge data". The "knowledge data" standard can fully characterize the quality (that is, the degree of credibility) of question and answer data. "Knowledge data" refers to information that is useful and does not go out of date. On the large volume of data accumulated by a knowledge community, whether a piece of question and answer data is "knowledge data" is judged according to the following standard, which is not limited to the question and answer data listed:
Knowledge labeling rules for question and answer data
[Table image BDA0000049099680000051: knowledge labeling rules for question and answer data]
In addition, the embodiment of the invention designs a complete set of features that fully describe a piece of question and answer data. These features cover nearly all of the information related to the question and answer data, including the question and answer content, the participants, their actions, and feedback information. The specific features are as follows:
[Table images BDA0000049099680000052, BDA0000049099680000061, BDA0000049099680000081: features defined for question and answer data]
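The concrete feature tables are not reproduced in this text, so the following is only a hypothetical illustration of how features from the four aspects named above (content, participants, actions, feedback) might be grouped into one record and flattened into a vector for a classifier. Every field name is invented for this sketch.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class QAFeatures:
    # Question and answer content (hypothetical fields)
    question_length: int
    answer_length: int
    # Participants (hypothetical field)
    answerer_reputation: float
    # Action behavior (hypothetical field)
    answer_delay_minutes: float
    # Feedback information (hypothetical fields)
    upvotes: int
    accepted: bool

    def to_vector(self) -> List[float]:
        """Flatten the record into a numeric vector, in field order."""
        return [float(value) for value in asdict(self).values()]
```

For example, `QAFeatures(12, 80, 4.5, 30.0, 7, True).to_vector()` gives a six-element vector with the boolean encoded as 1.0.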
Referring to Fig. 3, which shows the data hierarchy system provided by an embodiment of the invention; for ease of explanation, only the parts relevant to the embodiment are shown.
The data hierarchy system comprises a feature information acquisition module 102, a transmission module 104, a judging module 106 and an output module 108.
The feature information acquisition module 102 is used to acquire feature information of question and answer data.
The transmission module 104 is used to transmit the acquired feature information of the question and answer data to a preset classifier.
The judging module 106 is used to judge, according to the data model trained in advance in the classifier, whether the question and answer data is knowledge data or non-knowledge data.
The output module 108 is used to output the judgment result for the question and answer data.
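As a rough sketch of how modules 102 to 108 might cooperate, the class below wires four small methods together. The class name, method names and the callables passed in are all hypothetical; only the division into acquisition, transmission, judging and output follows the text.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]


class DataLayeringSystem:
    def __init__(self, acquire: Callable[[dict], Features],
                 model: Callable[[Features], bool]) -> None:
        self._acquire = acquire  # feature information acquisition module 102
        self._model = model      # pre-trained data model inside the classifier
        self.outputs: List[str] = []

    def _transmit(self, features: Features) -> Features:
        """Transmission module 104: hand a copy of the features to the classifier."""
        return dict(features)

    def _judge(self, features: Features) -> str:
        """Judging module 106: knowledge or non-knowledge."""
        return "knowledge" if self._model(features) else "non-knowledge"

    def _output(self, result: str) -> str:
        """Output module 108: record and return the judgment result."""
        self.outputs.append(result)
        return result

    def process(self, qa: dict) -> str:
        return self._output(self._judge(self._transmit(self._acquire(qa))))
```

Keeping the model as an injected callable mirrors the text's separation between the system modules and the classifier they consult.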
In one embodiment of the invention, the system further comprises an identification module.
The identification module is used to label the question and answer data according to the judgment result, identifying whether the question and answer data is knowledge data or non-knowledge data.
In another preferred embodiment of the invention, the system further comprises a search engine.
The search engine is used, when question and answer data is searched, to preferentially place question and answer data labeled as knowledge data at the front of the search results according to the labels of the question and answer data.
In an embodiment of the invention, the system further comprises a definition module, an association establishing module and a generation module.
The definition module is used to define the features of question and answer data.
The generation module is used to establish a data model that associates the features with knowledge data and non-knowledge data, and to generate the classifier.
In summary, the embodiments of the invention layer the question and answer data into knowledge data and non-knowledge data, so that in practical search applications knowledge answer data and non-knowledge answer data can be clearly indicated, which effectively helps users judge the credibility of search results. The embodiments of the invention can also preferentially place knowledge question and answer data at the very front of the search results, so that users can find highly credible question and answer data more quickly and easily.
Those of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.
The above are only preferred embodiments of the invention and do not limit the invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (8)

1. A data hierarchy method, characterized in that the method comprises the following steps:
acquiring feature information of question and answer data;
transmitting the acquired feature information of the question and answer data to a preset classifier;
judging, according to a data model trained in advance in the classifier, whether the question and answer data is knowledge data or non-knowledge data;
outputting the judgment result for the question and answer data.
2. The method of claim 1, characterized in that, before the step of acquiring the feature information of the question and answer data, the method further comprises the following steps:
defining the features of question and answer data;
establishing a data model that associates the features with knowledge data and non-knowledge data, and generating the classifier.
3. The method of claim 1, characterized in that the method further comprises the following step:
labeling the question and answer data according to the judgment result, to identify whether the question and answer data is knowledge data or non-knowledge data.
4. The method of claim 3, characterized in that the method further comprises the following step:
when question and answer data is searched, the search engine, according to the labels of the question and answer data, preferentially places question and answer data labeled as knowledge data at the front of the search results.
5. A data hierarchy system, characterized in that the system comprises:
a feature information acquisition module, used to acquire feature information of question and answer data;
a transmission module, used to transmit the acquired feature information of the question and answer data to a preset classifier;
a judging module, used to judge, according to a data model trained in advance in the classifier, whether the question and answer data is knowledge data or non-knowledge data;
an output module, used to output the judgment result for the question and answer data.
6. The system of claim 5, characterized in that the system further comprises:
an identification module, used to label the question and answer data according to the judgment result, to identify whether the question and answer data is knowledge data or non-knowledge data.
7. The system of claim 6, characterized in that the system further comprises:
a search engine, used, when question and answer data is searched, to preferentially place question and answer data labeled as knowledge data at the front of the search results according to the labels of the question and answer data.
8. The system of claim 5, characterized in that the system further comprises:
a definition module, used to define the features of question and answer data;
a generation module, used to establish a data model that associates the features with knowledge data and non-knowledge data, and to generate the classifier.
CN2011100537198A 2011-03-07 2011-03-07 Method and system for data hierarchy Pending CN102681992A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011100537198A CN102681992A (en) 2011-03-07 2011-03-07 Method and system for data hierarchy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011100537198A CN102681992A (en) 2011-03-07 2011-03-07 Method and system for data hierarchy

Publications (1)

Publication Number Publication Date
CN102681992A true CN102681992A (en) 2012-09-19

Family

ID=46813944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011100537198A Pending CN102681992A (en) 2011-03-07 2011-03-07 Method and system for data hierarchy

Country Status (1)

Country Link
CN (1) CN102681992A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1794233A (en) * 2005-12-28 2006-06-28 刘文印 Network user interactive asking answering method and its system
CN101118554A (en) * 2007-09-14 2008-02-06 中兴通讯股份有限公司 Intelligent interactive request-answering system and processing method thereof
CN101373532A (en) * 2008-07-10 2009-02-25 昆明理工大学 FAQ Chinese request-answering system implementing method in tourism field
CN101520802A (en) * 2009-04-13 2009-09-02 腾讯科技(深圳)有限公司 Question-answer pair quality evaluation method and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844530A (en) * 2016-12-29 2017-06-13 北京奇虎科技有限公司 Training method and device of a kind of question and answer to disaggregated model
CN109309652A (en) * 2017-07-28 2019-02-05 阿里巴巴集团控股有限公司 A kind of method and device of training pattern
US10867071B2 (en) 2017-07-28 2020-12-15 Advanced New Technologies Co., Ltd. Data security enhancement by model training
US10929558B2 (en) 2017-07-28 2021-02-23 Advanced New Technologies Co., Ltd. Data secruity enhancement by model training

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: SHENZHEN SHIJI LIGHT SPEED INFORMATION TECHNOLOGY

Free format text: FORMER OWNER: TENGXUN SCI-TECH (SHENZHEN) CO., LTD.

Effective date: 20131018

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 518044 SHENZHEN, GUANGDONG PROVINCE TO: 518057 SHENZHEN, GUANGDONG PROVINCE

TA01 Transfer of patent application right

Effective date of registration: 20131018

Address after: A Tencent Building in Shenzhen Nanshan District City, Guangdong streets in Guangdong province science and technology 518057 16

Applicant after: Shenzhen Shiji Guangsu Information Technology Co., Ltd.

Address before: Shenzhen Futian District City, Guangdong province 518044 Zhenxing Road, SEG Science Park 2 East Room 403

Applicant before: Tencent Technology (Shenzhen) Co., Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20120919