CN105528421B - Method for mining search dimensions of query words in massive data - Google Patents

Method for mining search dimensions of query words in massive data

Info

Publication number
CN105528421B
Authority
CN
China
Prior art keywords
list
lexical item
name
word
extracted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510890422.5A
Other languages
Chinese (zh)
Other versions
CN105528421A (en)
Inventor
窦志成
文继荣
李谨秀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING YILANQUNZHI DATA TECHNOLOGY Co.,Ltd.
Original Assignee
Renmin University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Renmin University of China
Priority to CN201510890422.5A
Publication of CN105528421A
Application granted
Publication of CN105528421B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for mining search dimensions of query words in massive data. The method comprises the following steps: 1) extracting term lists (Lists) from each web page in the crawled data set, based on text, HTML tag, and repeated-region patterns; 2) adding extraction mechanisms to effectively expand the Lists extracted in step 1); 3) assessing the importance of each extracted List; 4) term list clustering: merging similar term lists to form a query dimension; 5) ranking of query dimensions and terms: calculating the importance of the different query facets and terms. The invention obtains more effective term lists. After the supplemented term lists are obtained, the new term lists are scored, similar term lists are merged into groups, and the importance of the different query facets and term lists is calculated, so that the mined query dimensions are more complete and the user can obtain more complete information.

Description

Method for mining search dimensions of query words in massive data
Technical field
The present invention relates to a method for mining search dimensions of query words in massive data.
Background
In our previous research work, the method for mining search dimensions of query words in massive data consisted mainly of the following four steps: (1) extracting term lists (Lists) from web pages according to text, HTML tag, and repeated-region patterns; (2) scoring the term lists to assess their importance; (3) merging similar term lists to form a query dimension; (4) calculating the importance of the different query facets and term lists. This scheme has the following main problem: many web pages (news data, microblog and blog posts, etc.) have no repeated regions or HTML tags, and the existing method is not suitable for such data. For news data in particular, very few term lists can be extracted, or none at all.
How to solve the above problem is therefore a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
In view of the problem described in the background, the purpose of the present invention is to provide a method for mining search dimensions of query words in massive data. The method obtains more effective term lists. After the supplemented term lists are obtained, the new term lists are scored, similar term lists are merged into groups, and the importance of the different query facets and term lists is calculated, so that the mined query dimensions are more complete and the user can obtain more complete information.
The purpose of the present invention is achieved through the following technical solution:
A method for mining search dimensions of query words in massive data, the method comprising the following steps:
1) term list extraction: based on text, HTML tag, or repeated-region patterns, extracting Lists from each web page in the crawled data set;
2) adding extraction mechanisms to effectively expand the Lists extracted in step 1);
3) term list scoring: assessing the importance of each extracted List;
4) term list clustering: merging similar term lists to form a query dimension;
5) ranking of query dimensions and terms: calculating the importance of the different query facets and terms.
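Step 4) merges similar term lists, but the text does not say how similarity is measured. The following is a minimal sketch only, assuming Jaccard overlap between term sets and a greedy single pass; both are illustrative choices rather than the patented method, and the names `jaccard`, `cluster_lists`, and the 0.3 threshold are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two term collections (assumed similarity measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_lists(term_lists, threshold=0.3):
    """Greedily merge each List into the first dimension it overlaps enough with."""
    dimensions = []  # each dimension is a set of terms
    for terms in term_lists:
        for dim in dimensions:
            if jaccard(terms, dim) >= threshold:
                dim.update(terms)  # similar enough: merge into this dimension
                break
        else:
            dimensions.append(set(terms))  # no similar dimension: start a new one
    return dimensions


lists = [
    ["Beijing", "Shanghai", "Tianjin"],
    ["Shanghai", "Tianjin", "Chongqing"],
    ["actor", "director", "singer"],
]
dims = cluster_lists(lists)
print(dims)
```

The first two Lists share two of four distinct terms and are merged into one dimension; the third shares nothing and forms its own.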
Further, step 2) is specifically:
(1) for each news search word, crawling K relevant news items from a search engine as the data set;
(2) extracting the body text from each crawled document;
(3) processing the data of each document: the person names appearing in the same sentence, the same paragraph, or the same chapter are extracted as one List, the place names as one List, and the organization names as one List;
(4) filtering the Lists extracted in step (3).
Further, in step (3), person names, place names, and organization names in Chinese are extracted by first segmenting the Chinese text with the NLPIR Chinese word segmentation system; the person names, place names, and organization names are available after segmentation. Person names, place names, and organization names in English are identified with the Stanford Named Entity Recognizer.
Further, step (4) is specifically:
a) crawling the Wikipedia page of each term in the List extracted in step (3), and obtaining the "category" attribute set of each term in the List;
b) taking the union of the "category" attribute sets of all terms in the List to obtain one large category attribute set C;
c) traversing each category in C; for each category, the terms in the List that carry the category are put together; if the category contains more than three terms, a new List is formed, and Lists with fewer than three terms are discarded;
d) after the loop in step c) ends, a series of Lists is obtained, each derived from one category attribute;
e) for each new List in the series, scoring the extracted List using idf information;
f) selecting the highest-scoring List as the final List.
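Steps a) to d) can be sketched as follows. This is an illustration under stated assumptions: the crawled Wikipedia pages are replaced by a hypothetical in-memory mapping `categories_of`, the category labels are made up, and the function name `split_by_category` is not from the source. The text leaves the case of exactly three terms unspecified; the sketch keeps only Lists with more than three terms.

```python
def split_by_category(terms, categories_of, min_terms=3):
    """Build one candidate List per category (steps a-d).

    categories_of maps a term to its set of category names; it is a
    hypothetical stand-in for the crawled Wikipedia "category" attributes.
    Terms missing from the mapping are skipped.
    """
    known = [t for t in terms if t in categories_of]
    # Step b): union of all category sets gives the big category set C.
    all_categories = set().union(*(categories_of[t] for t in known))
    candidates = {}
    # Step c): group the terms that carry each category.
    for category in sorted(all_categories):
        members = [t for t in known if category in categories_of[t]]
        if len(members) > min_terms:  # assumption: strictly more than three
            candidates[category] = members
    return candidates


# Illustrative category labels, not real Wikipedia data.
categories_of = {
    "China": {"Countries"},
    "Beijing": {"Municipalities", "Capitals"},
    "Shanghai": {"Municipalities"},
    "Tianjin": {"Municipalities"},
    "Chongqing": {"Municipalities"},
}
place_list = ["China", "Beijing", "Shanghai", "Tianjin", "Chongqing"]
candidates = split_by_category(place_list, categories_of)
print(candidates)
```

Here only the "Municipalities" grouping survives, so "China" drops out of the resulting List.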
Further, the idf in step e) is calculated as: idf = (N - n + 0.5) / (n + 0.5), where N is the total number of entries in Wikipedia and n is the total number of Wikipedia entries in the category attribute on which the List is based.
Further, in step e) the extracted List is scored using the idf information as: Score = length * idf, where length is the length of the List.
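The two formulas and the selection in step f) translate directly into code. The sketch below assumes the candidate Lists and the entry count of each category are already available as dictionaries; the function names and all numbers in the example are hypothetical.

```python
def idf(total_entries, category_entries):
    """idf = (N - n + 0.5) / (n + 0.5)."""
    return (total_entries - category_entries + 0.5) / (category_entries + 0.5)


def score(term_list, total_entries, category_entries):
    """Score = length * idf."""
    return len(term_list) * idf(total_entries, category_entries)


def best_list(candidates, category_sizes, total_entries):
    """Return the (category, List) pair with the highest score (step f)."""
    return max(
        candidates.items(),
        key=lambda item: score(item[1], total_entries, category_sizes[item[0]]),
    )


# Hypothetical numbers for illustration only.
candidates = {
    "Municipalities": ["Beijing", "Shanghai", "Tianjin", "Chongqing"],
    "Places": ["China", "Beijing", "Shanghai", "Tianjin", "Chongqing"],
}
category_sizes = {"Municipalities": 4, "Places": 500}
category, final_list = best_list(candidates, category_sizes, total_entries=1000)
print(category, final_list)
```

Although the "Places" List is longer, its category is far broader, so the shorter "Municipalities" List scores higher.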
Further, step 2) is specifically: extracting the entity words appearing in the same sentence, the same paragraph, or the same news article as one List; then filtering the extracted List using Wikipedia.
The present invention has the following positive technical effects:
The method of the present invention obtains more effective term lists. After the supplemented term lists are obtained, the new term lists are scored, similar term lists are merged into groups, and the importance of the different query facets and term lists is calculated, so that the mined query dimensions are more complete and the user can obtain more complete information.
Description of the drawings
Fig. 1 is an example of the news data used in an embodiment of the present invention;
Fig. 2a shows the category attribute information of the term "Beijing" in Wikipedia;
Fig. 2b shows the category attribute information of the term "Shanghai" in Wikipedia;
Fig. 2c shows the category attribute information of the term "China" in Wikipedia;
Fig. 3 shows the category attribute information of the search term "Cheng Long" in Wikipedia.
Detailed description of the embodiments
The application is further described below with reference to the accompanying drawings.
With the rapid development of the Internet, the amount of information online keeps growing, and users facing this variety of information often find it hard to quickly obtain what they want. To help users obtain the desired information quickly, we process a large amount of retrieved information, classify it according to the query dimensions of the information, and then present it to the user. A query dimension is a series of words that describes one important aspect of a query word. This series of words is a group of semantically related, parallel terms, referred to in the present invention as a term list (List). For example, for the query word "watch", the retrieved information can be classified by query dimensions such as brand, feature, performance, and model; for the TV series "Lost", by dimensions such as the episodes of each season, the actors, the characters in the show, and the plot; for the query word "flower", by dimensions such as the uses, types, and colors of flowers. Table 1 gives examples of query dimensions for some query words. If the information on the Internet related to a query word can be classified by dimension, users can easily and quickly find the corresponding information according to the dimensions of the query word. The work here is precisely to mine the query dimensions of query words.
When classifying the retrieved information by dimension, query dimensions for a query word on the web are currently obtained mainly through the following four processes: (1) extracting term lists (Lists) from web pages according to text, HTML tag, and repeated-region patterns; (2) scoring the term lists to assess their importance; (3) merging similar term lists to form a query dimension; (4) calculating the importance of the different query facets and term lists. When extracting term lists in the first step, the original method extracts Lists from web data according to text, HTML tag, and repeated-region patterns. However, many web pages (news data, microblog and blog posts, etc.) have no repeated regions or HTML tags, and the original method is not suitable for such data, news data in particular. Taking news data as an example, most of a news item is plain text, from which the original extraction method can hardly extract suitable term lists. The present application therefore takes the characteristics of news data into account more specifically: building on the original term list extraction method, it adds several extraction mechanisms for news data that effectively expand the term lists extracted by the original method.
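To make the limitation concrete, here is a minimal sketch of HTML-tag-based List extraction using Python's standard `html.parser` module. It covers only `<ul>`/`<ol>` lists and is an assumption about what such a pattern might look like, not the original method's actual rules; the class name `ListExtractor` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class ListExtractor(HTMLParser):
    """Collect the text of <li> items, grouped by enclosing <ul>/<ol>."""

    def __init__(self):
        super().__init__()
        self.lists = []        # finished Lists
        self._stack = []       # open <ul>/<ol> elements, innermost last
        self._in_item = False  # True while inside an <li> of the innermost list

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._stack.append([])
            self._in_item = False
        elif tag == "li" and self._stack:
            self._in_item = True
            self._stack[-1].append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False
        elif tag in ("ul", "ol") and self._stack:
            items = [item.strip() for item in self._stack.pop() if item.strip()]
            if items:
                self.lists.append(items)

    def handle_data(self, data):
        if self._in_item and self._stack:
            self._stack[-1][-1] += data


page = ListExtractor()
page.feed("<p>Brands</p><ul><li>Brand A</li><li>Brand B</li><li>Brand C</li></ul>")
print(page.lists)

news = ListExtractor()
news.feed("<p>A plain-text news paragraph with no list markup.</p>")
print(news.lists)
```

The structured page yields one List, while the plain-text news paragraph yields none, which is the gap the added extraction mechanisms are meant to fill.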
The present invention mainly considers the characteristics of news data and makes improvements in the following three aspects: (1) person names, place names, and organization names: nouns such as people and places occur frequently in news data and are very important in it, and the person names, place names, and organization names that appear in the same sentence, the same paragraph, or the same news article are likely to be related, so they can be used as term lists (Lists) to expand the original Lists; (2) Wikipedia filtering: the person names, place names, and organization names from aspect (1) are filtered using Wikipedia; the terms in the same paragraph that are better suited to describing a query dimension form a new List, and unsuitable words are deleted from the List; (3) entity linking: in news data, entity words in the same paragraph (an entity word here means a term that can be found in Wikipedia) are likely to be related in meaning and may well describe the same query dimension, so the entity words in the same paragraph are taken as one List, which is then filtered with Wikipedia to obtain new Lists. The present invention considers these three aspects and tests them in turn. After the new Lists are extracted, they are scored with the original scoring method, similar Lists are merged to form a query dimension, and finally the importance of the different query facets and terms is calculated.
In news data, structured sentences and repeated-region patterns are rare. Extraction based on structured sentences yields very little or nothing; for example, no List can be extracted from the data in Fig. 1 with the original extraction method. Considering that people and places are important information in news and occur frequently, this embodiment extracts the person names in the news data as one List, the place names as one List, and the organization names as one List, expanding the term lists extracted by the original method.
The present invention mainly considers the following three schemes:
Scheme one: the person names in the same sentence are extracted as one List, the place names as one List, and the organization names as one List.
Scheme two: the person names in the same paragraph are extracted as one List, the place names as one List, and the organization names as one List.
Scheme three: the person names in the same news article are extracted as one List, the place names as one List, and the organization names as one List.
This embodiment mainly describes the processing method of scheme two; schemes one and three are handled in a similar way.
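The three schemes differ only in the text unit over which a List is built. A minimal sketch of that choice, assuming paragraphs are separated by blank lines and sentences end with common Western or Chinese punctuation (both assumptions; `split_units` and the scheme labels are hypothetical):

```python
import re


def split_units(article, scheme):
    """Return the text units over which Lists are built.

    scheme: "sentence" (scheme one), "paragraph" (scheme two),
    or "article" (scheme three).
    """
    if scheme == "article":
        return [article.strip()]
    # Assumption: paragraphs are separated by a blank line.
    paragraphs = [p.strip() for p in article.split("\n\n") if p.strip()]
    if scheme == "paragraph":
        return paragraphs
    if scheme == "sentence":
        sentences = []
        for paragraph in paragraphs:
            # Split after sentence-ending punctuation (Western and Chinese).
            parts = re.split(r"(?<=[.!?。!?])\s*", paragraph)
            sentences += [s.strip() for s in parts if s.strip()]
        return sentences
    raise ValueError(f"unknown scheme: {scheme}")


article = "Alice met Bob. Carol left.\n\nDave stayed."
print(split_units(article, "sentence"))
print(split_units(article, "paragraph"))
print(split_units(article, "article"))
```

Smaller units give shorter but more tightly related Lists; larger units give longer Lists that depend more heavily on the later filtering step.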
For scheme two, the person names, place names, and organization names that appear in the same paragraph are likely to be closely related. Taking Fig. 1 as an example, "outer dragon, Zheng Yourong, rice Zorovic" appear together in the first paragraph, and "horse Ding Neisi, Gao Lin" appear together in the second paragraph. They are all football players, that is, semantically related parallel terms well suited to a query dimension, so such closely related information can be extracted as a List. In the present invention, the person names, the place names, and the organization names of the same paragraph are each put together as one List. Table 1 shows the Lists extracted from this passage after person name, place name, and organization name extraction is added; however, only Lists longer than 3 are retained, so the Lists finally extracted are the first two.
The specific extraction method is as follows:
(1) For each news search word, crawl K relevant news items from a search engine as the data set.
(2) Extract the body text from each crawled document.
(3) Process each paragraph of each document: the person names in each paragraph are extracted as one List, the place names as one List, and the organization names as one List.
For Chinese person names, place names, and organization names, we use the existing NLPIR Chinese word segmentation system to segment the Chinese text; the person names, place names, and organization names of the same paragraph are readily available after segmentation.
For English, the Stanford Named Entity Recognizer can be used to identify person names, place names, and organization names.
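Once a paragraph has been tagged, step (3) reduces to grouping entities by type. The sketch below assumes the tagger output has already been normalized to `(token, tag)` pairs with the hypothetical labels `PER`, `LOC`, and `ORG`; it does not call NLPIR or the Stanford recognizer, whose own tag sets and APIs differ. Following the text, only Lists longer than three terms are kept.

```python
from collections import defaultdict

# Hypothetical normalized labels: person, place, organization.
ENTITY_TYPES = ("PER", "LOC", "ORG")


def lists_from_paragraph(tagged_tokens, min_terms=3):
    """Group the entities of one paragraph into one List per entity type."""
    grouped = defaultdict(list)
    for token, tag in tagged_tokens:
        # Keep each entity once, in order of first appearance.
        if tag in ENTITY_TYPES and token not in grouped[tag]:
            grouped[tag].append(token)
    # Keep only Lists with more than min_terms terms.
    return {tag: terms for tag, terms in grouped.items() if len(terms) > min_terms}


# Placeholder paragraph; names are made up for illustration.
tagged = [
    ("Player A", "PER"), ("and", "O"), ("Player B", "PER"),
    ("Player C", "PER"), ("Player D", "PER"), ("Player A", "PER"),
    ("City X", "LOC"), ("City Y", "LOC"), ("Club Z", "ORG"),
]
paragraph_lists = lists_from_paragraph(tagged)
print(paragraph_lists)
```

Only the person-name List has more than three terms, so the place-name and organization-name Lists are dropped.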
If the term lists were expanded directly with the Lists obtained so far, some shortcomings would remain, so the Lists obtained here need to be processed with Wikipedia.
Lists extracted directly by the above method are somewhat coarse: the terms in some Lists are not closely related, and merging them into the same query dimension is inappropriate. Take place names: if "China, Beijing, Shanghai, Tianjin, Chongqing" appear in the same paragraph, then clearly "Beijing, Shanghai, Tianjin, Chongqing" are four municipalities directly under the central government, whereas "China" is the country that contains these four cities. Putting a country and cities into one List is inappropriate because they are not at the same level. If "China" is filtered out of the list, the List becomes more suitable for describing a query dimension. To solve this problem, we use the data in Wikipedia to filter the term lists obtained by extracting person names, place names, and organization names.
For each term in each List, if no corresponding entry can be found in Wikipedia, the term is deleted from the List directly, since the noun was probably extracted incorrectly. If a corresponding entry is found in Wikipedia, we fetch the entry and filter using its "category" attribute; the category attributes are shown in Figs. 2a-2c.
If the category information of two entries overlaps heavily, the entries are very close. For example, the nouns "Beijing" and "Shanghai" both have "provincial administrative area of the People's Republic of China" and "Chinese megalopolis" among their category attributes; the figures show two overlapping attributes, so the two nouns can appear in the same list describing a query dimension. If the overlap is small, for example "Beijing" and "China" share no category, then the two are not suited to describing one query dimension together. In the present application we filter the extracted person names, place names, and organization names according to the "category" information of their Wikipedia entries. The place-name List extracted from one paragraph is used as the example here; the term lists of person names and organization names are filtered in the same way. The detailed process is as follows:
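The overlap check described above is a set intersection. In this small sketch the two shared categories for "Beijing" and "Shanghai" are the ones named in the text; every other category label, and the function name `shared_categories`, is a hypothetical placeholder rather than real Wikipedia data.

```python
def shared_categories(categories_of, a, b):
    """Return the categories two terms have in common (empty set if none)."""
    return categories_of.get(a, set()) & categories_of.get(b, set())


categories_of = {
    "Beijing": {
        "provincial administrative area of the People's Republic of China",
        "Chinese megalopolis",
        "capital (hypothetical label)",
    },
    "Shanghai": {
        "provincial administrative area of the People's Republic of China",
        "Chinese megalopolis",
    },
    "China": {"country (hypothetical label)"},
}

print(shared_categories(categories_of, "Beijing", "Shanghai"))
print(shared_categories(categories_of, "Beijing", "China"))
```

Two shared categories suggest "Beijing" and "Shanghai" belong in one List, while the empty overlap for "Beijing" and "China" suggests they do not.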
(1) Crawl the Wikipedia page of each term in the List and obtain the "category" attribute set of each term in the List.
(2) Take the union of the "category" attribute sets of all terms in the List to obtain one large category attribute set C.
(3) Traverse each category in C; for each category, the terms in the List that carry the category are put together; if the category contains more than three terms, a new List is formed, and Lists with fewer than three terms are discarded.
(4) After the loop in step (3) ends, a series of Lists (zero, one, or more) is obtained, each derived from one category attribute.
(5) For each new List in the series, score the extracted List using idf information (Formula 1) according to Formula 2, where N is the total number of entries in Wikipedia, n is the total number of Wikipedia entries in the category attribute on which the List is based (this number can be obtained by clicking through to the specific category), and length is the length of the List (we prefer longer Lists, because a longer List contains more terms and supplements the query dimension better).
(6) Select the highest-scoring List.
idf = (N - n + 0.5) / (n + 0.5)    Formula 1
Score = length * idf    Formula 2
The highest-scoring List finally selected is the new List obtained by filtering the extracted List according to the information in Wikipedia.
Here we explain the use of idf. If a Wikipedia category contains a very large number of entries, the semantics of that category are very broad, and a List built on that category is likely to contain unrelated terms; it is poorly suited to describing a query dimension and should not be chosen to supplement the original Lists. For example, Fig. 3 shows the category attributes of the search term "Cheng Long" in Wikipedia. For the category "living people", clicking through shows that the category contains 62,205 entries. The terms in a List generated from "living people" are therefore likely to be unrelated, and we want such a List to receive a low score, so we use idf to penalize such Lists.
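A quick calculation shows the size of this penalty. The entry count 62,205 comes from the text; the total number of Wikipedia entries N, the narrow category of 50 entries, and the two List lengths are hypothetical values chosen only to illustrate the effect.

```python
def idf(total_entries, category_entries):
    """idf = (N - n + 0.5) / (n + 0.5), as given in the text."""
    return (total_entries - category_entries + 0.5) / (category_entries + 0.5)


N = 1_000_000            # hypothetical total number of Wikipedia entries
broad = idf(N, 62_205)   # "living people": entry count taken from the text
narrow = idf(N, 50)      # hypothetical narrow category

# Score = length * idf: a long List on the broad category
# versus a short List on the narrow category.
broad_score = 10 * broad
narrow_score = 4 * narrow
print(broad, narrow)
print(broad_score, narrow_score)
```

Under these assumed numbers the broad category's idf is roughly a thousand times smaller, so even a much longer List built on it scores far below a short List built on the narrow category.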
Filtering with Wikipedia deletes the unrelated terms from the Lists obtained by directly extracting person names, place names, and organization names, so that the terms in the final List are more closely related and parallel, and the resulting List supplements the original Lists more effectively.
After the Lists obtained by extracting and filtering the person names, place names, and organization names of the same paragraph have been used to supplement the Lists obtained by the original method, we further consider that entities in the same paragraph of news data are also very likely to be related, because nouns in news data tend to carry specific meaning. If highly related entities can be added to the Lists effectively, the mining of query dimensions is further extended. Another method of the present application therefore uses wikipedia miner to find entity words as the initial List, filters it with the filtering method described above, and uses the resulting new List for expansion.
The application mainly considers the following three schemes:
Scheme one: the entity words in the same sentence are extracted as one List.
Scheme two: the entity words in the same paragraph are extracted as one List.
Scheme three: the entity words in the same news article are extracted as one List.
For scheme two, we use wikipedia miner to find all entities in each paragraph of the text. However, a paragraph contains many noun terms, and some of them are probably unrelated. To ensure that the terms of a List are related and reasonably suited to describing one query dimension, the List extracted from each paragraph is filtered using Wikipedia, and the List obtained after filtering is added to the Lists obtained by the original method. Schemes one and three are handled in a similar way to scheme two.
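The text relies on wikipedia miner to find entity words, and its interface is not described here. As a stand-in only, the sketch below treats an entity word as any token sequence that matches a known Wikipedia entry title and uses greedy longest matching; the function `find_entity_words`, the title set, and the sample sentence are all hypothetical.

```python
def find_entity_words(tokens, entry_titles, max_len=3):
    """Greedy longest match of token n-grams against known entry titles."""
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in entry_titles:
                if phrase not in found:
                    found.append(phrase)
                i += n
                break
        else:
            i += 1  # no entry title starts at this token
    return found


entry_titles = {"Hong Kong", "football"}  # hypothetical set of entry titles
tokens = "the match in Hong Kong drew football fans".split()
entities = find_entity_words(tokens, entry_titles)
print(entities)
```

The resulting initial List would then go through the same Wikipedia category filtering as the person, place, and organization name Lists.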
In summary, the application obtains more effective term lists. After the supplemented term lists are obtained, we score the new term lists, merge similar term lists into groups, and calculate the importance of the different query facets and term lists, so that the mined query dimensions are more complete and the user can obtain more complete information.
The above description merely illustrates the present invention. It should be understood that the present invention is not limited to the above embodiments, and all variants consistent with the inventive concept fall within the protection scope of the present invention.

Claims (6)

1. A method for mining search dimensions of query words in massive data, characterized in that the method comprises the following steps:
1) term list extraction: based on text, HTML tag, and repeated-region patterns, extracting Lists from each web page in the crawled data set;
2) adding extraction mechanisms to effectively expand the Lists extracted in step 1):
(1) for each news search word, crawling K relevant news items from a search engine as the data set;
(2) extracting the body text from each crawled document;
(3) processing the data of each document, taking the same sentence, the same paragraph, or the same chapter as the unit for extracting a List; the person names in the same sentence, the same paragraph, or the same chapter are extracted as one List, the place names as one List, and the organization names as one List;
(4) filtering the Lists extracted in step (3);
3) term list scoring: assessing the importance of each extracted List;
4) term list clustering: merging similar term lists to form a query dimension;
5) ranking of query dimensions and terms: calculating the importance of the different query facets and terms.
2. The method for mining search dimensions of query words in massive data according to claim 1, characterized in that, in step (3), person names, place names, and organization names in Chinese are extracted by first segmenting the Chinese text with the NLPIR Chinese word segmentation system, after which the person names, place names, and organization names are available; person names, place names, and organization names in English are identified with the Stanford Named Entity Recognizer.
3. The method for mining search dimensions of query words in massive data according to claim 1, characterized in that step (4) is specifically:
a) crawling the Wikipedia page of each term in the List extracted in step (3), and obtaining the "category" attribute set of each term in the List;
b) taking the union of the "category" attribute sets of all terms in the List to obtain one large category attribute set C;
c) traversing each category in C; for each category, the terms in the List that carry the category are put together; if the category contains more than three terms, a new List is formed, and Lists with fewer than three terms are discarded;
d) after the loop in step c) ends, a series of Lists is obtained, each derived from one category attribute;
e) for each new List in the series, scoring the extracted List using idf information;
f) selecting the highest-scoring List as the final List.
4. The method for mining search dimensions of query words in massive data according to claim 3, characterized in that the idf in step e) is calculated as: idf = (N - n + 0.5) / (n + 0.5), where N is the total number of entries in Wikipedia and n is the total number of Wikipedia entries in the category attribute on which the List is based.
5. The method for mining search dimensions of query words in massive data according to claim 3, characterized in that, in step e), the extracted List is scored using the idf information as: Score = length * idf, where length is the length of the List.
6. The method for mining search dimensions of query words in massive data according to claim 1, characterized in that step 2) is specifically: extracting the entity words appearing in the same sentence, the same paragraph, or the same news article as one List; then filtering the extracted List using Wikipedia.
CN201510890422.5A 2015-12-07 2015-12-07 Method for mining search dimensions of query words in massive data Active CN105528421B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510890422.5A CN105528421B (en) 2015-12-07 2015-12-07 Method for mining search dimensions of query words in massive data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510890422.5A CN105528421B (en) 2015-12-07 2015-12-07 Method for mining search dimensions of query words in massive data

Publications (2)

Publication Number Publication Date
CN105528421A CN105528421A (en) 2016-04-27
CN105528421B true CN105528421B (en) 2018-09-04

Family

ID=55770644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510890422.5A Active CN105528421B (en) 2015-12-07 2015-12-07 Method for mining search dimensions of query words in massive data

Country Status (1)

Country Link
CN (1) CN105528421B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106354799B (en) * 2016-08-26 2020-01-14 河海大学 Subject data set multilayer facet filtering method and system based on data quality
CN109241296A (en) * 2018-09-14 2019-01-18 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN109815495B (en) * 2019-01-16 2020-06-05 西安交通大学 Method for performing topic facet mining through label propagation algorithm
CN110163688A (en) * 2019-05-30 2019-08-23 复旦大学 Commodity network public sentiment detection system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440256A (en) * 2013-07-26 2013-12-11 中国科学院深圳先进技术研究院 Method and device for automatically generating Chinese text label cloud
CN104731768A (en) * 2015-03-05 2015-06-24 西安交通大学城市学院 Incident location extraction method oriented to Chinese news texts

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9280587B2 (en) * 2013-03-15 2016-03-08 Xerox Corporation Mailbox search engine using query multi-modal expansion and community-based smoothing

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Facet-based opinion retrieval from blogs;Olga Vechtomova;《Information Processing and Management》;20090717;第71-88页,第3.2节 *
Finding dimensions for queries;Zhicheng Dou 等;《Acm International Conference on Information & Knowledge Management ACM》;20111028;第1311-1320页,第3.1节至3.2节 *
Summarization and Expansion of Search Facets;Aparna Nurani Venkitasubramanian 等;《Ceur Workshop Proceedings》;20130426;第1-2页 *
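Query expansion combining correlation rules and an ontology weighted graph; Hao Zhifeng et al.; 《Application Research of Computers》; 20140418; Vol. 31 (No. 10); pp. 3028-3032 *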

Also Published As

Publication number Publication date
CN105528421A (en) 2016-04-27

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20200717

Address after: Room 2510, 25 / F, building 1, yard 1, Danling street, Haidian District, Beijing 100600

Patentee after: BEIJING YILANQUNZHI DATA TECHNOLOGY Co.,Ltd.

Address before: 100872 No. 59, Zhongguancun Avenue, Haidian District, Beijing

Patentee before: Renmin University of China