CN105528421B - Method for mining search dimensions of query words in massive data - Google Patents
- Publication number: CN105528421B
- Application number: CN201510890422.5A
- Authority: CN (China)
- Prior art keywords: list, lexical item, name, word, extracted
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for mining the search dimensions of query words in massive data. The method comprises the following steps: 1) extract Lists from each web page in the crawled data set based on text, HTML tag and repeated-region patterns; 2) add extraction mechanisms that effectively expand the Lists extracted in step 1); 3) assess the importance of each extracted List; 4) term-list clustering: merge similar term lists into a query dimension; 5) ranking of query dimensions and term lists: compute the importance of the different query facets and of their terms. The invention obtains more effective term lists. After the term lists have been supplemented, the new term lists are scored, similar term lists are merged into categories, and the importance of the different query facets and term lists is computed, so that the mined query dimensions are more complete and users can obtain more complete information.
Description
Technical field
The present invention relates to a method for mining search dimensions of query words in massive data.
Background
In our prior research, mining the search dimensions of query words in massive data consisted mainly of four steps: (1) extract term lists (Lists) from web pages according to text, HTML tag and repeated-region patterns; (2) score the term lists to assess their importance; (3) merge similar term lists into a query dimension; (4) compute the importance of the different query facets and term lists. The main problem with this scheme is that many web pages (news articles, microblog and blog posts, etc.) have no repeated regions or HTML tag structure. The existing method is not suited to such data, especially news data, from which it extracts few or no term lists.
How to solve this problem is therefore a technical issue that those skilled in the art urgently need to address.
Summary of the invention
In view of the problems in the background art, the purpose of the present invention is to provide a method for mining search dimensions of query words in massive data. The method obtains more effective term lists; after the term lists have been supplemented, it scores the new term lists, merges similar term lists into categories, and computes the importance of the different query facets and term lists, so that the mined query dimensions are more complete and users can obtain more complete information.
The purpose of the present invention is achieved through the following technical solutions:
A method for mining search dimensions of query words in massive data, comprising the following steps:
1) Term-list extraction: based on text, HTML tag or repeated-region patterns, extract Lists from each web page in the crawled data set;
2) Add extraction mechanisms that effectively expand the Lists extracted in step 1);
3) Term-list scoring: assess the importance of each extracted List;
4) Term-list clustering: merge similar term lists into a query dimension;
5) Ranking of query dimensions and term lists: compute the importance of the different query facets and terms.
Further, step 2) specifically comprises:
(1) for each news search term, crawl K related news items from a search engine as the data set;
(2) extract the body text of each crawled document;
(3) process the data of each document: within one sentence, one paragraph or one chapter, extract the person names as one List, the place names as one List, and the organization names as one List;
(4) filter the Lists extracted in step (3).
Further, in step (3), Chinese person, place and organization names are extracted by first segmenting the Chinese text with the NLPIR Chinese word segmentation system, which yields the person, place and organization names. English person, place and organization names are identified with the Stanford named entity recognizer.
Further, step (4) specifically comprises:
a) crawl the Wikipedia page of each term in the List extracted in step (3), and obtain the "category" attribute set of each term in the List;
b) take the union of the "category" attribute sets of all terms in the List to obtain a large category attribute set C;
c) traverse each category in C; for each category, gather the terms in the List that carry that category; if more than three terms carry it, they form a new List, and Lists with fewer than three terms are discarded;
d) when the loop of step c) ends, a series of Lists has been obtained, each derived from one category attribute;
e) score each new List in Lists using idf information;
f) select the List with the highest score as the final List.
Further, the idf formula in step e) is: idf = (N - n + 0.5)/(n + 0.5), where N is the total number of entries in Wikipedia and n is the total number of Wikipedia entries in the category attribute from which the List was derived.
Further, in step e) the formula for scoring the extracted List with idf information is: Score = length * idf, where length is the length of the List.
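The two formulas can be written as small functions. This is a minimal Python sketch; the function names are ours, and the value of N in the usage example is hypothetical, since the text does not state the size of Wikipedia. Note that the stated idf has no logarithm, unlike the usual BM25-style idf log((N - n + 0.5)/(n + 0.5)); the sketch follows the text as written.

```python
def idf(total_entries: int, category_entries: int) -> float:
    """idf = (N - n + 0.5) / (n + 0.5), exactly as stated (no logarithm).

    total_entries:    N, total number of entries in Wikipedia
    category_entries: n, number of Wikipedia entries in the category
                      from which the List was derived
    """
    if not 0 <= category_entries <= total_entries:
        raise ValueError("expected 0 <= n <= N")
    return (total_entries - category_entries + 0.5) / (category_entries + 0.5)


def score(list_length: int, total_entries: int, category_entries: int) -> float:
    """Score = length * idf."""
    return list_length * idf(total_entries, category_entries)


if __name__ == "__main__":
    N = 1_000_000  # hypothetical Wikipedia size, for illustration only
    # A 5-term List built on a broad category ("Living people", n = 62,205)
    broad = score(5, N, 62_205)
    # A 4-term List built on a hypothetical narrow category with n = 50
    narrow = score(4, N, 50)
    print(f"broad={broad:.1f} narrow={narrow:.1f}")
```

With these numbers the shorter List from the narrow category wins by a wide margin, which is the behaviour the idf term is meant to produce.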
Further, step 2) specifically comprises: extracting the entity words in one sentence, one paragraph or one news article as a List, and then filtering the extracted List using Wikipedia.
The present invention has the following beneficial technical effects:
The method of the present invention obtains more effective term lists. After the term lists have been supplemented, the new term lists are scored, similar term lists are merged into categories, and the importance of the different query facets and term lists is computed, so that the mined query dimensions are more complete and users can obtain more complete information.
Description of the drawings
Fig. 1 shows an example of the news data used in the embodiment of the present invention;
Fig. 2a shows the category attributes of the term "Beijing" in Wikipedia;
Fig. 2b shows the category attributes of the term "Shanghai" in Wikipedia;
Fig. 2c shows the category attributes of the term "China" in Wikipedia;
Fig. 3 shows the Wikipedia category attributes of the search term "Cheng Long" (Jackie Chan).
Detailed description of the embodiments
The application is further described below in conjunction with the accompanying drawings.
With the rapid development of the Internet, the amount of information online keeps growing, and users facing such a wide variety of information often find it hard to obtain what they want quickly. To help users find the desired information quickly, we process large amounts of retrieved information, classify it according to the query dimensions of the information, and then present it to the user. A query dimension is a series of words describing one important aspect of a query word; this series of words is a group of semantically related, parallel terms, which the present invention calls a term list (List). For example, the information retrieved for "wrist watch" can be classified by query dimensions such as brand, features, performance and model; the TV series "Lost" can be classified by dimensions such as the episodes of each season, the actors, the characters and the plot; and the query word "flower" can be classified by dimensions such as use, type and color. Table 1 gives examples of the query dimensions of some query words. If the information related to a query word on the Internet could be classified by dimension, users could easily find the corresponding information according to the dimensions of the query word. The work described here is precisely to mine the query dimensions of query words.
When classifying retrieved information by dimension, the current approach for obtaining the query dimensions of a query word on the web has four stages: (1) extract term lists (Lists) from web pages according to text, HTML tag and repeated-region patterns; (2) score the term lists to assess their importance; (3) merge similar term lists into a query dimension; (4) compute the importance of the different query facets and term lists. In the first stage, the original method extracts Lists from web data according to text, HTML tag and repeated-region patterns. However, many web pages (news articles, microblog and blog posts, etc.) have no repeated regions or HTML tag structure, and the original method is not suited to such data, especially news data. News data, taken as the example here, consists largely of plain text, from which the original extraction method can hardly obtain suitable term lists. This work therefore takes the characteristics of news data specifically into account, improves on the original term-list extraction method, and adds extraction mechanisms for news data that effectively expand the term lists extracted by the original method.
The present invention mainly considers the characteristics of news data and makes improvements in three respects. (1) Person, place and organization names: nouns for people, places and the like appear frequently in news data and are very important there; the person, place and organization names occurring in the same sentence, the same paragraph or the same news article are likely to be related, so they can be used as term lists (Lists) to expand the original Lists. (2) Wikipedia filtering: the person, place and organization names from (1) are filtered using Wikipedia; the terms in the same paragraph that are better suited to describing a query dimension are kept as a new List, and unsuitable words are deleted from the List. (3) Entity linking: in news data, the meanings of entity words in the same paragraph (here, an entity word is a term that can be found in Wikipedia) are very likely related and may describe the same query dimension, so the entity words in one paragraph are taken as a List, which is then filtered with Wikipedia to obtain new Lists. Addressing these three aspects, the invention extracts new Lists, scores them with the original scoring method, merges similar Lists into query dimensions, and finally computes the importance of the different query facets and terms.
News data contains few structured sentences or repeated-region patterns, so extraction based on structured sentences yields very little or nothing; for the data in Fig. 1, for example, the original extraction method extracts no List at all. Since people and places are very important in news and appear frequently, this embodiment extracts the person names in the news data as one List, the place names as one List, and the organization names as one List, expanding the term lists extracted by the original method.
The present invention mainly considers the following three schemes:
Scheme one: within one sentence, extract the person names as one List, the place names as one List, and the organization names as one List.
Scheme two: within one paragraph, extract the person names as one List, the place names as one List, and the organization names as one List.
Scheme three: within one news article, extract the person names as one List, the place names as one List, and the organization names as one List.
This embodiment mainly describes the processing of scheme two; schemes one and three are handled similarly.
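The three schemes differ only in the text unit from which a List is drawn. A minimal Python sketch of that unit split, assuming paragraphs are separated by line breaks and sentences end with the Chinese or ASCII terminal marks 。！？!? (both are our assumptions; the text does not specify how units are delimited):

```python
import re
from typing import List

# Split after a terminal mark; '.' is left out on purpose because it also
# occurs in abbreviations and numbers (an assumption, not from the text).
_SENTENCE_END = re.compile(r"(?<=[。！？!?])")


def split_units(document: str, scheme: str) -> List[str]:
    """Split a news article into the units used by the three schemes.

    scheme: "sentence" (scheme one), "paragraph" (scheme two)
            or "article" (scheme three).
    """
    paragraphs = [p.strip() for p in document.splitlines() if p.strip()]
    if scheme == "article":
        return ["\n".join(paragraphs)] if paragraphs else []
    if scheme == "paragraph":
        return paragraphs
    if scheme == "sentence":
        sentences: List[str] = []
        for p in paragraphs:
            # Splitting on a zero-width lookbehind (Python 3.7+) keeps each
            # terminal mark attached to the sentence it ends.
            sentences.extend(s.strip() for s in _SENTENCE_END.split(p) if s.strip())
        return sentences
    raise ValueError(f"unknown scheme: {scheme!r}")
```

Each returned unit is then processed independently, so switching between schemes one, two and three only changes the `scheme` argument.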
In scheme two, person, place and organization names that appear in the same paragraph are likely to be strongly associated. Taking Fig. 1 as an example, "Wai Long, Zheng Yourong, Mi Zorovic" appear together in the first paragraph, and "Martinez, Gao Lin" appear together in the second paragraph. They are all football players, i.e. semantically related parallel words, and are well suited to a query dimension, so these closely related items can be extracted as a List. In the present invention, the person, place and organization names of the same paragraph are each put together as a List. Table 1 shows the Lists extracted from this passage after person, place and organization name extraction is added; only Lists longer than 3 are retained, so the final extracted Lists are the first two.
The specific extraction method is as follows:
(1) For each news search term, crawl K related news items from a search engine as the data set.
(2) Extract the body text of each crawled document.
(3) Process each paragraph of each document: extract the person names in each paragraph as one List, the place names as one List, and the organization names as one List.
For Chinese person, place and organization names, we use the existing NLPIR Chinese word segmentation system to segment the Chinese text; after segmentation, the person, place and organization names of each paragraph are readily available. For English, the Stanford named entity recognizer can be used to identify person, place and organization names.
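Once a tagger has run, step (3) reduces to grouping tagged words by type. The sketch below assumes the tagger output is already available as (word, tag) pairs; the tag names (ICTCLAS/NLPIR-style POS tags nr/ns/nt and the Stanford 3-class labels PERSON/LOCATION/ORGANIZATION) are assumptions to check against the tagger actually used, and no tagger API is called here.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed tag conventions; finer-grained subtags emitted by a tagger
# would need extra entries in this mapping.
TAG_TO_KIND: Dict[str, str] = {
    "nr": "person", "ns": "place", "nt": "organization",
    "PERSON": "person", "LOCATION": "place", "ORGANIZATION": "organization",
}


def entity_lists(tagged_unit: Iterable[Tuple[str, str]],
                 tag_to_kind: Dict[str, str] = TAG_TO_KIND,
                 min_terms: int = 4) -> Dict[str, List[str]]:
    """Build person/place/organization Lists from one unit's (word, tag) pairs.

    Only Lists with more than three terms (len >= min_terms) are kept.
    """
    lists: Dict[str, List[str]] = {"person": [], "place": [], "organization": []}
    for word, tag in tagged_unit:
        kind = tag_to_kind.get(tag)
        # Keep first-seen order and skip repeated mentions of the same name
        if kind is not None and word not in lists[kind]:
            lists[kind].append(word)
    return {kind: terms for kind, terms in lists.items() if len(terms) >= min_terms}
```

For a paragraph mentioning China, Beijing, Shanghai, Tianjin and Chongqing plus two person names, only the five-term place List survives the length threshold; "China" is still in it at this point, which is what the Wikipedia filtering described next addresses.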
Using the resulting Lists directly to expand the term lists has shortcomings, so the Lists obtained here need to be processed with Wikipedia.
The Lists extracted directly by the above method are somewhat coarse: the terms in some Lists are only weakly related, and merging them into the same query dimension is inappropriate. Take place names: if "China, Beijing, Shanghai, Tianjin, Chongqing" occur in the same paragraph, "Beijing, Shanghai, Tianjin, Chongqing" are clearly the four municipalities directly under the central government, while "China" is the country that contains them. Putting a country and cities into one List is inappropriate because they are not at the same level; if "China" is filtered out, the List is better suited to describing a query dimension. To solve this problem, we use Wikipedia data to filter the term Lists obtained by extracting person, place and organization names.
For each term in every List, if no corresponding entry can be found in Wikipedia, the term is deleted from the List directly, since the noun may have been extracted incorrectly. If a corresponding entry is found, we crawl the entry and filter using its "category" attribute; the category attributes are shown in Figs. 2a-2c.
If the category information of two entries overlaps substantially, the entries are closely related. For example, "Beijing" and "Shanghai" both have "Provincial-level administrative divisions of the People's Republic of China" and "Megacities in China" among their category attributes, i.e. two overlapping attributes in the figures, so these two nouns can appear in the same List describing a query dimension. If the overlap is small, for example "Beijing" and "China" share no category, they are not suited to describing one query dimension together. In this application, the extracted person, place and organization names are filtered according to the "category" information of their Wikipedia entries. Taking the place-name List extracted from one paragraph as an example (person-name and organization-name Lists are filtered in the same way), the procedure is as follows:
(1) Crawl the Wikipedia page of each term in the List, and obtain the "category" attribute set of each term in the List.
(2) Take the union of the "category" attribute sets of all terms in the List to obtain a large category attribute set C.
(3) Traverse each category in C; for each category, gather the terms in the List that carry that category. If more than three terms carry it, they form a new List; Lists with fewer than three terms are discarded.
(4) When the loop of step (3) ends, a series of Lists (zero, one or more) has been obtained, each derived from one category attribute.
(5) Score each new List in Lists using idf information (Formula 1) according to Formula 2, where N is the total number of entries in Wikipedia, n is the total number of Wikipedia entries in the category attribute from which the List was derived (available by clicking the category), and length is the length of the List. We prefer longer Lists because a longer List contains more terms and supplements the query dimension better.
(6) Select the List with the highest score.
idf = (N - n + 0.5) / (n + 0.5)    Formula 1
Score = length * idf    Formula 2
The highest-scoring List finally selected is the new List obtained by filtering the extracted List according to the information in Wikipedia.
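Steps (1) to (6) can be sketched as one function over pre-fetched Wikipedia data. This is a minimal Python sketch, assuming the category sets and category sizes have already been crawled into dictionaries (the crawling itself is not shown); the function name and the tie-breaking rule are ours. "More than three" is read as at least four terms, which matches the retention rule for extracted Lists; the text does not say how a category with exactly three terms is treated.

```python
from typing import Dict, List, Optional, Set


def idf(total_entries: int, category_entries: int) -> float:
    # Formula 1: idf = (N - n + 0.5) / (n + 0.5)
    return (total_entries - category_entries + 0.5) / (category_entries + 0.5)


def wiki_filter(terms: List[str],
                categories: Dict[str, Set[str]],
                category_sizes: Dict[str, int],
                total_entries: int,
                min_terms: int = 4) -> Optional[List[str]]:
    """Return the highest-scoring category-based sub-List of `terms`, or None.

    categories:     term -> Wikipedia "category" set; a term missing from this
                    dict is treated as having no Wikipedia entry
    category_sizes: category -> number of Wikipedia entries in it (n)
    total_entries:  total number of Wikipedia entries (N)
    """
    # Step (1): drop terms that have no Wikipedia entry
    known = [t for t in terms if t in categories]
    # Step (2): union of all category sets -> C
    union: Set[str] = set()
    for t in known:
        union |= categories[t]
    best: Optional[List[str]] = None
    best_score = float("-inf")
    # Steps (3)-(4): one candidate List per category in C
    for cat in sorted(union):  # sorted only to make ties deterministic
        members = [t for t in known if cat in categories[t]]
        if len(members) < min_terms:
            continue
        # Step (5): Score = length * idf (Formula 2)
        s = len(members) * idf(total_entries, category_sizes[cat])
        # Step (6): keep the highest score; the first category wins on ties
        if s > best_score:
            best, best_score = members, s
    return best
```

With hypothetical category data in which the four municipalities share two categories and "China" belongs only to a country category, the filter returns ["Beijing", "Shanghai", "Tianjin", "Chongqing"] and drops "China", matching the example in the text.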
The reason for using idf is as follows. If a Wikipedia category contains a very large number of entries, its semantics are very broad, and a List built on that category is likely to contain unrelated terms; such a List is poorly suited to describing a query dimension and should not be chosen to supplement the original Lists. For example, among the Wikipedia categories of the search term "Cheng Long" (Jackie Chan), shown in Fig. 3, clicking the category "Living people" shows that it contains 62,205 entries. The terms in a List generated from "Living people" are therefore likely to be unrelated, and we want such a List to receive a low score, so idf is used to penalize such Lists.
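A short derivation from Formula 1, treating n as a continuous variable, confirms the penalty described above: idf strictly decreases as the category grows, so for Lists of equal length a broad category such as "Living people" always scores lower than a category with fewer entries.

```latex
\mathrm{idf}(n) = \frac{N - n + 0.5}{n + 0.5} = \frac{N + 1}{n + 0.5} - 1,
\qquad
\frac{d\,\mathrm{idf}}{dn} = -\frac{N + 1}{(n + 0.5)^2} < 0 .
```

For instance, comparing a hypothetical category with n = 50 against "Living people" (n = 62,205), the ratio of the two idf values is at least 62,205.5 / 50.5, about 1,232, for any N of at least 62,205, so the length factor in Formula 2 can rarely compensate.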
Filtering with Wikipedia deletes the unrelated terms from the Lists obtained by directly extracting person, place and organization names, so that the terms in the final List are more closely related and parallel, and the resulting Lists supplement the original ones more effectively.
After the Lists of the original method have been supplemented with the filtered Lists of person, place and organization names from the same paragraph, we further note that entities in the same paragraph of news data are also very likely to be connected and related, because nouns in news data often carry special meanings. If highly related entities can be added to the Lists effectively, the mining of query dimensions is expanded further. Another method of this application therefore uses Wikipedia Miner to find entity words as an initial List, filters it with the filtering method described above, and uses the resulting new List for expansion.
This application mainly considers the following three schemes:
Scheme one: extract the entity words in one sentence as a List.
Scheme two: extract the entity words in one paragraph as a List.
Scheme three: extract the entity words in one news article as a List.
In scheme two, we use Wikipedia Miner to find all entities in the text of each paragraph. A paragraph, however, contains many noun terms, some of which are likely to be unrelated. To ensure that the List is coherent and well suited to describing one query dimension, we filter the List extracted from each paragraph with Wikipedia and add the filtered List to the Lists obtained by the original method. Schemes one and three are handled similarly to scheme two.
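The entity-linking variant composes two steps: link entities in each unit, then filter the resulting List. A minimal Python sketch, assuming both steps are supplied as callables; `link_entities` is a hypothetical stand-in for an entity linker such as Wikipedia Miner (whose real API is not reproduced here), and `list_filter` stands for the Wikipedia category filter described above.

```python
from typing import Callable, List, Optional

# Hypothetical hooks, not real library APIs
Linker = Callable[[str], List[str]]
ListFilter = Callable[[List[str]], Optional[List[str]]]


def entity_link_lists(units: List[str], link_entities: Linker,
                      list_filter: ListFilter) -> List[List[str]]:
    """Turn each unit's linked entity words into a filtered List."""
    new_lists: List[List[str]] = []
    for unit in units:
        initial: List[str] = []
        for entity in link_entities(unit):
            if entity not in initial:  # one entry per distinct entity word
                initial.append(entity)
        filtered = list_filter(initial)
        if filtered:  # skip units whose List does not survive filtering
            new_lists.append(filtered)
    return new_lists
```

The returned Lists would then be appended to the Lists obtained by the original method; passing sentences or whole articles instead of paragraphs as `units` gives schemes one and three.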
In summary, this application obtains more effective term lists. After the term lists have been supplemented, we score the new term lists, merge similar term lists into categories, and compute the importance of the different query facets and term lists, so that the mined query dimensions are more complete and users can obtain more complete information.
The above description merely illustrates the present invention. It should be understood that the present invention is not limited to the above embodiments; all variants that conform to the inventive concept fall within the protection scope of the present invention.
Claims (6)
1. A method for mining search dimensions of query words in massive data, characterized in that the method comprises the following steps:
1) Term-list extraction: based on text, HTML tag and repeated-region patterns, extract Lists from each web page in the crawled data set;
2) Add extraction mechanisms that effectively expand the Lists extracted in step 1):
(1) for each news search term, crawl K related news items from a search engine as the data set;
(2) extract the body text of each crawled document;
(3) process the data of each document, taking one sentence, one paragraph or one chapter as the unit for extracting a List; within that sentence, paragraph or chapter, extract the person names as one List, the place names as one List, and the organization names as one List;
(4) filter the Lists extracted in step (3);
3) Term-list scoring: assess the importance of each extracted List;
4) Term-list clustering: merge similar term lists into a query dimension;
5) Ranking of query dimensions and term lists: compute the importance of the different query facets and terms.
2. The method for mining search dimensions of query words in massive data according to claim 1, characterized in that, in step (3), Chinese person, place and organization names are extracted by first segmenting the Chinese text with the NLPIR Chinese word segmentation system, which yields the person, place and organization names in it; English person, place and organization names are identified with the Stanford named entity recognizer.
3. The method for mining search dimensions of query words in massive data according to claim 1, characterized in that step (4) specifically comprises:
a) crawling the Wikipedia page of each term in the List extracted in step (3), and obtaining the "category" attribute set of each term in the List;
b) taking the union of the "category" attribute sets of all terms in the List to obtain a large category attribute set C;
c) traversing each category in C and, for each category, gathering the terms in the List that carry that category; if more than three terms carry it, they form a new List, and Lists with fewer than three terms are discarded;
d) when the loop of step c) ends, a series of Lists has been obtained, each derived from one category attribute;
e) scoring each new List in Lists using idf information;
f) selecting the List with the highest score as the final List.
4. The method for mining search dimensions of query words in massive data according to claim 3, characterized in that the idf formula in step e) is: idf = (N - n + 0.5)/(n + 0.5), where N is the total number of entries in Wikipedia and n is the total number of Wikipedia entries in the category attribute from which the List was derived.
5. The method for mining search dimensions of query words in massive data according to claim 3, characterized in that in step e) the formula for scoring the extracted List with idf information is: Score = length * idf, where length is the length of the List.
6. The method for mining search dimensions of query words in massive data according to claim 1, characterized in that step 2) specifically comprises: extracting the entity words in one sentence, one paragraph or one news article as a List, and then filtering the extracted List using Wikipedia.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510890422.5A CN105528421B (en) | 2015-12-07 | 2015-12-07 | Method for mining search dimensions of query words in massive data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105528421A CN105528421A (en) | 2016-04-27 |
CN105528421B true CN105528421B (en) | 2018-09-04 |
Family
ID=55770644
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510890422.5A Active CN105528421B (en) | 2015-12-07 | 2015-12-07 | Method for mining search dimensions of query words in massive data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105528421B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106354799B (en) * | 2016-08-26 | 2020-01-14 | Hohai University | Subject data set multilayer facet filtering method and system based on data quality |
CN109241296A (en) * | 2018-09-14 | 2019-01-18 | Beijing ByteDance Network Technology Co., Ltd. | Method and apparatus for generating information |
CN109815495B (en) * | 2019-01-16 | 2020-06-05 | Xi'an Jiaotong University | Method for performing topic facet mining through label propagation algorithm |
CN110163688A (en) * | 2019-05-30 | 2019-08-23 | Fudan University | Commodity network public sentiment detection system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103440256A (en) * | 2013-07-26 | 2013-12-11 | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences | Method and device for automatically generating Chinese text label cloud |
CN104731768A (en) * | 2015-03-05 | 2015-06-24 | Xi'an Jiaotong University City College | Incident location extraction method oriented to Chinese news texts |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9280587B2 (en) * | 2013-03-15 | 2016-03-08 | Xerox Corporation | Mailbox search engine using query multi-modal expansion and community-based smoothing |
Non-Patent Citations (4)
Title |
---|
Facet-based opinion retrieval from blogs; Olga Vechtomova; Information Processing and Management; 2009-07-17; pp. 71-88, Section 3.2 * |
Finding dimensions for queries; Zhicheng Dou et al.; ACM International Conference on Information & Knowledge Management; 2011-10-28; pp. 1311-1320, Sections 3.1-3.2 * |
Summarization and Expansion of Search Facets; Aparna Nurani Venkitasubramanian et al.; CEUR Workshop Proceedings; 2013-04-26; pp. 1-2 * |
Query expansion combining correlation rules and ontology-weighted graphs (结合相关规则和本体加权图的查询扩展); Hao Zhifeng et al.; Application Research of Computers (计算机应用研究); 2014-04-18; Vol. 31, No. 10; pp. 3028-3032 * |
Also Published As
Publication number | Publication date |
---|---|
CN105528421A (en) | 2016-04-27 |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20200717. Address after: Room 2510, 25/F, Building 1, Yard 1, Danling Street, Haidian District, Beijing 100600. Patentee after: BEIJING YILANQUNZHI DATA TECHNOLOGY Co.,Ltd. Address before: No. 59 Zhongguancun Avenue, Haidian District, Beijing 100872. Patentee before: Renmin University of China |