CN101937444A - Textile raw material-oriented semantic-based data search engine - Google Patents


Info

Publication number
CN101937444A
Authority
CN
China
Prior art keywords
search
document
search engine
information
metadata
Legal status
Pending
Application number
CN2010101603775A
Other languages
Chinese (zh)
Inventor
吕瑞宝
闫红桥
沈霞锋
Current Assignee
SHAOXING YIQI INFORMATIONAL TECHNOLOGY Co Ltd
Original Assignee
SHAOXING YIQI INFORMATIONAL TECHNOLOGY Co Ltd
Priority date
2010-04-30
Filing date
2010-04-30
Publication date
2011-01-05
Application filed by SHAOXING YIQI INFORMATIONAL TECHNOLOGY Co Ltd
Priority to CN2010101603775A
Publication of CN101937444A
Pending legal status

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a semantic data search engine for textile raw materials. The software has three parts: a client operating interface, a semantic search engine and a market behavior data warehouse.

1) The semantic search engine is implemented as a semantic search over metadata files.

2) The engine provides intelligent matching through an information pairing function. When a user publishes a piece of their own product information, the system finds purchase information that matches it.

3) Two warehouses provide enterprises with market behavior analysis for pricing, production, sales and inventory: a dynamic textile raw material base data warehouse and the market behavior data warehouse. They use multivariate statistical analysis, predictive analysis, decision analysis and similar methods, together with expert system technology.

Description

Semantic data search engine for textile raw materials
Technical field
The present invention relates to computer software in the electronics and information field. Specifically, it relates to a semantic data search engine for textile raw materials that is particularly suited to information exchange among textile enterprises.
Background
In recent years, many new e-commerce business models have emerged. Some enterprises have focused on supply and demand in the textile industry, building online trade matching, online supermarkets and membership-based information trading. Compared with the relatively mature e-commerce abroad, however, a gap remains, for two reasons.
First, computer use is low among the managers of some domestic small and medium-sized enterprises, which leaves people doubtful about the virtual nature of the network.
Second, the state has not yet issued clear e-commerce policies and regulations, and illegal online operations have been exposed repeatedly. To some extent, this has slowed the development of e-commerce. As China's e-commerce policies gradually improve, the government explicitly advocates vigorous development of e-commerce, and many new trading platforms and methods have appeared.

Most current products, however, follow a B2B form. Most of them exchange paid information. Others act as intermediaries in transactions between enterprises and earn a margin. With these forms, customers cannot find the information they need quickly. Even when information is published, its headings are not defined completely or systematically. As a result, information on the Internet is misclassified or missed, and even a search engine cannot find the content the user wants. Moreover, most software on the market requires some computer skills to operate. Many potential customers who want to use it cannot, which limits adoption.
Summary of the invention
The objective of the invention is to overcome the shortcomings of the prior art described above. The invention provides software specialized for one content domain, textile raw materials, with two main features:
1) It departs from traditional free-text input by defining product names through options, which makes searches faster and more comprehensive.
2) It provides an information pairing function: when a user publishes their own product information, the system intelligently finds matching purchase information at the same time.
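The text does not specify how the pairing function works. The Python sketch below makes several assumptions:
- Product information is stored as a category plus option fields, in line with the option-based product definitions described above.
- A purchase listing matches a supply listing when both are in the same category and every option the purchase listing states agrees with the supply listing.

`Listing`, `match_purchases` and the sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    """A supply or purchase listing described by option fields (hypothetical schema)."""
    listing_id: str
    kind: str                      # "supply" or "purchase"
    category: str                  # raw material category chosen from a fixed option list
    options: dict[str, str] = field(default_factory=dict)  # option field -> selected value


def match_purchases(supply: Listing, listings: list[Listing]) -> list[Listing]:
    """Return purchase listings in the same category whose every stated option
    equals the corresponding option of the supply listing."""
    if supply.kind != "supply":
        raise ValueError("expected a supply listing")
    return [
        other
        for other in listings
        if other.kind == "purchase"
        and other.category == supply.category
        and all(supply.options.get(key) == value for key, value in other.options.items())
    ]


supply = Listing("S1", "supply", "spandex", {"denier": "40D", "grade": "AA"})
requests = [
    Listing("P1", "purchase", "spandex", {"denier": "40D"}),
    Listing("P2", "purchase", "spandex", {"denier": "20D"}),
    Listing("P3", "purchase", "polyester"),
]
print([p.listing_id for p in match_purchases(supply, requests)])  # ['P1']
```

Because options come from fixed lists rather than free text, matching reduces to exact field comparison. No fuzzy text matching is needed.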
The invention is realized by the following technical solution: a semantic data search engine device for textile raw materials. It is characterized in that it comprises three main parts: a client operating interface, a semantic search engine and a market behavior data warehouse.
The semantic data search software comprises hardware and software. The hardware comprises a server and clients located in each functional department. These are connected by a bus into a LAN, or set up as an Ethernet network.
The software part comprises:
The system is developed in ASP and Delphi. The core is written in ASP, the login interface framework in Delphi, and the system configuration and log processing module in Java. One or more of the above material databases can be selected for retrieval. The indexed databases support browsing, hyperlink display and sorting of data.
The invention also provides a semantic data search method for textile raw materials, characterized by the following steps:
(1) Read document information from the metadata document storage area.
(2) Use a filter to remove formatting and non-text information from the metadata documents, generate text strings and attribute/value pairs, and pass them to the indexing engine.
(3) Build an inverted index over the extracted strings. For each search term, record the documents that contain it, the number of occurrences, and the term's positions in each document. In other words, the usual mapping from "document number" to "all keywords in the document" is reversed into a mapping from "keyword" to "the numbers of all documents containing this keyword", that is, the files in which a given query term occurs. The inverted index allows statistical and probabilistic formulas to be applied, so document relevance can be computed quickly.
(4) Sort the retrieved metadata set for the search term with a specific algorithm, placing the most relevant documents first to improve search accuracy. The project formulated its own data weighting method from long-term analysis of the data shared in the light textile raw material database. The ranking algorithm combines this method with PageRank, the link analysis algorithm commonly used by search engines.
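Step (3) can be illustrated with a small Python sketch. It inverts the forward mapping (document number to its keywords) into a mapping from keyword to {document number: positions}. The document's own index format is not disclosed. The function name and the sample keyword sequences are hypothetical, and keywords are assumed to be already segmented and in reading order.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, list[str]]) -> dict[str, dict[int, list[int]]]:
    """Invert a forward mapping (document number -> its keywords in order)
    into keyword -> {document number -> 1-based positions of that keyword}.
    The occurrence count of a keyword in a document is len(positions)."""
    index: dict[str, dict[int, list[int]]] = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for pos, token in enumerate(tokens, start=1):
            index[token].setdefault(doc_id, []).append(pos)
    return dict(index)


# Hypothetical, already-segmented keyword sequences for two metadata files.
docs = {
    1: ["textile", "spandex", "price"],
    2: ["China", "spandex", "China"],
}
index = build_inverted_index(docs)
print(index["spandex"])  # {1: [2], 2: [2]}
print(index["China"])    # {2: [1, 3]}
```

Storing positions, not just document numbers, is what later allows occurrence counts for relevance scoring and keyword highlighting in the results.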
The beneficial effect of the invention is that the system uses semantic search and service engines to extract useful information automatically and build a powerful data warehouse. Users can retrieve the information they need conveniently and efficiently. The invention also overcomes the lack of fuzzy query support in earlier software, and users have received it well.
Description of drawings
Fig. 1 is a flow framework diagram of the invention.
Fig. 2 shows the search interface of the system (partial).
Fig. 3 shows the XML keyword strings that are read.
Fig. 4 shows the strings after word segmentation.
Fig. 5 shows the structure of the inverted index.
Fig. 6 lists the metadata nodes.
Fig. 7 shows the keywords extracted for a specific node.
Fig. 8 shows the metadata search engine entry.
Embodiment
Because of conceptual and technical limitations, full-text search over XML was once a difficult problem. To support metadata queries, the system originally used database search technology:
- The metadata nodes that users find most interesting and easiest to use are identified and used as table fields.
- The attribute values of these nodes are extracted from each metadata file as field values.
- The metadata files themselves are stored directly in the database as large objects and published on the Web for users to view.
This approach meets the needs of some users, but it clearly has many shortcomings for data retrieval.
First, when metadata is extracted, there is no guarantee that the chosen fields will satisfy users' query needs. Second, moving information from XML documents into the database requires code and manual work, which wastes resources and lowers efficiency. Third, database search itself has many limitations:
- Database retrieval is very inefficient and consumes a great deal of hardware resources.
- It cannot perform true full-text search; the simplest SQL single-column index provides only rudimentary full-text indexing.
- It cannot highlight search terms.
- It cannot sort result sets by relevance.
For these reasons, the earlier database-based search approach must be replaced with a new model for the textile raw material data search engine. We propose implementing the search engine as a semantic search over metadata files, as shown in the figure.
Previously, to search metadata, we manually extracted part of the information from metadata files stored as documents and converted it into database records. This approach had many drawbacks. A file-based semantic search method now allows the metadata files to be searched directly, which achieves the goal of searching the data. Users have different needs and query conditions, so the search engine offers two entry points for metadata: full-text search, and search on metadata node elements.
Full-text search is the most convenient search mode for users. Knowing any piece of information about the data is enough to find what they need through the full-text search engine. The metadata full-text search is built on this property, so users can quickly find data in the large volume of shared light textile raw material data even when they know only a little about it. A simple query is usually a text query that looks for each (or every) query word in each metadata document. Opening and scanning every document for every query word at query time, however, would waste a great deal of time and hurt search efficiency.
Full-text search over metadata is therefore implemented as follows:
- Following the XML file format standard, information extraction techniques pull the actual content out of each document.
- An index is built over this content and stored in a form that is convenient to search.
- At query time, the engine no longer scans every document. It uses the inverted index to compare the metadata and selects the records most relevant to the query.
Full-text search thus treats textual metadata as its main object and retrieves it through the full-text index, ranking results with a dedicated sorting method.
The XML fragment to be indexed is as follows:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<p_class_list>
  <p_class name="natural material">
    <p_c>
      <product_name>full Chinese product name</product_name>
      <product_Specification>specification</product_Specification>
      <product_price>price</product_price>
      <product_company>manufacturer</product_company>
      <Main_function>main function</Main_function>
      <product_Performance>performance</product_Performance>
      <Technical_parameters>technical parameters</Technical_parameters>
    </p_c>
  </p_class>
</p_class_list>
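Filtering out markup to produce text strings and attribute/value pairs, as in step (2), might look like the following Python sketch. It uses the standard-library `xml.etree.ElementTree` parser. The sample values (`Cotton yarn`, `23000`, `32S combed`) and the function name `filter_metadata` are hypothetical; the element names follow the fragment above, and the XML declaration is omitted for brevity.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<p_class_list>
  <p_class name="natural material">
    <p_c>
      <product_name>Cotton yarn</product_name>
      <product_price>23000</product_price>
      <Technical_parameters>32S combed</Technical_parameters>
    </p_c>
  </p_class>
</p_class_list>"""


def filter_metadata(xml_text: str) -> tuple[list[str], list[tuple[str, str]]]:
    """Drop markup and keep only textual content.

    Returns (strings, pairs): strings holds the non-empty text of every element,
    pairs holds (attribute name, value) for XML attributes and
    (element tag, text) for leaf elements."""
    root = ET.fromstring(xml_text)
    strings: list[str] = []
    pairs: list[tuple[str, str]] = []
    for elem in root.iter():
        for name, value in elem.attrib.items():
            pairs.append((name, value))
        text = (elem.text or "").strip()
        if text:
            strings.append(text)
            if len(elem) == 0:  # leaf element: its tag acts as the attribute name
                pairs.append((elem.tag, text))
    return strings, pairs


strings, pairs = filter_metadata(SAMPLE)
print(strings)  # ['Cotton yarn', '23000', '32S combed']
print(pairs[0])  # ('name', 'natural material')
```

Whitespace-only text between tags is discarded. The outputs are what a word segmenter and the inverted index would consume next.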
The system uses index-based search, which trades space for time: the files and character streams to be searched are indexed in full text. At query time the index is searched quickly to obtain the matching locations, meaning the file paths and the positions at which a keyword occurs. For each segmented term, the index lists the documents that contain it. This inverts the natural association from documents to terms, hence the name inverted index.
1) Read document information from the metadata document storage area.
2) Use a filter to remove formatting and non-text information from the metadata documents, generate text strings and attribute/value pairs, and pass them to the indexing engine. Fig. 3 shows the metadata keywords obtained after irrelevant information is filtered out of a light textile raw material database metadata file in the format shown above.
3) Build an inverted index over the extracted strings. For each search term, record the documents containing it, the number of occurrences, and its positions in each document. In other words, the usual mapping from "document number" to "all keywords in the document" is reversed into a mapping from "keyword" to "the numbers of all documents containing it", that is, the files in which a given query term occurs. The inverted index allows statistical and probabilistic formulas to be applied, so document relevance can be computed quickly. Fig. 4 shows the strings obtained in 2) after word segmentation.
Besides recording which metadata files a keyword appears in, the index also records how many times and at which positions it appears. This lets keywords be highlighted when query results are displayed (for example, shown in red or bold) while keeping the index compact and search efficient. Table 1 in Fig. 5 shows the indexing result for the keywords, assuming two metadata files numbered 1 and 2. The keyword "China" appears three times in metadata file 2, at keyword positions 5, 16 and 22.
4) Sort the retrieved metadata set for the search term with a specific algorithm, placing the most relevant documents first to improve search accuracy. The project formulated its own data weighting method from long-term analysis of the data shared in the light textile raw material database. The ranking algorithm combines this method with PageRank, the link analysis algorithm commonly used by search engines, in which the most-cited data are the most important.
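The project's own data weighting method is not disclosed. The Python sketch below therefore substitutes a plain TF-IDF-style text score and blends it with PageRank computed over a hypothetical citation graph between metadata records. The blend weight `alpha`, the max-normalization of both scores, the function names and the sample data are all assumptions for illustration.

```python
import math


def pagerank(links: dict[str, list[str]], damping: float = 0.85, iters: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a citation graph; links[a] lists the records a cites.
    Records that cite nothing spread their rank evenly over all records."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                for t in targets:
                    new[t] += damping * rank[v] / len(targets)
            else:
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank


def text_scores(terms: list[str], index: dict[str, dict[str, list[int]]], n_docs: int) -> dict[str, float]:
    """TF-IDF-style score: occurrences of each term in a record times a smoothed IDF."""
    scores: dict[str, float] = {}
    for term in terms:
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings)) + 1.0
        for doc, positions in postings.items():
            scores[doc] = scores.get(doc, 0.0) + len(positions) * idf
    return scores


def rank_results(terms, index, n_docs, links, alpha: float = 0.7) -> list[str]:
    """Blend normalized text relevance with normalized PageRank (alpha is a hypothetical weight)."""
    text = text_scores(terms, index, n_docs)
    if not text:
        return []
    pr = pagerank(links)
    max_text = max(text.values())
    max_pr = max(pr.values()) if pr else 1.0
    combined = {
        doc: alpha * score / max_text + (1.0 - alpha) * pr.get(doc, 0.0) / max_pr
        for doc, score in text.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical data: record -> positions of "spandex"; records B and C both cite A.
index = {"spandex": {"A": [1], "B": [2, 5]}}
links = {"B": ["A"], "C": ["A"]}
print(rank_results(["spandex"], index, n_docs=3, links=links))             # ['B', 'A']
print(rank_results(["spandex"], index, n_docs=3, links=links, alpha=0.3))  # ['A', 'B']
```

With a high text weight, record B wins because it mentions the term twice. Lowering `alpha` lets the more-cited record A rise to the top, which mirrors the "most-cited data are the most important" idea.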
Full-text search covers the richest information, but it also has drawbacks. Users know little about the data when they search, so a query inevitably returns a large result set. Even with a ranking algorithm, system efficiency suffers to some extent. Search on specific nodes is designed to avoid this problem: a targeted search improves both efficiency and accuracy.
As mentioned, the textile raw material information sharing metadata standard divides metadata into two levels. Level-one metadata is the minimum set of metadata entities and elements needed to uniquely identify a dataset (dataset, dataset series, feature and attribute). Every metadata file generated according to the standard must therefore contain the mandatory level-one elements, since these elements are what uniquely identify a dataset. Fig. 6 lists, in part, the mandatory level-one elements specified in the standard.
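Once the mandatory element list is known, checking that a metadata file carries every level-one element is straightforward. The list itself appears in Fig. 6 and is not reproduced here. The tag names below (`dataset_name`, `abstract`, `responsible_party`) and the helper `missing_level_one` are therefore hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names standing in for the level-one elements listed in Fig. 6.
REQUIRED_LEVEL_ONE = ("dataset_name", "abstract", "responsible_party")


def missing_level_one(xml_text: str, required: tuple[str, ...] = REQUIRED_LEVEL_ONE) -> list[str]:
    """Return the required level-one elements that are absent or empty in a metadata file."""
    root = ET.fromstring(xml_text)
    missing = []
    for tag in required:
        elem = root.find(f".//{tag}")  # first matching descendant, or None
        if elem is None or not (elem.text or "").strip():
            missing.append(tag)
    return missing


doc = "<metadata><dataset_name>Spandex prices</dataset_name><abstract> </abstract></metadata>"
print(missing_level_one(doc))  # ['abstract', 'responsible_party']
```

The check uses `elem is None` rather than `not elem`, because an ElementTree element with no children is falsy even when it exists.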
Given these characteristics of the level-one metadata nodes in the light textile raw material database, we treat the level-one metadata defined by the database's metadata standard as specific nodes. We then extend the full-text data search function with a specific-node search function.
The system reads the attribute value of any specific node in a file programmatically. Suppose the user indexes the specific node "technical parameter". The metadata keywords in Fig. 7 are filtered on this node, so all information unrelated to "technical parameter" is removed. This eliminates a large amount of irrelevant information (see Fig. 4), and the contrast with Fig. 2 is obvious. Indexing and searching a small amount of data is much more efficient than searching full-text information. Specific-node search uses almost the same indexing model as full-text search; the key difference lies in how the useful information is obtained from the metadata files.
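Specific-node indexing differs from full-text indexing only in what is fed to the index. The minimal Python sketch below indexes only the text of one chosen element. It assumes whitespace tokenization as a stand-in for the engine's word segmenter, and `build_node_index` and the sample files are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_node_index(files: dict[int, str], node_tag: str) -> dict[str, set[int]]:
    """Index only the text of elements named node_tag; all other content is ignored.
    Tokens are split on whitespace here, a stand-in for a real word segmenter."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, xml_text in files.items():
        root = ET.fromstring(xml_text)
        for elem in root.iter(node_tag):
            for token in (elem.text or "").split():
                index[token].add(doc_id)
    return dict(index)


files = {
    1: "<p_c><product_name>spandex yarn</product_name>"
       "<Technical_parameters>40D elastic</Technical_parameters></p_c>",
    2: "<p_c><product_name>elastic band</product_name>"
       "<Technical_parameters>20D</Technical_parameters></p_c>",
}
node_index = build_node_index(files, "Technical_parameters")
print(sorted(node_index))     # ['20D', '40D', 'elastic']
print(node_index["elastic"])  # {1}
```

"elastic" in file 2 sits under `product_name`, so it is not indexed. This is the filtering effect the text describes.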
Running example
Indexing these specific nodes in a targeted way and offering them to users as a search interface therefore greatly reduces the number of irrelevant results and makes search more effective. Considering users' search habits and their familiarity with the data, the system currently supports retrieval on three specific node elements: dataset name, abstract and responsible department.
The China Light Textile Raw Material Network holds about 1,000,000 light textile raw material metadata records. As described above, the system indexes the metadata differently for different retrieval needs and provides four entry points:
- metadata full-text search;
- dataset name search;
- dataset abstract search;
- dataset submitting-organization search.
Because the engine includes a Chinese analyzer, metadata can be queried with Chinese or English query words, and all Chinese or English records containing the keyword are shown. Query words are highlighted in the result set so that users can spot them quickly.
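Highlighting can be done at display time by wrapping each occurrence of a query word. The Python sketch below uses a case-insensitive regular expression, so English terms match regardless of case and Chinese terms match as literal substrings. The `<b>` markers and the name `highlight` are assumptions, and a production version would also need to HTML-escape the surrounding text.

```python
import re


def highlight(text: str, query_words: list[str], open_tag: str = "<b>", close_tag: str = "</b>") -> str:
    """Wrap every occurrence of any query word in text with open_tag/close_tag.
    Longer words are tried first so overlapping words prefer the longest match;
    IGNORECASE affects English letters and leaves Chinese characters unchanged."""
    words = sorted({w for w in query_words if w}, key=len, reverse=True)
    if not words:
        return text
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    return pattern.sub(lambda m: f"{open_tag}{m.group(0)}{close_tag}", text)


print(highlight("Spandex 40D price list, 氨纶 spandex", ["spandex", "氨纶"]))
# <b>Spandex</b> 40D price list, <b>氨纶</b> <b>spandex</b>
```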
The engine also supports multi-keyword retrieval. For example, a user who wants content about the light textile market and also about price quotations only needs to enter these two query words separated by a space.
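The text does not say whether space-separated keywords are combined with AND or OR semantics. The Python sketch below supports both over hypothetical posting sets, which map each keyword to record numbers. `search_terms` is an illustrative name.

```python
def search_terms(query: str, index: dict[str, set[int]], mode: str = "and") -> set[int]:
    """Split a space-separated query and combine the posting sets of its terms.
    mode="and" keeps records containing every term; mode="or" keeps records containing any."""
    postings = [index.get(term, set()) for term in query.split()]
    if not postings:
        return set()
    if mode == "and":
        return set.intersection(*postings)
    if mode == "or":
        return set.union(*postings)
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical posting sets: term -> numbers of the metadata records containing it.
index = {"market": {1, 2, 4}, "quotation": {2, 3, 4}}
print(sorted(search_terms("market quotation", index)))             # [2, 4]
print(sorted(search_terms("market quotation", index, mode="or")))  # [1, 2, 3, 4]
```

AND narrows the result set, which suits the targeted retrieval goal. OR broadens it, in the spirit of full-text search.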
The results show a clear difference between full-text search and node search. With "spandex" as the search term, full-text search returns 22 records and node search returns 5. Node search is more targeted and excludes many useless results, so users can find the data they need without paging. Full-text search has a broader scope. Each has its advantages.

Claims (2)

1. A semantic data search engine device for textile raw materials, characterized in that it comprises three main parts: a client operating interface, a semantic search engine and a market behavior data warehouse.
2. A semantic data search method for textile raw materials, characterized in that it comprises the following steps:
(1) reading document information from a metadata document storage area;
(2) using a filter to remove formatting and non-text information from the metadata documents, generating text strings and attribute/value pairs, and passing them to an indexing engine;
(3) building an inverted index over the extracted strings, recording for each search term the documents containing it, the number of occurrences, and the term's positions in each document. That is, the usual mapping from "document number" to "all keywords in the document" is reversed into a mapping from "keyword" to "the numbers of all documents containing this keyword", the files in which a given query term occurs. The inverted index allows statistical and probabilistic formulas to be applied so that document relevance is computed quickly;
(4) sorting the retrieved metadata set for the search term with a specific algorithm, placing the most relevant documents first to improve search accuracy. The project formulated its own data weighting method from long-term analysis of data shared in the light textile raw material database. The ranking algorithm combines this method with PageRank, the link analysis algorithm commonly used by search engines.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010101603775A CN101937444A (en) 2010-04-30 2010-04-30 Textile raw material-oriented semantic-based data search engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010101603775A CN101937444A (en) 2010-04-30 2010-04-30 Textile raw material-oriented semantic-based data search engine

Publications (1)

Publication Number Publication Date
CN101937444A 2011-01-05

Family

ID=43390777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010101603775A Pending CN101937444A (en) 2010-04-30 2010-04-30 Textile raw material-oriented semantic-based data search engine

Country Status (1)

Country Link
CN (1) CN101937444A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1327214A (en) * 2000-06-01 2001-12-19 纺织品商业有限公司 System and method for optionaly ordering by internet
CN101281525A (en) * 2007-11-23 2008-10-08 北京九城网络软件有限公司 System and method for searching based on knowledge base on internet
CN101655848A (en) * 2008-08-20 2010-02-24 华为技术有限公司 Method, system and device for implementing content management

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN106599988A (en) * 2016-12-09 2017-04-26 无锡清华信息科学与技术国家实验室物联网技术中心 Multi-level semantic feature extraction method for behavior data of intelligent wearable equipment
CN106599988B (en) * 2016-12-09 2019-10-08 无锡清华信息科学与技术国家实验室物联网技术中心 A kind of multistage semantic feature extraction method of intelligence wearable device behavioral data
CN108733708A (en) * 2017-04-21 2018-11-02 国家计算机网络与信息安全管理中心 Method, apparatus and computer storage media for information management
CN111199143A (en) * 2018-10-31 2020-05-26 北大方正集团有限公司 Indexing method, device and equipment of Word thesis and storage medium
CN110928998A (en) * 2019-12-09 2020-03-27 南开大学 Latin side search engine based on equivalence class representative element index and storage
CN110928998B (en) * 2019-12-09 2023-04-14 南开大学 Latin side search engine based on equivalence class representative element index and storage

Similar Documents

Publication Publication Date Title
CN103164454B (en) Keyword group technology and system
CN110597981B (en) Network news summary system for automatically generating summary by adopting multiple strategies
KR101419504B1 (en) System and method providing a suited shopping information by analyzing the propensity of an user
CN102708096B (en) Network intelligence public sentiment monitoring system based on semantics and work method thereof
JP5721818B2 (en) Use of model information group in search
KR101463974B1 (en) Big data analysis system for marketing and method thereof
CN103294781B (en) A kind of method and apparatus for processing page data
Vosecky et al. Searching for quality microblog posts: Filtering and ranking based on content analysis and implicit links
CN106383887A (en) Environment-friendly news data acquisition and recommendation display method and system
CN1963816A (en) Automatization processing method of rating of merit of search engine
CN111008265A (en) Enterprise information searching method and device
CN101751458A (en) Network public sentiment monitoring system and method
CN103136360A (en) Internet behavior markup engine and behavior markup method corresponding to same
CN104750713A (en) Method and device for sorting search results
CN104239340A (en) Search result screening method and search result screening device
CN103678412A (en) Document retrieval method and device
CN103116635B (en) Field-oriented method and system for collecting invisible web resources
CN102521321A (en) Video search method based on search term ambiguity and user preferences
CN110188291B (en) Document processing based on proxy log
CN104615627A (en) Event public sentiment information extracting method and system based on micro-blog platform
CN108446333B (en) Big data text mining processing system and method thereof
CN111859065A (en) Big data-based public opinion listening system
CN103970800A (en) Method and system for extracting and processing webpage related keywords
CN115905489A (en) Method for providing bid and bid information search service
CN101937444A (en) Textile raw material-oriented semantic-based data search engine

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20110105