CN110502680A - Method and device for extracting relevant fields from bid-winning announcements

Info

Publication number
CN110502680A
Authority
CN
China
Prior art keywords
acceptance
bid
website
relevant field
bulletin
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910797772.5A
Other languages
Chinese (zh)
Inventor
廖泽丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Dasicong Information Technology Co Ltd
Original Assignee
Chongqing Dasicong Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Dasicong Information Technology Co Ltd
Priority to CN201910797772.5A
Publication of CN110502680A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/08 Auctions

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a method and device for extracting relevant fields from bid-winning announcements. The extraction method comprises the following steps: S1, finding on the network the websites that publish bid-winning announcements; S2, obtaining the links to the bid-winning lists on those websites; S3, obtaining the links to the content pages of the bid-winning announcements; S4, fetching the body text through the content-page links and obtaining all data of the body text; S5, extracting the relevant fields of the bid-winning announcement from the body text; S6, verifying the extracted bid-winning fields. The invention targets bidding projects in the construction field and lets users obtain the bid-winning information of a project or enterprise more conveniently and accurately.

Description

Method and device for extracting relevant fields from bid-winning announcements
Technical field
The present invention relates to the technical field of structured processing of internet data, and in particular to a method and device for extracting relevant fields from bid-winning announcements.
Background
As the bidding industry becomes increasingly information-driven, bid-winning announcements need to be summarized and organized so that efficient data services can be provided to the relevant departments and enterprises of the industry. In the prior art, the extraction of bid-winning announcements is relatively crude and simple; it is prone to errors or inefficiency and wastes considerable manpower and material resources.
Summary of the invention
The invention discloses a method for extracting relevant fields from bid-winning announcements. By classifying the related websites and verifying the extracted fields, it addresses the errors and inefficiency that are common in the prior art.
The invention specifically adopts the following technical solution: S1, finding on the network the related websites that publish bid-winning announcements; S2, obtaining the links to the bid-winning lists on those websites; S3, obtaining the links to the content pages of the bid-winning announcements; S4, fetching the body text through the content-page links and obtaining all data of the body text; S5, extracting the relevant fields of the bid-winning announcement from the body text; S6, verifying the extracted bid-winning fields.
In addition, the invention provides a device for extracting relevant fields from bid-winning announcements, which specifically includes:
Information input module, for the user to input the relevant field information to be extracted;
Related-website acquisition module, for obtaining websites that contain bidding announcements;
Website classification module, for classifying the related websites according to web page structure and encoding type;
Relevant-field extraction module, for defining different keywords according to the different types of website and formulating different crawling strategies to crawl the relevant fields;
Verification module, connected to the constructor library and the enterprise library, for verifying whether the crawled relevant fields are accurate;
Result output module, for outputting the extracted relevant fields of the bid-winning announcement.
Compared with the prior art, the invention has the following beneficial effects:
When extracting bid-winning announcements, the prior art has to determine the structure and encoding rules of the current web page every time. When two adjacent web pages differ in structure or encoding rules, the extraction program must be swapped before extraction can continue, which reduces extraction efficiency. The present invention divides websites into types, and websites of the same type share the same retrieval strategy, so retrieval efficiency improves. In addition, for construction-related bidding, the invention builds databases of relevant enterprises and constructors and verifies the extracted fields against them, so the required information can be provided to users more accurately.
Brief description of the drawings
Fig. 1 is a flow chart of the method for extracting relevant fields from bid-winning announcements according to the present invention.
Fig. 2 is a structural diagram of the device for extracting relevant fields from bid-winning announcements according to the present invention.
Detailed description of the embodiments
The invention is described in detail below with reference to the accompanying drawings and embodiments.
As shown in Fig. 1, the invention discloses a method for extracting relevant fields from bid-winning announcements. It includes the following steps: S1, finding on the network the related websites that publish bid-winning announcements; S2, obtaining the links to the bid-winning lists on those websites; S3, obtaining the links to the content pages of the bid-winning announcements; S4, fetching the body text through the content-page links and obtaining all data of the body text; S5, extracting the relevant fields of the bid-winning announcement from the body text; S6, verifying the extracted bid-winning fields.
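The six steps can be read as a nested loop from seed site to verified record. The following is a minimal sketch under stated assumptions, not the patented implementation: `FAKE_WEB`, `fetch`, `extract`, `verify` and the page layout are all hypothetical stand-ins, and a dictionary replaces real network access so the example runs offline.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the network: URL -> page payload.
FAKE_WEB: Dict[str, dict] = {
    "https://example.org": {"list_links": ["https://example.org/list"]},
    "https://example.org/list": {
        "content_links": ["https://example.org/a1", "https://example.org/a2"]
    },
    "https://example.org/a1": {
        "text": "Bid-winning announcement: Project X, winner ACME Construction"
    },
    "https://example.org/a2": {"text": "Tender notice: Project Y"},
}


def fetch(url: str) -> dict:
    """Return the stored payload for a URL (assumed helper, no real HTTP)."""
    return FAKE_WEB[url]


def extract(text: str) -> Dict[str, str]:
    """S5: split the toy body text into a few fields (illustrative format only)."""
    kind, _, rest = text.partition(": ")
    project, _, winner = rest.partition(", winner ")
    return {"kind": kind, "project": project, "winner": winner}


def verify(fields: Dict[str, str]) -> bool:
    """S6: keep only records that look like bid-winning announcements."""
    return "bid-winning" in fields["kind"].lower() and bool(fields["winner"])


def run_pipeline(seed_sites: List[str]) -> List[Dict[str, str]]:
    records = []
    for site in seed_sites:                                    # S1: related websites
        for list_url in fetch(site)["list_links"]:             # S2: list links
            for page_url in fetch(list_url)["content_links"]:  # S3: content-page links
                text = fetch(page_url)["text"]                 # S4: body text
                fields = extract(text)                         # S5: field extraction
                if verify(fields):                             # S6: verification
                    records.append(fields)
    return records
```

Calling `run_pipeline(["https://example.org"])` keeps the Project X record and drops the tender notice, because the latter fails the S6 check.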
The related websites in S1 include government procurement websites, school procurement websites, project bidding websites and large bidding websites. In an embodiment of the invention, the government procurement websites may include the China Government Procurement Network and the government procurement networks of the provinces, cities, districts and counties; the school procurement websites include the China school bidding network and the provincial and municipal school bidding networks; the project bidding websites include construction networks and the provincial project bidding networks; and the large bidding websites include Qianlima, Caizhao and the like.
Steps S2 to S5 (obtaining the bid-winning list links, obtaining the content-page links, fetching the body text with all its data, and extracting the relevant fields from the body text) include: taking the known government procurement websites, school procurement websites, project bidding websites and large bidding websites as initial URLs; crawling the website information with a focused crawler; formulating different crawling strategies for different types of websites; and setting the crawler's crawl parameters according to those strategies. Here, "different types" refers to differences in web page structure and encoding rules between websites. Websites with the same page structure and encoding rules are grouped into one type and given the same crawling strategy, which saves retrieval time during the subsequent field extraction.
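Grouping sites by page structure and encoding, then looking up one strategy per group, might be sketched as follows. The structure labels, the `STRATEGIES` table and the parameter names are assumptions made for illustration; the patent does not specify how structure is detected or which parameters are set.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

SiteType = Tuple[str, str]  # (page structure label, encoding)


@dataclass(frozen=True)
class SiteProfile:
    url: str
    structure: str  # hypothetical label, e.g. "table-list" or "div-list"
    encoding: str   # e.g. "utf-8" or "gbk"


def group_by_type(sites: List[SiteProfile]) -> Dict[SiteType, List[str]]:
    """Put sites with the same structure and encoding into one type."""
    groups: Dict[SiteType, List[str]] = defaultdict(list)
    for site in sites:
        groups[(site.structure, site.encoding.lower())].append(site.url)
    return dict(groups)


# Hypothetical per-type crawl parameters; one entry serves every site of that type.
STRATEGIES: Dict[SiteType, dict] = {
    ("table-list", "utf-8"): {"strategy": "breadth_first", "max_depth": 2},
    ("div-list", "gbk"): {"strategy": "depth_first", "max_depth": 4},
}
DEFAULT_STRATEGY = {"strategy": "breadth_first", "max_depth": 1}


def strategy_for(site_type: SiteType) -> dict:
    """Return the crawl parameters shared by all sites of the given type."""
    return STRATEGIES.get(site_type, DEFAULT_STRATEGY)
```

Because the lookup key is the type rather than the URL, adding a new site with a known structure and encoding needs no new strategy.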
Common crawling strategies include the depth-first strategy, the breadth-first strategy, the backlink-count strategy and the large-site-first strategy. The depth-first strategy follows links in order of increasing depth, visiting the next-level page links in turn until it can go no deeper. After finishing one branch, the crawler returns to the previous link node and searches its other links. When all links have been traversed, the crawl task ends. The breadth-first strategy crawls pages by the depth of their directory level: pages at shallower directory levels are crawled first, and once the pages in one level have been crawled, the crawler moves down to the next level. In the backlink-count strategy, the backlink count of a page is the number of links on other pages that point to it. It indicates how strongly the page's content is recommended by others, so it is used to evaluate the importance of a page and thereby determine the order in which pages are crawled. The large-site-first strategy classifies all pages in the queue of URLs to be crawled by the website they belong to, and downloads first from the websites with the most pages waiting. Note that when several crawling strategies are specified, they must not conflict with one another, otherwise the crawler cannot crawl information effectively. In embodiments of the invention, different crawling strategies can be set for different websites to meet users' different crawling needs and to improve crawling efficiency.
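The four strategies differ only in how the next URL is chosen. The sketch below illustrates each on a small hypothetical link graph (`LINKS`) and a hypothetical pending queue; it is an illustration of the general techniques, not the crawler described in the patent.

```python
from collections import Counter, deque
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical link graph: page -> pages it links to.
LINKS: Dict[str, List[str]] = {
    "root": ["a", "b"],
    "a": ["c", "d"],
    "b": ["d"],
    "c": [],
    "d": [],
}


def crawl_order(start: str, links: Dict[str, List[str]], depth_first: bool) -> List[str]:
    """Visit order for depth-first (stack) or breadth-first (queue) crawling."""
    frontier = deque([start])
    seen, order = set(), []
    while frontier:
        page = frontier.pop() if depth_first else frontier.popleft()
        if page in seen:
            continue
        seen.add(page)
        order.append(page)
        children = links.get(page, [])
        # Reverse for the stack so the first listed link is followed first.
        for child in (reversed(children) if depth_first else children):
            if child not in seen:
                frontier.append(child)
    return order


def backlink_counts(links: Dict[str, List[str]]) -> Counter:
    """Number of pages that link to each page."""
    return Counter(target for targets in links.values() for target in targets)


def large_site_first(pending: List[str]) -> List[str]:
    """Order pending URLs so hosts with more waiting pages come first."""
    per_host = Counter(urlparse(url).netloc for url in pending)
    # sorted() is stable, so URLs of one host keep their original relative order.
    return sorted(pending, key=lambda url: -per_host[urlparse(url).netloc])
```

On this graph the depth-first order finishes the branch under `a` before visiting `b`, while the breadth-first order visits `a` and `b` before any of their children.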
The relevant fields in S5 include the bid-winning title, the winning enterprise, the winning constructor, the constructor registration number, the winning project type, the winning amount, the winning time and the region. The invention is an extraction method for bid-winning announcement fields in the construction field: it extracts only construction-related data, which reduces the amount of stored data and improves computation speed.
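One way to picture the eight fields is as a record filled in by keyword matching on the body text. This is a minimal sketch: the English labels in `LABELS` and the "label: value" line layout are assumptions for illustration, since real announcements vary by site type and the patent does not give the keywords.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class BidRecord:
    title: Optional[str] = None
    enterprise: Optional[str] = None
    constructor: Optional[str] = None
    registration_no: Optional[str] = None
    project_type: Optional[str] = None
    amount: Optional[str] = None
    award_time: Optional[str] = None
    region: Optional[str] = None


# Hypothetical keyword per field; a real deployment would define these per site type.
LABELS = {
    "title": "Project name",
    "enterprise": "Winning bidder",
    "constructor": "Constructor",
    "registration_no": "Registration No.",
    "project_type": "Project type",
    "amount": "Amount",
    "award_time": "Award date",
    "region": "Region",
}


def extract_fields(text: str) -> BidRecord:
    """Fill a BidRecord from 'label: value' lines; missing labels stay None."""
    record = BidRecord()
    for attr, label in LABELS.items():
        pattern = rf"^{re.escape(label)}\s*[:：]\s*(.+)$"
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            setattr(record, attr, match.group(1).strip())
    return record
```

Fields whose label does not appear in the text are left as `None`, so later verification can tell a missing value from an empty one.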
Verifying the extracted bid-winning fields in S6 includes: A1, verifying whether the document is a bid-winning announcement; A2, verifying whether the named party is an enterprise; A3, verifying whether the named person is a constructor. A document that contains the words "bid-winning" or "bid result" is treated as a bid-winning announcement. The enterprise check matches the name against the data in the existing enterprise library to confirm that the party is an enterprise and that the enterprise name is correct. The constructor check first judges directly from the person's role and name in the project, and then matches the name and constructor registration number against the existing constructor library to confirm that the person is a constructor.
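The three checks A1 to A3 can be sketched as simple lookups. The library contents, the keyword list and the function names below are hypothetical; in practice the enterprise and constructor libraries would be databases rather than in-memory collections.

```python
from typing import Dict

# Hypothetical reference data standing in for the enterprise and constructor libraries.
ENTERPRISE_LIBRARY = {"ACME Construction Co."}
CONSTRUCTOR_LIBRARY: Dict[str, str] = {"R-0001": "Zhang San"}  # registration no. -> name

AWARD_KEYWORDS = ("bid-winning", "bid result")


def is_award_notice(text: str) -> bool:
    """A1: the text mentions one of the bid-winning keywords."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in AWARD_KEYWORDS)


def is_enterprise(name: str) -> bool:
    """A2: the name matches an entry in the enterprise library."""
    return name in ENTERPRISE_LIBRARY


def is_constructor(name: str, registration_no: str) -> bool:
    """A3: name and registration number match the same library entry."""
    return CONSTRUCTOR_LIBRARY.get(registration_no) == name


def verify_record(text: str, enterprise: str, constructor: str, registration_no: str) -> Dict[str, bool]:
    """Run all three checks and report each result separately."""
    return {
        "A1_award_notice": is_award_notice(text),
        "A2_enterprise": is_enterprise(enterprise),
        "A3_constructor": is_constructor(constructor, registration_no),
    }
```

Returning the three results separately, rather than a single boolean, makes it possible to see which check rejected a record.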
As shown in Fig. 2, the invention also provides a device for extracting relevant fields from bid-winning announcements, which specifically includes:
Information input module, for the user to input keyword information; for example, if the user wants to look up the name of the winning enterprise by a project name, the user enters the project name.
Related-website acquisition module, for obtaining websites that contain bidding announcements;
Website classification module, for classifying the related websites according to web page structure and encoding type;
Relevant-field extraction module, for defining different keywords according to the different types of website and formulating different crawling strategies to crawl the relevant fields;
Data processing module, for handling duplicate information and unifying the format of the crawled relevant fields. Government procurement websites, school procurement websites, project bidding websites and large bidding websites contain a great deal of duplicated content, and the same bid-winning information may be announced on several major websites, so the information needs to be de-duplicated and brought into a uniform format for storage.
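De-duplication only works after the formats have been unified, since the same award may appear with different spacing or date styles on different sites. The sketch below assumes dictionary records with `title`, `enterprise`, `award_time` and `source` keys and a small list of date formats; all of these are illustrative choices, not details from the patent.

```python
from datetime import datetime
from typing import Dict, List

# Assumed input date styles; anything else is passed through unchanged.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y.%m.%d")


def normalize_date(raw: str) -> str:
    """Convert a date in one of the known styles to YYYY-MM-DD."""
    value = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return value


def normalize(record: Dict[str, str]) -> Dict[str, str]:
    """Collapse whitespace in text fields and unify the date format."""
    return {
        "title": " ".join(record["title"].split()),
        "enterprise": " ".join(record["enterprise"].split()),
        "award_time": normalize_date(record["award_time"]),
        "source": record["source"],
    }


def deduplicate(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first record for each (title, enterprise, date) combination."""
    seen, unique = set(), []
    for record in map(normalize, records):
        key = (record["title"].lower(), record["enterprise"].lower(), record["award_time"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

The key deliberately excludes `source`, so the same award crawled from two websites collapses into one stored record.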
Data storage module, for storing the processed data; the stored data includes the aforementioned relevant fields, such as the bid-winning title, the winning enterprise, the winning constructor, the constructor registration number, the winning project type, the winning amount, the winning time and the region.
Extraction module, for extracting relevant field information from the data storage module according to the input keyword information; after the user enters a project name, the device searches the data storage module and retrieves the information related to the keyword the user entered.
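Keyword retrieval from the data storage module could look like the following sketch, which uses an in-memory SQLite database as a stand-in for the store. The table name `awards`, its three columns and the substring match on the title are assumptions for illustration only.

```python
import sqlite3
from typing import List, Tuple


def build_store(rows: List[Tuple[str, str, str]]) -> sqlite3.Connection:
    """Create an in-memory store holding (title, enterprise, amount) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE awards (title TEXT, enterprise TEXT, amount TEXT)")
    conn.executemany("INSERT INTO awards VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def search_by_project(conn: sqlite3.Connection, keyword: str) -> List[Tuple[str, str, str]]:
    """Return stored records whose title contains the keyword.

    The keyword is passed as a bound parameter; note that '%' and '_' inside
    it would still act as LIKE wildcards in this simple sketch.
    """
    cursor = conn.execute(
        "SELECT title, enterprise, amount FROM awards WHERE title LIKE ? ORDER BY title",
        (f"%{keyword}%",),
    )
    return cursor.fetchall()
```

For example, a user who enters the project name "Bridge" gets back every stored award whose title contains that word, together with the winning enterprise and amount.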
Verification module, connected to the constructor library and the enterprise library, for verifying whether the crawled relevant fields are accurate;
Result output module, for outputting the extracted relevant fields of the bid-winning announcement.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the invention, not to limit it. Although the invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution may be modified or equivalently replaced without departing from its purpose and scope, and all such changes shall be covered by the claims of the invention.

Claims (6)

1. A method for extracting relevant fields from bid-winning announcements, characterized by comprising the following steps: S1, finding on the network the related websites that publish bid-winning announcements; S2, obtaining the links to the bid-winning lists on those websites; S3, obtaining the links to the content pages of the bid-winning announcements; S4, fetching the body text through the content-page links and obtaining all data of the body text; S5, extracting the relevant fields of the bid-winning announcement from the body text; S6, verifying the extracted bid-winning fields.
2. The method for extracting relevant fields from bid-winning announcements according to claim 1, characterized in that the related websites in S1 include government procurement websites, school procurement websites, project bidding websites and large bidding websites.
3. The method for extracting relevant fields from bid-winning announcements according to claim 2, characterized in that steps S2 to S5 include: taking the known government procurement websites, school procurement websites, project bidding websites and large bidding websites as initial URLs; crawling the website information with a focused crawler; formulating different crawling strategies for different types of websites; and setting the crawler's crawl parameters according to those strategies.
4. The method for extracting relevant fields from bid-winning announcements according to claim 1, characterized in that the relevant fields in S5 include the bid-winning title, the winning enterprise, the winning constructor, the constructor registration number, the winning project type, the winning amount, the winning time and the region.
5. The method for extracting relevant fields from bid-winning announcements according to claim 1, characterized in that verifying the extracted bid-winning fields in S6 includes: A1, verifying whether the document is a bid-winning announcement; A2, verifying whether the named party is an enterprise; A3, verifying whether the named person is a constructor.
6. A device for extracting relevant fields from bid-winning announcements, characterized by comprising:
Information input module, for the user to input keyword information;
Related-website acquisition module, for obtaining websites that contain bidding announcements;
Website classification module, for classifying the related websites according to web page structure and encoding type;
Relevant-field extraction module, for defining different keywords according to the different types of website and formulating different crawling strategies to crawl the relevant fields;
Data processing module, for handling duplicate information and unifying the format of the crawled relevant fields;
Data storage module, for storing the processed data;
Extraction module, for extracting relevant field information from the data storage module according to the input keyword information;
Verification module, connected to the constructor library and the enterprise library, for verifying whether the crawled relevant fields are accurate;
Result output module, for outputting the extracted relevant fields of the bid-winning announcement.
CN201910797772.5A, priority date 2019-08-27, filing date 2019-08-27: Method and device for extracting relevant fields from bid-winning announcements. Status: Pending. Publication: CN110502680A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910797772.5A CN110502680A (en) | 2019-08-27 | 2019-08-27 | Method and device for extracting relevant fields from bid-winning announcements

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910797772.5A CN110502680A (en) | 2019-08-27 | 2019-08-27 | Method and device for extracting relevant fields from bid-winning announcements

Publications (1)

Publication Number | Publication Date
CN110502680A (en) | 2019-11-26

Family

ID=68588371

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910797772.5A (Pending) CN110502680A (en) | Method and device for extracting relevant fields from bid-winning announcements | 2019-08-27 | 2019-08-27

Country Status (1)

Country | Link
CN (1) | CN110502680A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915334A (en) * 2015-05-29 2015-09-16 浪潮软件集团有限公司 Automatic extraction method of key information of bidding project based on semantic analysis
CN106250456A (en) * 2016-07-28 2016-12-21 浪潮软件集团有限公司 Bid winning announcement extraction method and device
CN107590236A (en) * 2017-09-09 2018-01-16 杭州数立方征信有限公司 A kind of big data acquisition method and system towards enterprise in charge of construction
CN108563729A (en) * 2018-04-04 2018-09-21 福州大学 A kind of bidding website acceptance of the bid information extraction method based on dom tree
CN109597927A (en) * 2018-12-05 2019-04-09 贵阳高新数通信息有限公司 Bidding related web page page info extracting method and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111506795A (en) * 2020-04-20 2020-08-07 北京中电普华信息技术有限公司 Bidding information acquisition method and device
CN111506795B (en) * 2020-04-20 2023-09-15 北京中电普华信息技术有限公司 Method and device for acquiring bid information
CN112100235A (en) * 2020-08-13 2020-12-18 北京理工大学 Information Communication Technology (ICT) supply chain relation portrait based on public data source
CN113704667A (en) * 2021-08-31 2021-11-26 北京百炼智能科技有限公司 Automatic extraction processing method and device for bidding announcement

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication (application publication date: 2019-11-26)