CN105488167A - Index database updating method and device - Google Patents

Index database updating method and device

Info

Publication number
CN105488167A
Authority
CN
China
Prior art keywords
data
deleted
internet
index database
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510857602.3A
Other languages
Chinese (zh)
Other versions
CN105488167B (en)
Inventor
虞航仲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Internet Security Software Co Ltd
Original Assignee
Beijing Kingsoft Internet Security Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Internet Security Software Co Ltd
Priority to CN201510857602.3A
Publication of CN105488167A
Application granted
Publication of CN105488167B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2272 Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention disclose an index database updating method and device. An index database for storing data of a target data type is established in advance. The method comprises: obtaining first data of the target data type from the Internet; determining, according to the effective duration of data of the target data type, the data to be deleted in the first data and in second data, where the second data is the data of the target data type already stored in the index database; deleting the data to be deleted from the first data and the second data; and adding the first data that is not to be deleted to the index database. Because data whose age exceeds the effective duration is deleted, less storage space is occupied, the storage space does not need to be expanded, and costs are reduced.

Description

Index database updating method and device
Technical field
The present invention relates to the field of indexing technology, and in particular to an index database updating method and device.
Background
Indexing Service is a system service that uses document filters to read a whole document, extracts the document's content and properties, and passes them to the indexer; this process is called "indexing". Indexing Service can extract and organize information from a set of documents so that the information can be accessed quickly and easily through the Windows search function, the Indexing Service query form, or a Web browser. This information can include the text (content) of a document and the document's characteristics and parameters (properties). An index thus allows fast access to information.
At present, an index database is updated by adding the data obtained each time to the index database.
However, updating an index database in this way consumes a large amount of system storage space, and operation and maintenance personnel have to expand the system storage space at regular intervals.
Summary of the invention
The object of the embodiments of the present invention is to provide an index database updating method and device that reduce the storage space occupied.
To achieve the above object, an embodiment of the present invention discloses an index database updating method, in which an index database for storing data of a target data type is established in advance. The method comprises:
obtaining first data of the target data type from the Internet;
determining, according to the effective duration of data of the target data type, the data to be deleted in the first data and in second data, where the second data is the data of the target data type stored in the index database;
deleting the data to be deleted from the first data and the second data;
adding the first data that is not to be deleted to the index database.
Optionally, obtaining the first data of the target data type from the Internet comprises:
obtaining the first data from the Internet using crawler technology.
Optionally, obtaining the first data of the target data type from the Internet comprises:
obtaining the first data from the Internet at preset time intervals or each time a preset time point is reached.
Optionally, obtaining the first data of the target data type from the Internet comprises:
obtaining from the Internet first data of the target data type that has not exceeded the effective duration.
Optionally, determining the data to be deleted in the first data and the second data according to the effective duration of data of the target data type comprises:
determining, as the data to be deleted in the first data, the data in the first data for which the difference between the current time and the data creation time is not less than the effective duration;
determining, as the data to be deleted in the second data, the data in the second data for which the difference between the current time and the data creation time is not less than the effective duration.
To achieve the above object, an embodiment of the present invention also discloses an index database updating device, comprising an establishing module, an obtaining module, a determining module, a deleting module, and an updating module, wherein:
the establishing module is configured to establish in advance an index database for storing data of a target data type;
the obtaining module is configured to obtain first data of the target data type from the Internet;
the determining module is configured to determine, according to the effective duration of data of the target data type, the data to be deleted in the first data and in second data, where the second data is the data of the target data type stored in the index database;
the deleting module is configured to delete the data to be deleted that the determining module has determined;
the updating module is configured to add the data that is not to be deleted, from the first data obtained by the obtaining module, to the index database established by the establishing module.
Optionally, the obtaining module is specifically configured to:
obtain the first data from the Internet using crawler technology.
Optionally, the obtaining module is specifically configured to:
obtain the first data from the Internet at preset time intervals or each time a preset time point is reached.
Optionally, the obtaining module is specifically configured to:
obtain from the Internet first data of the target data type that has not exceeded the effective duration.
Optionally, the determining module is specifically configured to:
determine, as the data to be deleted in the first data, the data in the first data for which the difference between the current time and the data creation time is not less than the effective duration;
determine, as the data to be deleted in the second data, the data in the second data for which the difference between the current time and the data creation time is not less than the effective duration.
As can be seen from the above technical solutions, embodiments of the present invention provide an index database updating method and device, in which an index database for storing data of a target data type is established in advance. The method comprises: obtaining first data of the target data type from the Internet; determining, according to the effective duration of data of the target data type, the data to be deleted in the first data and in second data, where the second data is the data of the target data type stored in the index database; deleting the data to be deleted from the first data and the second data; and adding the first data that is not to be deleted to the index database.
With the technical solutions provided by the embodiments of the present invention, data that has exceeded the effective duration is deleted, which reduces the storage space occupied, removes the need to expand the storage space, and therefore saves cost.
Of course, a product or method implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an index database updating method provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an index database updating device provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
To solve the problem in the prior art, embodiments of the present invention provide an index database updating method and device. The index database updating method is introduced first.
It should be noted that the embodiments of the present invention are preferably applied to an electronic device. In practice, this electronic device can be a server; the present invention does not limit this.
Before an index database can be updated, an index database for storing data of a target data type must be established in advance. In practice, the target data type can be text, pictures, video, audio, web pages, and so on; the present invention does not limit the target data type. The index database is established only once: when it is updated, it does not need to be re-established, and every update operation acts on this same index database.
Fig. 1 is a schematic flowchart of an index database updating method provided by an embodiment of the present invention. The method can comprise:
S101: obtaining first data of the target data type from the Internet;
S102: determining, according to the effective duration of data of the target data type, the data to be deleted in the first data and in second data;
where the second data is the data of the target data type stored in the index database;
S103: deleting the data to be deleted from the first data and the second data;
S104: adding the first data that is not to be deleted to the index database.
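Steps S102 to S104 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the index database is modeled as an in-memory dict, each record is a dict with a hypothetical `created` field, and the first data from S101 is passed in as an argument instead of being fetched. The function name `update_index` is also an assumption.

```python
from datetime import datetime, timedelta


def update_index(index, fetched, valid_for, now):
    """Run one update pass over a dict-based index (hypothetical model)."""

    def expired(record):
        # To be deleted when (now - creation time) is not less than the
        # effective duration.
        return now - record["created"] >= valid_for

    # S102 + S103 for the second data: remove expired records from the index.
    for key in [k for k, rec in index.items() if expired(rec)]:
        del index[key]

    # S102 + S103 + S104 for the first data: expired fetched records are
    # discarded, the remaining ones are added to the index.
    for key, rec in fetched.items():
        if not expired(rec):
            index[key] = rec
    return index
```

Collecting the expired keys into a list before deleting avoids modifying the dict while iterating over it.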
Specifically, in practice the first data of the target data type can be obtained from the Internet using crawler technology, that is, a crawler program written for data of the target data type captures data of that type from the Internet. Capturing data from the Internet with crawler technology is prior art and is not described further here.
In practice, the first data of the target data type can be obtained from the Internet at preset time intervals, for example every 1 hour or every 1 day. It can also be obtained each time a preset time point is reached. For example, if the preset time points are 8:00, 11:00, 13:00, and 17:00 every day, the first data of the target data type is obtained from the Internet each time one of these time points arrives.
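The two triggering options can be sketched as functions that compute when the next acquisition should run. This is only an illustration: the function names are hypothetical, and how the acquisition is actually scheduled (timer, cron job, loop) is not specified in the text.

```python
from datetime import datetime, time, timedelta


def next_run_interval(last_run, interval):
    """Fixed-interval trigger: the next run is one interval after the last."""
    return last_run + interval


def next_run_at_points(now, points):
    """Time-point trigger: return the first preset point strictly after now."""
    for point in sorted(points):
        candidate = datetime.combine(now.date(), point)
        if candidate > now:
            return candidate
    # Every point for today has passed, so use tomorrow's earliest point.
    return datetime.combine(now.date() + timedelta(days=1), min(points))
```

With the daily points 8:00, 11:00, 13:00, and 17:00 from the example, a call made at 12:30 yields 13:00 on the same day, and a call made at 17:30 yields 8:00 on the following day.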
As an example, assume the target data type is news and the preset effective duration of news data is 1 day.
Assume the news data currently stored in the index database is news x, news y, and news z, and the news data captured is news a, news b, and news c. Then each of news x, news y, news z, news a, news b, and news c is checked against the effective duration of news data to decide whether it is data to be deleted.
Assume that, among the currently stored data, the data to be deleted is news y, and that, among the captured data, the data to be deleted is news a and news c. Then news y, news a, and news c are deleted.
The captured data that is not to be deleted is added to the index database.
In practice, every piece of data has a creation-time attribute. Therefore, when determining the data to be deleted in the first data and the second data according to the effective duration of data of the target data type, the data in the first data for which the difference between the current time and the data creation time is not less than the effective duration can be determined as the data to be deleted in the first data, and the data in the second data for which the difference between the current time and the data creation time is not less than the effective duration can be determined as the data to be deleted in the second data.
As an example, take news a above and assume its creation time is 18:00 on October 21, 2015, and the current time is 13:00 on October 25, 2015. The difference between the current time and the creation time of news a is greater than 1 day, so news a is determined as data to be deleted. In the same way, all the data to be deleted in the first data and the second data can be determined.
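The check for news a can be worked through in a few lines. The sketch below assumes that "not less than" is read as `>=`, so a record whose age equals the effective duration exactly is also treated as data to be deleted; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def is_to_be_deleted(created, now, valid_for):
    # "Not less than the effective duration" is read here as >=.
    return now - created >= valid_for


now = datetime(2015, 10, 25, 13, 0)
valid_for = timedelta(days=1)

# Age of news a: from 18:00 on October 21 to 13:00 on October 25.
age_a = now - datetime(2015, 10, 21, 18, 0)
```

Here `age_a` is 3 days and 19 hours, which is well above the 1-day effective duration, so `is_to_be_deleted` returns `True` for news a.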
Specifically, because every piece of data has a creation-time attribute, the first data can also be obtained by fetching from the Internet only data of the target data type that has not exceeded the effective duration. That is, when data is obtained from the Internet, it is first judged whether the time from the data's creation to the current time is more than 1 day. If it is, the data is discarded; only data whose creation time is not more than 1 day before the current time is obtained, that is, data that has not exceeded the effective duration.
For news a, news b, and news c above, assume the current time is 13:00 on October 25, 2015, the creation time of news a is 18:00 on October 21, 2015, the creation time of news b is 17:35 on October 24, 2015, and the creation time of news c is 8:52 on October 22, 2015. The time from creation to the current time is more than 1 day for news a and news c, and not more than 1 day for news b, so only news b is obtained. Because none of the obtained data has exceeded the effective duration, only the data to be deleted among the data stored in the index database needs to be determined.
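Filtering at acquisition time can be sketched with the three creation times given above. The function name `fetch_fresh` and the dict of candidates are hypothetical, and the strict `<` comparison is an assumption chosen to stay consistent with the "not less than" deletion rule; the text itself does not say how a record aged exactly 1 day is treated at this step.

```python
from datetime import datetime, timedelta


def fetch_fresh(candidates, now, valid_for):
    """Keep only candidates whose age is below the effective duration."""
    return {
        name: created
        for name, created in candidates.items()
        if now - created < valid_for
    }


now = datetime(2015, 10, 25, 13, 0)
candidates = {
    "news a": datetime(2015, 10, 21, 18, 0),
    "news b": datetime(2015, 10, 24, 17, 35),
    "news c": datetime(2015, 10, 22, 8, 52),
}
fresh = fetch_fresh(candidates, now, timedelta(days=1))
```

Only news b survives the filter, since it was created 19 hours and 25 minutes before the current time.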
It should be noted that the description above, which takes news as the target data type and uses news x, news y, news z, news a, news b, and news c as examples, is only one specific instance of the present invention and does not limit the present invention.
With the embodiment shown in Fig. 1, data that has exceeded the effective duration is deleted, which reduces the storage space occupied, removes the need to expand the storage space, and therefore saves cost.
Corresponding to the above method embodiment, an embodiment of the present invention also provides an index database updating device.
Fig. 2 is a schematic structural diagram of an index database updating device provided by an embodiment of the present invention. The device can comprise an establishing module 201, an obtaining module 202, a determining module 203, a deleting module 204, and an updating module 205, wherein:
the establishing module 201 is configured to establish in advance an index database for storing data of a target data type;
the obtaining module 202 is configured to obtain first data of the target data type from the Internet.
In practice, the obtaining module 202 shown in this embodiment can specifically be configured to:
obtain the first data from the Internet using crawler technology.
In practice, the obtaining module 202 shown in this embodiment can specifically be configured to:
obtain the first data from the Internet at preset time intervals or each time a preset time point is reached.
In practice, the obtaining module 202 shown in this embodiment can specifically be configured to:
obtain from the Internet first data of the target data type that has not exceeded the effective duration.
The determining module 203 is configured to determine, according to the effective duration of data of the target data type, the data to be deleted in the first data and in second data, where the second data is the data of the target data type stored in the index database.
In practice, the determining module 203 shown in this embodiment can specifically be configured to:
determine, as the data to be deleted in the first data, the data in the first data for which the difference between the current time and the data creation time is not less than the effective duration;
determine, as the data to be deleted in the second data, the data in the second data for which the difference between the current time and the data creation time is not less than the effective duration.
The deleting module 204 is configured to delete the data to be deleted that the determining module 203 has determined.
The updating module 205 is configured to add the data that is not to be deleted, from the first data obtained by the obtaining module 202, to the index database established by the establishing module 201.
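One possible way to map modules 201 to 205 onto code is a single class whose members each take the role of one module. This is a sketch under assumptions, not the structure the patent prescribes: the class name `IndexUpdater`, the dict-based index, and the injected `fetch` callable are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Set

Records = Dict[str, datetime]  # record id -> creation time (hypothetical model)


class IndexUpdater:
    def __init__(self, fetch: Callable[[], Records], valid_for: timedelta):
        self.index: Records = {}    # role of module 201: index created once
        self._fetch = fetch         # role of module 202: obtains the first data
        self._valid_for = valid_for

    def _to_delete(self, records: Records, now: datetime) -> Set[str]:
        # Role of module 203: age not less than the effective duration.
        return {k for k, created in records.items()
                if now - created >= self._valid_for}

    def update(self, now: datetime) -> Set[str]:
        first = self._fetch()
        stale_first = self._to_delete(first, now)
        stale_second = self._to_delete(self.index, now)
        for key in stale_second:    # role of module 204: delete from the index
            del self.index[key]
        for key, created in first.items():  # role of module 205: add the rest
            if key not in stale_first:
                self.index[key] = created
        return stale_first | stale_second
```

Passing `fetch` in as a callable keeps the acquisition strategy (crawler, schedule, pre-filtering) independent of the expiry and update logic, and `update` returns the ids it dropped so a caller can log them.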
With the embodiment shown in Fig. 2, data that has exceeded the effective duration is deleted, which reduces the storage space occupied, removes the need to expand the storage space, and therefore saves cost.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments can be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the device embodiment is substantially similar to the method embodiment, it is described relatively briefly; for the relevant parts, see the description of the method embodiment.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiment can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims (10)

1. An index database updating method, characterized in that an index database for storing data of a target data type is established in advance, the method comprising:
obtaining first data of the target data type from the Internet;
determining, according to the effective duration of data of the target data type, the data to be deleted in the first data and in second data, where the second data is the data of the target data type stored in the index database;
deleting the data to be deleted from the first data and the second data;
adding the first data that is not to be deleted to the index database.
2. The method according to claim 1, characterized in that obtaining the first data of the target data type from the Internet comprises:
obtaining the first data from the Internet using crawler technology.
3. The method according to claim 1, characterized in that obtaining the first data of the target data type from the Internet comprises:
obtaining the first data from the Internet at preset time intervals or each time a preset time point is reached.
4. The method according to claim 3, characterized in that obtaining the first data of the target data type from the Internet comprises:
obtaining from the Internet first data of the target data type that has not exceeded the effective duration.
5. The method according to claim 1, characterized in that determining the data to be deleted in the first data and the second data according to the effective duration of data of the target data type comprises:
determining, as the data to be deleted in the first data, the data in the first data for which the difference between the current time and the data creation time is not less than the effective duration;
determining, as the data to be deleted in the second data, the data in the second data for which the difference between the current time and the data creation time is not less than the effective duration.
6. An index database updating device, characterized by comprising an establishing module, an obtaining module, a determining module, a deleting module, and an updating module, wherein:
the establishing module is configured to establish in advance an index database for storing data of a target data type;
the obtaining module is configured to obtain first data of the target data type from the Internet;
the determining module is configured to determine, according to the effective duration of data of the target data type, the data to be deleted in the first data and in second data, where the second data is the data of the target data type stored in the index database;
the deleting module is configured to delete the data to be deleted that the determining module has determined;
the updating module is configured to add the data that is not to be deleted, from the first data obtained by the obtaining module, to the index database established by the establishing module.
7. The device according to claim 6, characterized in that the obtaining module is specifically configured to:
obtain the first data from the Internet using crawler technology.
8. The device according to claim 6, characterized in that the obtaining module is specifically configured to:
obtain the first data from the Internet at preset time intervals or each time a preset time point is reached.
9. The device according to claim 6, characterized in that the obtaining module is specifically configured to:
obtain from the Internet first data of the target data type that has not exceeded the effective duration.
10. The device according to claim 6, characterized in that the determining module is specifically configured to:
determine, as the data to be deleted in the first data, the data in the first data for which the difference between the current time and the data creation time is not less than the effective duration;
determine, as the data to be deleted in the second data, the data in the second data for which the difference between the current time and the data creation time is not less than the effective duration.
CN201510857602.3A 2015-11-30 2015-11-30 Index database updating method and device Active CN105488167B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510857602.3A CN105488167B (en) 2015-11-30 2015-11-30 Index database updating method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510857602.3A CN105488167B (en) 2015-11-30 2015-11-30 Index database updating method and device

Publications (2)

Publication Number Publication Date
CN105488167A true CN105488167A (en) 2016-04-13
CN105488167B CN105488167B (en) 2019-12-13

Family

ID=55675141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510857602.3A Active CN105488167B (en) 2015-11-30 2015-11-30 Index database updating method and device

Country Status (1)

Country Link
CN (1) CN105488167B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216821A (en) * 2007-01-05 2008-07-09 中兴通讯股份有限公司 Data acquisition system storage management method
CN101556589A (en) * 2008-04-09 2009-10-14 北京闻言科技有限公司 Method for Oracle regularly deleting stale data in database
CN102750376A (en) * 2012-06-25 2012-10-24 天津神舟通用数据技术有限公司 Multi-version database storage engine system and related processing implementation method thereof
CN103530349A (en) * 2013-09-30 2014-01-22 乐视致新电子科技(天津)有限公司 Method and equipment for cache updating
CN103678459A (en) * 2012-09-14 2014-03-26 德商赛克公司 Systems and/or methods for statistical online analysis of large and potentially heterogeneous data sets
US20140201192A1 (en) * 2013-01-15 2014-07-17 Syscom Computer Engineering Co. Automatic data index establishment method
CN103997753A (en) * 2014-06-03 2014-08-20 杭州东信网络技术有限公司 Method for adding and collecting mobile communication wireless network performance data alternately
CN104572920A (en) * 2014-12-27 2015-04-29 北京奇虎科技有限公司 Data arrangement method and data arrangement device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446101A (en) * 2016-09-13 2017-02-22 郑州云海信息技术有限公司 Data management system
WO2018215912A1 (en) * 2017-05-24 2018-11-29 International Business Machines Corporation A method to estimate the deletability of data objects
GB2576453A (en) * 2017-05-24 2020-02-19 Ibm A Method To Estimate The Deletability Of The Data Objects
US10956453B2 (en) 2017-05-24 2021-03-23 International Business Machines Corporation Method to estimate the deletability of data objects

Also Published As

Publication number Publication date
CN105488167B (en) 2019-12-13

Similar Documents

Publication Publication Date Title
CN102129425B (en) The access method of big object set table and device in data warehouse
CN104423960A (en) Continuous project integration method and continuous project integration system
CN109194711B (en) Synchronization method, client, server and medium for organization architecture
US8315978B2 (en) Synchronization adapter for synchronizing data to applications that do not directly support synchronization
CN104809025A (en) Method and device for enabling programs to be online
CN104142954A (en) Data sheet comparing and updating method and device based on frequentness partition
CN105260639A (en) Face recognition system data update method and device
CN104657435A (en) Storage management method for application data and network management system
CN104461505A (en) Terminal
CN113254445A (en) Real-time data storage method and device, computer equipment and storage medium
CN105488167A (en) Index database updating method and device
CN103023978A (en) Application property information synchronization method and system with application aggregation platform
CN105446824B (en) Table increment acquisition methods and long-distance data backup method
CN105426128A (en) Index maintenance method and device
CN103593345A (en) Webpage flow chart editing method and system
CN105653550A (en) Web page filtering method and device
CN104462462A (en) Service change frequency based data warehouse modeling method and device
CN112416934A (en) hive table incremental data synchronization method and device, computer equipment and storage medium
CN108205559B (en) Data management method and equipment thereof
CN105488166A (en) Index establishing method and device
CN104021203A (en) Method and device used for having access to webpage
CN108595262B (en) Data processing method and device
CN103809915A (en) Read-write method and device of magnetic disk files
CN106408199B (en) Information processing method and device
CN112235332B (en) Method and device for switching reading and writing of clusters

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant