CN105488167B - Index database updating method and device - Google Patents

Info

Publication number
CN105488167B
Authority
CN
China
Prior art keywords
data
deleted
module
obtaining
internet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510857602.3A
Other languages
Chinese (zh)
Other versions
CN105488167A (en)
Inventor
虞航仲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Internet Security Software Co Ltd
Original Assignee
Beijing Kingsoft Internet Security Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-11-30
Filing date
2015-11-30
Publication date
2019-12-13
Application filed by Beijing Kingsoft Internet Security Software Co Ltd
Priority to CN201510857602.3A
Publication of CN105488167A
Application granted
Publication of CN105488167B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2272 Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses an index database updating method and device, in which an index database for storing data of a target data type is established in advance. The method comprises: obtaining first data of the target data type from the Internet; determining, according to the effective duration of data of the target data type, the data to be deleted in the first data and in second data, where the second data is the data of the target data type already stored in the index database; deleting the data to be deleted in the first data and the second data; and adding the first data that is not to be deleted to the index database. With this embodiment, data whose age exceeds the effective duration is deleted, which reduces storage occupation, removes the need to expand storage capacity, and thereby saves cost.

Description

Index database updating method and device
Technical Field
The present invention relates to the field of indexing technologies, and in particular to an index database updating method and apparatus.
Background
The Indexing Service is a system service that reads an entire document through a document filter and extracts the document's text and attributes for delivery to an indexer, a process called "indexing". The indexing service can extract and organize information from a set of documents, including the text (content) of the documents and their features and parameters (attributes), so that the information can be accessed quickly and easily through a Windows search function, an indexing service lookup table, or a Web browser. An index thus allows the information to be accessed quickly.
At present, an index database is updated by adding all of the data acquired each time to the index database.
However, this updating method keeps occupying the system's storage space, so operation and maintenance personnel must expand the system's storage space at intervals.
Disclosure of Invention
The embodiments of the invention aim to provide an index database updating method and apparatus that reduce the occupation of storage space.
To achieve the above object, an embodiment of the invention discloses an index database updating method, in which an index database for storing data of a target data type is established in advance. The method comprises the following steps:
obtaining first data of the target data type from the Internet;
determining, according to the effective duration of data of the target data type, the data to be deleted in the first data and in second data, where the second data is the data of the target data type stored in the index database;
deleting the data to be deleted in the first data and the second data;
and adding the first data that is not to be deleted to the index database.
Optionally, obtaining the first data of the target data type from the Internet comprises:
obtaining the first data from the Internet by using a crawler technology.
Optionally, obtaining the first data of the target data type from the Internet comprises:
obtaining the first data from the Internet at preset time intervals or at preset time points.
Optionally, obtaining the first data of the target data type from the Internet comprises:
obtaining, from the Internet, first data of the target data type that does not exceed the effective duration.
Optionally, determining the data to be deleted in the first data and the second data according to the effective duration of data of the target data type comprises:
determining, as the data to be deleted in the first data, the data in the first data for which the difference between the current time and the data creation time is not less than the effective duration;
and determining, as the data to be deleted in the second data, the data in the second data for which the difference between the current time and the data creation time is not less than the effective duration.
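The two determinations above apply one and the same criterion, which can be written compactly. The symbols are introduced here only for illustration and do not appear in the original text: $t_{\mathrm{now}}$ is the current time, $t_{\mathrm{c}}(d)$ is the creation time of a data item $d$, and $T$ is the effective duration of the target data type.

```latex
\[
  d \ \text{is to be deleted}
  \iff
  t_{\mathrm{now}} - t_{\mathrm{c}}(d) \ \ge\ T ,
  \qquad d \in \text{first data} \cup \text{second data}
\]
```

Because the condition is "not less than", an item whose age equals the effective duration exactly is also treated as data to be deleted.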
To achieve the above object, an embodiment of the invention further discloses an index database updating apparatus, comprising an establishing module, an obtaining module, a determining module, a deleting module and an updating module, wherein:
the establishing module is configured to establish, in advance, an index database for storing data of a target data type;
the obtaining module is configured to obtain first data of the target data type from the Internet;
the determining module is configured to determine, according to the effective duration of data of the target data type, the data to be deleted in the first data and in second data, where the second data is the data of the target data type stored in the index database;
the deleting module is configured to delete the data to be deleted determined by the determining module;
and the updating module is configured to add the data that is not to be deleted, among the first data obtained by the obtaining module, to the index database established by the establishing module.
Optionally, the obtaining module is specifically configured to:
obtain the first data from the Internet by using a crawler technology.
Optionally, the obtaining module is specifically configured to:
obtain the first data from the Internet at preset time intervals or at preset time points.
Optionally, the obtaining module is specifically configured to:
obtain, from the Internet, first data of the target data type that does not exceed the effective duration.
Optionally, the determining module is specifically configured to:
determine, as the data to be deleted in the first data, the data in the first data for which the difference between the current time and the data creation time is not less than the effective duration;
and determine, as the data to be deleted in the second data, the data in the second data for which the difference between the current time and the data creation time is not less than the effective duration.
As can be seen from the foregoing technical solutions, the embodiments of the invention provide an index database updating method and apparatus, in which an index database for storing data of a target data type is established in advance. The method comprises: obtaining first data of the target data type from the Internet; determining, according to the effective duration of data of the target data type, the data to be deleted in the first data and in second data, where the second data is the data of the target data type stored in the index database; deleting the data to be deleted in the first data and the second data; and adding the first data that is not to be deleted to the index database.
With the technical solution provided by the embodiments of the invention, data whose age exceeds the effective duration is deleted, which reduces the occupation of storage space, removes the need to expand storage capacity, and thereby saves cost.
Of course, a product or method practicing the invention does not necessarily need to achieve all of the above advantages at the same time.
Drawings
To illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an index database updating method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an index database updating apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from them without creative effort fall within the protection scope of the present invention.
To solve the problem in the prior art, the embodiments of the invention provide an index database updating method and apparatus. The index database updating method is described first.
It should be noted that the embodiments of the present invention are preferably applied to an electronic device. In practical applications the electronic device may be a server, although the invention is not limited in this respect.
Before the index database can be updated, an index database for storing data of the target data type must be established in advance. In practical applications the target data type may be text, pictures, video, audio, web pages, and so on; the invention does not limit the target data type. The index database is established only once: it does not need to be re-established when it is updated, and every update operation is performed on that same index database.
Fig. 1 is a schematic flowchart of an index database updating method according to an embodiment of the present invention, which may include:
S101: obtaining first data of a target data type from the Internet;
S102: determining, according to the effective duration of data of the target data type, the data to be deleted in the first data and in second data, where the second data is the data of the target data type stored in the index database;
S103: deleting the data to be deleted in the first data and the second data;
S104: adding the first data that is not to be deleted to the index database.
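Steps S102 to S104 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the index database is modeled as an in-memory dict, and the names `Record`, `is_expired` and `update_index` are hypothetical. Step S101 (obtaining the first data) is represented only by the `first_data` argument.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Record:
    """Hypothetical data item: a key plus its data creation time."""
    key: str
    created_at: datetime


def is_expired(record, now, valid_for):
    # "Not less than the effective duration" translates to >=.
    return now - record.created_at >= valid_for


def update_index(index, first_data, now, valid_for):
    """Apply S102-S104 to `index`, a dict mapping key -> Record."""
    # S102: determine the data to be deleted in the second data (already
    # stored in the index) and in the first data (just obtained).
    stale_keys = [k for k, r in index.items() if is_expired(r, now, valid_for)]
    fresh = [r for r in first_data if not is_expired(r, now, valid_for)]
    # S103: delete the stale stored data; stale first data is simply dropped.
    for k in stale_keys:
        del index[k]
    # S104: add the first data that is not to be deleted.
    for r in fresh:
        index[r.key] = r
    return index
```

A real index database would replace the dict with its own delete and insert operations; the order of the steps is what the sketch is meant to show.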
Specifically, in practical applications, the first data of the target data type may be obtained from the Internet by using a crawler technology, that is, a capture program for data of the target data type captures such data from the Internet. Capturing data from the Internet with a crawler is prior art and is not described in detail here.
In practical applications, the first data of the target data type may be obtained from the Internet at preset time intervals, for example every 1 hour or every 1 day. The first data may also be obtained each time a preset time point is reached. For example, if the preset time points are 8:00, 11:00, 13:00 and 17:00 every day, the first data of the target data type is obtained from the Internet each time one of these time points is reached.
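The two triggering modes can be sketched as functions that compute when the next acquisition should run. The function names are hypothetical, and treating a time point that has been reached exactly as already handled is an assumption of this sketch.

```python
from datetime import datetime, time, timedelta


def next_run_by_interval(last_run, interval):
    """Preset time interval: the next run is one interval after the last."""
    return last_run + interval


def next_run_by_time_points(now, points):
    """Preset time points: return the first daily point strictly after `now`."""
    for point in sorted(points):
        candidate = datetime.combine(now.date(), point)
        if candidate > now:
            return candidate
    # All of today's points have passed: use the earliest point tomorrow.
    return datetime.combine(now.date() + timedelta(days=1), min(points))


# The daily time points used in the example above.
DAILY_POINTS = [time(8, 0), time(11, 0), time(13, 0), time(17, 0)]
```

For instance, `next_run_by_time_points(datetime(2015, 10, 25, 12, 30), DAILY_POINTS)` yields 13:00 on the same day, while a call made at 17:00 or later rolls over to 8:00 on the following day.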
For example, assume that the target data type is news and that the effective duration of news data is preset to 1 day.
Assume that the news data currently stored in the index database are news x, news y and news z, and that the captured news data are news a, news b and news c. For each of news x, news y, news z, news a, news b and news c, whether it is data to be deleted is judged according to the effective duration of news data.
Suppose the data to be deleted in the currently stored data is determined to be news y, and the data to be deleted in the captured data is determined to be news a and news c. Then news y, news a and news c are deleted.
The captured data that is not to be deleted, here news b, is then added to the index database.
In practical applications, each data item has a data creation time attribute. Therefore, when the data to be deleted in the first data and the second data is determined according to the effective duration of data of the target data type, the data in the first data for which the difference between the current time and the data creation time is not less than the effective duration can be determined as the data to be deleted in the first data, and the data in the second data for which that difference is not less than the effective duration can be determined as the data to be deleted in the second data.
For example, suppose news a was created at 18:00 on October 21, 2015, and the current time is 13:00 on October 25, 2015. The difference between the current time and the creation time of news a is more than 1 day, so news a is determined to be data to be deleted. All the data to be deleted in the first data and the second data can be determined in the same way.
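The arithmetic for news a can be checked with a few lines of Python. The variable names are illustrative only; the dates and the 1-day effective duration are taken from the example.

```python
from datetime import datetime, timedelta

VALID_FOR = timedelta(days=1)            # effective duration of news data
created_a = datetime(2015, 10, 21, 18, 0)  # creation time of news a
now = datetime(2015, 10, 25, 13, 0)        # current time

age = now - created_a            # 3 days and 19 hours
to_delete = age >= VALID_FOR     # "not less than" the effective duration
print(age, to_delete)
```

The age of news a, 3 days and 19 hours, is not less than 1 day, so `to_delete` is true.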
Specifically, because each data item has a data creation time attribute, obtaining the first data of the target data type from the Internet may be limited to obtaining first data that does not exceed the effective duration. When data is obtained from the Internet, it is judged whether more than 1 day has elapsed between the creation time of the data and the current time. If so, the data is discarded, and only data created no more than 1 day before the current time, that is, data that does not exceed the effective duration, is obtained.
Taking news a, news b and news c as examples, suppose the current time is 13:00 on October 25, 2015; news a was created at 18:00 on October 21, 2015; news b was created at 17:35 on October 24, 2015; and news c was created at 8:52 on October 22, 2015. The time from the creation of news a and of news c to the current time exceeds 1 day, while the time from the creation of news b to the current time does not, so only news b is obtained. Because the obtained data does not exceed the effective duration, only the data to be deleted among the data stored in the index database needs to be determined.
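Filtering at acquisition time can be sketched as below. The function name `keep_fresh` and the representation of each item as a (name, creation time) pair are assumptions made for illustration; the dates are those of the example.

```python
from datetime import datetime, timedelta


def keep_fresh(items, now, valid_for):
    """Keep (name, created_at) pairs whose age is below the effective duration."""
    return [(name, created) for name, created in items
            if now - created < valid_for]


now = datetime(2015, 10, 25, 13, 0)
candidates = [
    ("news a", datetime(2015, 10, 21, 18, 0)),
    ("news b", datetime(2015, 10, 24, 17, 35)),
    ("news c", datetime(2015, 10, 22, 8, 52)),
]
obtained = keep_fresh(candidates, now, timedelta(days=1))
```

Only news b, created 19 hours and 25 minutes before the current time, passes the filter. The sketch uses a strict `<` so that it is the exact complement of the "not less than" deletion criterion; the text itself speaks of data "not exceeding" 1 day, so the boundary case is a choice made here.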
It should be noted that the target data type news and the items news x, news y, news z, news a, news b and news c used in the description above are only specific examples, and the present invention is not limited to them.
With the embodiment shown in Fig. 1, data exceeding the effective duration is deleted, which reduces the occupation of storage space, removes the need to expand storage capacity, and thereby saves cost.
Corresponding to the above method embodiment, an embodiment of the present invention further provides an index database updating apparatus.
Fig. 2 is a schematic structural diagram of an index database updating apparatus according to an embodiment of the present invention, which may include an establishing module 201, an obtaining module 202, a determining module 203, a deleting module 204, and an updating module 205, wherein:
The establishing module 201 is configured to establish, in advance, an index database for storing data of a target data type.
The obtaining module 202 is configured to obtain first data of the target data type from the Internet.
In practical applications, the obtaining module 202 may be specifically configured to:
obtain the first data from the Internet by using a crawler technology.
In practical applications, the obtaining module 202 may be specifically configured to:
obtain the first data from the Internet at preset time intervals or at preset time points.
In practical applications, the obtaining module 202 may be specifically configured to:
obtain, from the Internet, first data of the target data type that does not exceed the effective duration.
The determining module 203 is configured to determine, according to the effective duration of data of the target data type, the data to be deleted in the first data and in second data, where the second data is the data of the target data type stored in the index database.
In practical applications, the determining module 203 may be specifically configured to:
determine, as the data to be deleted in the first data, the data in the first data for which the difference between the current time and the data creation time is not less than the effective duration;
and determine, as the data to be deleted in the second data, the data in the second data for which the difference between the current time and the data creation time is not less than the effective duration.
The deleting module 204 is configured to delete the data to be deleted determined by the determining module 203.
The updating module 205 is configured to add the data that is not to be deleted, among the first data obtained by the obtaining module 202, to the index database established by the establishing module 201.
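One way to picture how the five modules cooperate is a single class with one method per module. This is a hedged sketch, not the apparatus itself: the class name, the method names, the injected `fetch` callable standing in for the crawler, and the dict used as the index database are all assumptions.

```python
from datetime import datetime, timedelta


class IndexDatabaseUpdater:
    def __init__(self, fetch, valid_for):
        self.fetch = fetch            # stand-in for the crawler (assumption)
        self.valid_for = valid_for    # effective duration of the data type
        self.index = self.establish()

    def establish(self):
        """Establishing module 201: create the index database once."""
        return {}

    def obtain(self):
        """Obtaining module 202: get first data as {key: creation time}."""
        return dict(self.fetch())

    def determine(self, data, now):
        """Determining module 203: keys whose age is >= the effective duration."""
        return {key for key, created in data.items()
                if now - created >= self.valid_for}

    def delete(self, data, keys):
        """Deleting module 204: remove the data to be deleted."""
        for key in keys:
            del data[key]

    def add(self, first):
        """Updating module 205: add the remaining first data to the index."""
        self.index.update(first)

    def run_once(self, now):
        """Run one update round in the order described for the method."""
        first = self.obtain()
        self.delete(first, self.determine(first, now))
        self.delete(self.index, self.determine(self.index, now))
        self.add(first)
        return self.index


# Usage with the news a and news b dates from the method example.
updater = IndexDatabaseUpdater(
    lambda: [("news a", datetime(2015, 10, 21, 18, 0)),
             ("news b", datetime(2015, 10, 24, 17, 35))],
    timedelta(days=1),
)
result = updater.run_once(datetime(2015, 10, 25, 13, 0))
```

Injecting `fetch` keeps the sketch runnable without network access; in a deployment it would be replaced by whatever crawler the obtaining module uses.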
With the embodiment shown in Fig. 2, data exceeding the effective duration is deleted, which reduces the occupation of storage space, removes the need to expand storage capacity, and thereby saves cost.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in a related manner; for parts that are the same or similar among the embodiments, reference may be made from one to another, and each embodiment focuses on its differences from the others. In particular, because the apparatus embodiment is substantially similar to the method embodiment, it is described relatively briefly, and the relevant parts of the method embodiment description may be consulted.
Those skilled in the art will appreciate that all or part of the steps in the above method embodiments may be implemented by a program instructing the relevant hardware, and that the program may be stored in a computer-readable storage medium, referred to herein as a storage medium, such as ROM/RAM, a magnetic disk, or an optical disk.
The above description covers only preferred embodiments of the present invention and is not intended to limit its scope. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (6)

1. An index database updating method, characterized in that an index database for storing data of a target data type is established in advance, the method comprising the following steps:
obtaining, from the Internet, first data of the target data type that does not exceed the effective duration;
determining, according to the effective duration of data of the target data type, the data to be deleted in the first data and in second data, where the second data is the data of the target data type stored in the index database, which comprises: determining, as the data to be deleted in the first data, the data in the first data for which the difference between the current time and the data creation time is not less than the effective duration; and determining, as the data to be deleted in the second data, the data in the second data for which the difference between the current time and the data creation time is not less than the effective duration;
deleting the data to be deleted in the first data and the second data;
and adding the first data that is not to be deleted to the index database.
2. The method of claim 1, wherein obtaining the first data of the target data type from the Internet comprises:
obtaining the first data from the Internet by using a crawler technology.
3. The method of claim 1, wherein obtaining the first data of the target data type from the Internet comprises:
obtaining the first data from the Internet at preset time intervals or at preset time points.
4. An index database updating apparatus, comprising an establishing module, an obtaining module, a determining module, a deleting module and an updating module, wherein:
the establishing module is configured to establish, in advance, an index database for storing data of a target data type;
the obtaining module is configured to obtain, from the Internet, first data of the target data type that does not exceed the effective duration;
the determining module is configured to determine, according to the effective duration of data of the target data type, the data to be deleted in the first data and in second data, where the second data is the data of the target data type stored in the index database, and is specifically configured to: determine, as the data to be deleted in the first data, the data in the first data for which the difference between the current time and the data creation time is not less than the effective duration; and determine, as the data to be deleted in the second data, the data in the second data for which the difference between the current time and the data creation time is not less than the effective duration;
the deleting module is configured to delete the data to be deleted determined by the determining module;
and the updating module is configured to add the data that is not to be deleted, among the first data obtained by the obtaining module, to the index database established by the establishing module.
5. The apparatus according to claim 4, wherein the obtaining module is specifically configured to:
obtain the first data from the Internet by using a crawler technology.
6. The apparatus according to claim 4, wherein the obtaining module is specifically configured to:
obtain the first data from the Internet at preset time intervals or at preset time points.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510857602.3A CN105488167B (en) 2015-11-30 2015-11-30 Index database updating method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510857602.3A CN105488167B (en) 2015-11-30 2015-11-30 Index database updating method and device

Publications (2)

Publication Number Publication Date
CN105488167A (en) 2016-04-13
CN105488167B (en) 2019-12-13

Family

ID=55675141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510857602.3A Active CN105488167B (en) 2015-11-30 2015-11-30 Index database updating method and device

Country Status (1)

Country Link
CN (1) CN105488167B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446101A (en) * 2016-09-13 2017-02-22 郑州云海信息技术有限公司 Data management system
US10956453B2 (en) 2017-05-24 2021-03-23 International Business Machines Corporation Method to estimate the deletability of data objects

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216821A (en) * 2007-01-05 2008-07-09 中兴通讯股份有限公司 Data acquisition system storage management method
CN101556589A (en) * 2008-04-09 2009-10-14 北京闻言科技有限公司 Method for Oracle regularly deleting stale data in database
CN103530349A (en) * 2013-09-30 2014-01-22 乐视致新电子科技(天津)有限公司 Method and equipment for cache updating
CN104572920A (en) * 2014-12-27 2015-04-29 北京奇虎科技有限公司 Data arrangement method and data arrangement device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750376A (en) * 2012-06-25 2012-10-24 天津神舟通用数据技术有限公司 Multi-version database storage engine system and related processing implementation method thereof
US9122786B2 (en) * 2012-09-14 2015-09-01 Software Ag Systems and/or methods for statistical online analysis of large and potentially heterogeneous data sets
US20140201192A1 (en) * 2013-01-15 2014-07-17 Syscom Computer Engineering Co. Automatic data index establishment method
CN103997753B (en) * 2014-06-03 2017-11-07 杭州东信网络技术有限公司 The method that compartment adds collection mobile communication wireless network performance data

Also Published As

Publication number Publication date
CN105488167A (en) 2016-04-13

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant