CN101354718A - Method and apparatus for determining file bag resource identification information - Google Patents

Method and apparatus for determining file bag resource identification information

Info

Publication number
CN101354718A
Authority
CN
China
Prior art keywords
file
literature kit
information
identification information
name information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA200810134724XA
Other languages
Chinese (zh)
Other versions
CN101354718B (en
Inventor
陈晓东
张国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xunlei Network Technology Co Ltd
Original Assignee
Shenzhen Xunlei Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xunlei Network Technology Co Ltd filed Critical Shenzhen Xunlei Network Technology Co Ltd
Priority to CN200810134724XA priority Critical patent/CN101354718B/en
Publication of CN101354718A publication Critical patent/CN101354718A/en
Application granted granted Critical
Publication of CN101354718B publication Critical patent/CN101354718B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for determining the resource identification information of a file package, comprising the following steps: attribute information of candidate objects is acquired, the candidate objects being the files and/or folders in a file package; if the attribute information of one of the candidate objects meets a preset condition, the name information of that candidate object is taken as the resource identification information of the file package. When acquiring the resource identification information of a file package, the method simplifies operation, reduces resource usage, lowers development cost, and effectively safeguards the accuracy of the extracted identification information.

Description

Method and apparatus for determining file package resource identification information
Technical field
The present invention relates to techniques for processing file package resource information, and in particular to a method and apparatus for determining file package resource identification information, a method and apparatus for generating a file package resource database, and a method and apparatus for searching file package resources.
Background
A file package is a data package generated by packing files or folders; it allows multiple files or folders to be combined into a single file. Usually, a compressed file package can also be generated by compressing the files or folders with a compression algorithm. After compression, the file size is reduced and multiple files can be combined into one package. Compressed file packages are both convenient to transfer and economical in storage space, so in computer communication and on the Internet many resources are transmitted and exchanged as compressed file packages. Popular compressed file formats at present include rar and zip.
With the rapid development of network technology, users often need to search for file package resources on the network. The usual approach is to match the search keyword against the identification information of file package resources to obtain the corresponding search results. Good search results therefore depend heavily on how file package resources are identified. At present, the identification information of a file package resource is generally obtained as follows:
Step S1: a template is set in advance for a class of web pages that share the same page layout;
Step S2: a web crawler program applies the template to obtain the identification of the corresponding file package resources from web pages of that class.
For example, suppose every page describing a film on a certain film website uses the same page structure: the text after the keyword "film title" is the title of the film the page describes, and the text after "starring" is the cast of that film. Obtaining the file package resource identification then consists of setting a template for this page structure and using a web crawler to grab and record the text after those keywords, which yields the title and cast information of each film.
Clearly, once the file package resource identification is insufficient to reflect, or cannot accurately reflect, the actual content of the file package resource, the search results become less effective. However, because page structures differ considerably between websites, even a slight error in the template set for a page structure leads to an inaccurate file package resource identification. Moreover, this approach depends on setting different web page templates for different page structures, which is complex to implement, consumes considerable resources, and incurs a high development cost.
In short, a technical problem that currently needs to be solved is how to provide a new mechanism for determining file package resource identification information that improves the representativeness and accuracy of the identification obtained while simplifying operation, reducing resource usage, and lowering development cost.
Summary of the invention
The technical problem to be solved by embodiments of the invention is to provide a method and apparatus for determining file package resource identification information that, when obtaining the identification, simplify operation, reduce resource usage, lower development cost, and effectively ensure the representativeness and accuracy of the identification obtained.
Another object of embodiments of the invention is to provide a method for generating a file package resource database and a method for searching file package resources, so as to effectively improve the efficiency and accuracy of file package resource search.
To solve the above technical problem, an embodiment of the invention provides a method for determining file package resource identification information, comprising:
Obtaining attribute information of candidate objects, the candidate objects being the files and/or folders contained in a file package;
If the attribute information of one of the candidate objects satisfies a preset condition, determining that the name information of the candidate object whose attribute information satisfies the preset condition is the resource identification information of the file package.
Preferably, the files and/or folders contained in the file package are:
The files and/or folders contained in the root directory level of the file package.
Preferably, the attribute information is file size information.
Preferably, the preset condition is:
The size of the file or folder is the largest in the file package; or,
The ratio of the size of the file or folder to the size of the file package is greater than or equal to a threshold.
Preferably, the file package is a compressed file package, and the file size information is the size of the file or folder before compression.
Preferably, the attribute information is name information.
Preferably, the preset condition is:
The name information of the file or folder does not match a preset invalid keyword; and/or,
The name information of the file or folder matches a preset valid keyword.
Preferably, the candidate objects also include the file package itself, and the preset condition is:
The name information of the file package, or the name information of a file or folder in the file package, does not match a preset invalid keyword; and/or,
The name information of the file package, or the name information of a file or folder in the file package, matches a preset valid keyword.
Preferably, the preset condition also comprises:
Weighting the name information that matches the preset valid keywords, and extracting the name information with the highest weight.
Preferably, the method also comprises:
Replacing the name information of the file package with the resource identification information.
An embodiment of the invention also discloses a method for generating a file package resource database, comprising:
Obtaining the URL of a file package in a web page and its content signature ID, the content signature ID being obtained by applying a predefined algorithm to the content data of the file package, the predefined algorithm being one that produces different results for the content data of different binary files;
Obtaining the resource identification information of the file package, the resource identification information being the name information of a candidate object whose attribute information satisfies a preset condition, the candidate objects comprising the files and/or folders contained in the file package;
Recording the content signature ID together with the corresponding file package URL and file package resource identification information to form a database.
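The recording step above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the source does not name the predefined algorithm, so SHA-1 is used here as a stand-in for "an algorithm that produces different results for different binary content", and the function names, the in-memory dictionary, and the example URLs are all hypothetical.

```python
import hashlib


def content_signature(data):
    """Content signature ID of a package's bytes.

    SHA-1 is an assumption; the source only requires an algorithm that
    yields different results for different binary content.
    """
    return hashlib.sha1(data).hexdigest()


def record_package(database, url, data, identifier):
    """Record signature -> URLs and identifier; identical content is stored once."""
    sig = content_signature(data)
    entry = database.setdefault(sig, {"urls": [], "identifier": identifier})
    if url not in entry["urls"]:
        entry["urls"].append(url)
    return sig


database = {}  # hypothetical in-memory stand-in for the database
sig_a = record_package(database, "http://example.com/a.zip", b"same bytes", "drama")
sig_b = record_package(database, "http://example.org/mirror.zip", b"same bytes", "drama")
```

Keying the records by content signature means two URLs that serve byte-identical packages share one record, so the identification information only has to be determined once per distinct package.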
An embodiment of the invention also discloses a method for searching file package resources, comprising:
Initializing a database, the database comprising content signature IDs and the corresponding file package URLs and file package resource identification information, wherein the file package URL is obtained by grabbing file package resources from web pages; the content signature ID is obtained by applying a predefined algorithm to the content data of the file package, the predefined algorithm being one that produces different results for the content data of different binary files; and the file package resource identification information is the name information of a candidate object whose attribute information satisfies a preset condition, the candidate objects comprising the files and/or folders contained in the file package;
Matching, in the database, the corresponding file package resource identification information according to the search keyword entered by the user;
Looking up the corresponding file package URL from the matched file package resource identification information; or looking up the corresponding content signature ID and obtaining the corresponding file package URL from that content signature ID.
Preferably, the method also comprises:
Downloading the data of the file package resource from the URL.
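A minimal sketch of the matching and lookup steps is given below. The source does not specify how a keyword is matched against the identification information, so case-insensitive substring matching is an assumption, as are the record layout, the signature strings, and the URLs.

```python
def search(database, keyword):
    """Return (identifier, urls) pairs whose identifier contains the keyword.

    Substring matching is an assumption; the source only says the keyword
    is matched against the stored identification information.
    """
    needle = keyword.lower()
    return [
        (entry["identifier"], list(entry["urls"]))
        for entry in database.values()
        if needle in entry["identifier"].lower()
    ]


# Hypothetical records keyed by content signature ID.
database = {
    "sig-001": {"identifier": "Drama Season 1", "urls": ["http://example.com/a.zip"]},
    "sig-002": {"identifier": "Photo Editor beta", "urls": ["http://example.com/b.zip"]},
}
results = search(database, "drama")
```

The URLs returned for a match are what the optional download step would then fetch.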
An embodiment of the invention also discloses an apparatus for determining file package resource identification information, comprising:
A file information acquiring unit, configured to obtain attribute information of candidate objects, the candidate objects being the files and/or folders contained in a file package;
A judging unit, configured to judge whether the attribute information of a candidate object satisfies a preset condition;
A determining unit, configured to determine, when the attribute information of one of the candidate objects satisfies the preset condition, that the name information of that candidate object is the resource identification information of the file package.
Preferably, the files and/or folders contained in the file package are:
The files and/or folders contained in the root directory level of the file package.
Preferably, the attribute information is file size information.
Preferably, the preset condition is:
The size of the file or folder is the largest in the file package; or
The ratio of the size of the file or folder to the size of the file package is greater than or equal to a threshold.
Preferably, the file package is a compressed file package, and the file size information is the size of the file or folder before compression.
Preferably, the attribute information is name information.
Preferably, the preset condition is:
The name information of the file or folder does not match a preset invalid keyword; and/or
The name information of the file or folder matches a preset valid keyword.
Preferably, the candidate objects also include the file package itself, and the preset condition is:
The name information of the file package, or the name information of a file or folder in the file package, does not match a preset invalid keyword; and/or,
The name information of the file package, or the name information of a file or folder in the file package, matches a preset valid keyword.
Preferably, the preset condition also comprises:
Weighting the name information that matches the preset valid keywords, and extracting the name information with the highest weight.
Preferably, the apparatus also comprises:
A renaming unit, configured to replace the name information of the file package with the resource identification information.
An embodiment of the invention also discloses an apparatus for generating a file package resource database, comprising:
A resource URL grabbing unit, configured to obtain the URL of a file package in a web page;
A content signature ID computing unit, configured to obtain the content signature ID of the file package by applying a predefined algorithm to the content data of the file package, the predefined algorithm being one that produces different results for the content data of different binary files;
A resource identification information acquiring unit, configured to obtain the identification information of the file package resource, the identification information being the name information of a candidate object whose attribute information satisfies a preset condition, the candidate objects comprising the files and/or folders contained in the file package;
A recording unit, configured to record the content signature ID together with the corresponding file package resource URL and file package resource identification information to form a database.
An embodiment of the invention also discloses an apparatus for searching file package resources, comprising:
A database, the database comprising content signature IDs and the corresponding file package resource URLs and file package resource identification information, wherein the file package resource URL is obtained by grabbing file package resources from web pages; the content signature ID is obtained by applying a predefined algorithm to the content data of the file package, the predefined algorithm being one that produces different results for the content data of different binary files; and the file package resource identification information is the name information of a candidate object whose attribute information satisfies a preset condition, the candidate objects comprising the files and/or folders contained in the file package;
A matching unit, configured to match, in the database, the corresponding file package resource identification information according to the search keyword entered by the user;
A lookup unit, configured to look up the corresponding file package URL from the matched file package resource identification information; or to look up the corresponding content signature ID and obtain the corresponding file package URL from that content signature ID.
Preferably, the apparatus also comprises:
A download unit, configured to download the data of the file package resource from the URL.
Embodiments of the invention have the following advantages:
First, embodiments of the invention select, from the files or folders contained in a file package, one whose attribute information meets a preset condition, and use its name information as the resource identification information of the file package. Because a file package is made up of files or folders, choosing the name information of one of them as the identification information identifies the content of the file package more accurately.
Embodiments of the invention can also, based on the sizes of the files and folders at the root directory level of the file package, choose the name information of the largest file or folder, or of a file or folder whose size satisfies a certain threshold, as the resource identification information of the current file package. That is, the name information of the file or folder that carries the greatest weight in the whole file package resource is chosen as the identification information, which yields a more representative file package resource identification;
Embodiments of the invention can further choose, according to preset rules, the more accurate name information from among the name information of the largest file or folder (or of a file or folder whose size satisfies a certain threshold) and the name information of the current file package, and use it as the identification information of the file package, which further improves the accuracy with which file package resource identification information is determined;
Moreover, embodiments of the invention can generate a database by recording the file package resource identification information; in practice, this database can be used to search for and download file package resources. Because the file package resource identification information is more accurate, the accuracy of search results improves; and because the identification information is recorded against the content signature ID of the file package, which avoids processing file packages with identical content more than once, search efficiency improves as well;
Finally, for service providers the invention is technically simple to implement, involves no technical barriers or special secret algorithms, and carries low cost and risk.
Description of drawings
Fig. 1 is a flowchart of embodiment one of a method for determining file package resource identification information according to the invention;
Fig. 2 is a flowchart of embodiment two of a method for determining file package resource identification information according to the invention;
Fig. 3 is a flowchart of embodiment three of a method for determining file package resource identification information according to the invention;
Fig. 4 is a flowchart of embodiment four of a method for determining file package resource identification information according to the invention;
Fig. 5 is a flowchart of embodiment five of a method for determining file package resource identification information according to the invention;
Fig. 6 is a structural block diagram of embodiment one of an apparatus for determining file package resource identification information according to the invention;
Fig. 7 is a flowchart of an embodiment of a method for generating a file package resource database according to the invention;
Fig. 8 is a structural block diagram of an embodiment of an apparatus for generating a file package resource database according to the invention;
Fig. 9 is a flowchart of an embodiment of a method for searching file package resources according to the invention;
Fig. 10 is a structural block diagram of an embodiment of an apparatus for searching file package resources according to the invention.
Detailed description
To make the above objects, features, and advantages of the invention easier to understand, the invention is described in further detail below with reference to the drawings and specific embodiments.
The invention can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, distributed computing environments that include any of the above systems or devices, and so on.
The invention can be described in the general context of computer-executable instructions executed by a computer, such as program modules. In general, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The invention can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
With reference to Fig. 1, a flowchart of embodiment one of a method for determining file package resource identification information according to an embodiment of the invention is shown; it may comprise the following steps:
Step 101: obtaining attribute information of candidate objects, the candidate objects comprising the files and/or folders contained in a file package;
Step 102: if the attribute information of one of the candidate objects satisfies a preset condition, determining that the name information of the candidate object satisfying the preset condition is the resource identification information of the file package.
Preferably, the files and/or folders contained in the file package may be the files and/or folders contained in the root directory level of the file package (that is, the first directory level inside the file package). Because the names of the files or folders at the root directory level generally represent the content of the file package best, restricting the candidates in this way improves the efficiency with which the file package resource identification information is determined. In this case, only the attribute information of the files and/or folders at the root directory level needs to be checked against the preset condition, and the name information of a root-level file or folder that satisfies the preset condition is determined to be the resource identification information of the file package.
In practice, each file package has a corresponding file list that records information about all the files and folders in the package. The file list is represented differently in different types of file package, and each compression type has its own protocol. Therefore, following the protocol of each compression type, the file header of the file package is grabbed by a web crawler (spider) or another tool and then analyzed to obtain the corresponding file list.
Specifically, the file list is obtained as follows: the spider program crawls automatically by following the links between pages on the Internet; once it finds a file package of a preset type (such as a compressed file package), it grabs it; after grabbing, it analyzes the file header of the current file package to obtain the corresponding file list.
A file list generally includes: name information (file name), directory information (usually embodied in the file name), file size information, size after compression, file type, modification time, comment information, and so on. For example, the file header information of a certain zip-format file package is shown in the following table:
Field                 Meaning                      Size
compression method    compression algorithm        2 bytes
crc-32                cyclic redundancy check      4 bytes
compressed size       size after compression       4 bytes
uncompressed size     original size                4 bytes
filename length       length of the file name      2 bytes
filename              file name                    variable size
Here, filename is the name information of the file, and uncompressed size is its original size before compression. In practice, after the spider has grabbed the file header information, it can also extract the filename and uncompressed size information and store them in a database.
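As a hedged illustration of extracting the filename and uncompressed size fields, the sketch below uses Python's standard zipfile module. Note that zipfile reads the archive's central directory rather than each local file header (both carry these fields), and that the helper names and the in-memory sample archive are assumptions made so the example is self-contained.

```python
import io
import zipfile


def list_entries(zip_bytes):
    """Return (filename, uncompressed_size) pairs read from a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # ZipInfo.file_size is the original (uncompressed) size;
        # ZipInfo.compress_size would be the size after compression.
        return [(info.filename, info.file_size) for info in archive.infolist()]


def build_sample_zip():
    """Build a small in-memory archive (hypothetical contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("series/episode1.rmvb", b"\0" * 5000)
        archive.writestr("player.exe", b"\0" * 200)
    return buffer.getvalue()


entries = list_entries(build_sample_zip())
```

The directory information is embedded in each file name ("series/episode1.rmvb"), which is what later allows sizes to be grouped by root-level folder.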
In embodiments of the invention, the attribute information may be file size information, in which case the preset condition in step 102 may be:
Condition S1: the size of the file or folder is the largest in the file package.
Or,
Condition S2: the ratio of the size of the file or folder to the size of the file package is greater than or equal to a threshold.
With reference to Fig. 2, a flowchart of method embodiment two, which applies condition S1 to determine file package resource identification information, is shown; it may comprise the following steps:
Step 201: obtaining the size information of each file and/or folder at the root directory level of a file package;
Step 202: if the size of a certain file or folder is the largest among all files and folders at the root directory level of the file package, extracting the name information of that file or folder as the resource identification information of the file package.
In this embodiment, based on the file size information reflected in the file list, the sizes of the files and/or folders at the root directory level of the file package are tallied and compared. The file or folder with the largest size is the one whose content dominates the file package, so the name information of this largest file or folder can be determined to be the resource identification information of the file package.
For example, the file list of a certain file package comprises: rule political affairs Shinjin-O first episode.rmvb, rule political affairs Shinjin-O second episode.rmvb, TV drama stills actor picture Xin Wan.jpg, TV drama stills actor picture lily.jpg, and player.exe. The files and folders at the root directory level are: rule political affairs Shinjin-O, TV drama stills, and player.exe. Suppose their sizes are, respectively: rule political affairs Shinjin-O, 3 GB; TV drama stills, 2 MB; player.exe, 23 MB. Because "rule political affairs Shinjin-O" is the largest of all the root-level entries, "rule political affairs Shinjin-O" can be chosen as the resource identification information of the file package.
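Condition S1 can be sketched as follows, assuming the file list is available as (path, uncompressed size) pairs with "/" as the path separator. The function names and the sample entries are hypothetical; the sample sizes mirror the 3 GB / 2 MB / 23 MB example above.

```python
from collections import defaultdict


def root_sizes(entries):
    """Sum uncompressed sizes per root-level file or folder.

    `entries` is a list of (path, size) pairs using "/" as the separator.
    """
    totals = defaultdict(int)
    for path, size in entries:
        root = path.strip("/").split("/", 1)[0]
        totals[root] += size
    return dict(totals)


def largest_root_entry(entries):
    """Condition S1: name of the root-level entry with the largest total size."""
    totals = root_sizes(entries)
    return max(totals, key=totals.get)


MB = 1024 * 1024
sample = [
    ("drama/episode1.rmvb", 1536 * MB),
    ("drama/episode2.rmvb", 1536 * MB),
    ("stills/actor1.jpg", 1 * MB),
    ("stills/actor2.jpg", 1 * MB),
    ("player.exe", 23 * MB),
]
identifier = largest_root_entry(sample)
```

A folder's size is taken here as the sum of the files beneath it, which is one reasonable reading of "the size of the folder"; an empty file list would need separate handling, since max() raises on an empty collection.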
With reference to Fig. 3, a flowchart of method embodiment three, which applies condition S2 to determine file package resource identification information, is shown; it may comprise the following steps:
Step 301: obtaining the size information of each file and/or folder at the root directory level of a file package;
Step 302: if the size of a certain file or folder at the root directory level, as a proportion of the sum of the sizes of all files and folders in the file package, is greater than or equal to a threshold, extracting the name information of that file or folder as the resource identification information of the file package.
In this embodiment, based on the file size information reflected in the file list, the sizes of the files and/or folders at the root directory level of the file package are tallied. If the size of a certain file or folder exceeds a certain proportion of the total size of all files in the file package, that file or folder dominates the file package, so its name information is determined to be the resource identification information of the file package.
For example, the file list of a certain file package comprises: rule political affairs Shinjin-O first episode.rmvb, rule political affairs Shinjin-O second episode.rmvb, TV drama stills actor picture Xin Wan.jpg, TV drama stills actor picture lily.jpg, and player.exe. The folders at the root directory level are rule political affairs Shinjin-O and TV drama stills, and the file at the root directory level is player.exe. In this case the resource identification information has to be determined from among rule political affairs Shinjin-O, TV drama stills, and player.exe, which may be done as follows:
A. Obtain the size w(i) of each file or folder and calculate the total size, which can be expressed by the following formula:
total = w(1) + w(2) + w(3)
B. Calculate the ratio f(i) of the size w(i) of each file or folder to the total size, which can be expressed by the following formula:
f(i) = w(i) / total
If f(i) reaches or exceeds a certain preset ratio, such as 80%, the name information of that file or folder is used as the resource identification information. Suppose that in the above example the ratio for "rule political affairs Shinjin-O" reaches 80% or more; "rule political affairs Shinjin-O" is then used as the resource identification information of the file package.
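Steps A and B can be sketched as below, taking the root-level sizes as already computed. The function name, the dictionary of sizes (in MB, mirroring the earlier 3 GB / 2 MB / 23 MB figures), and the use of "greater than or equal to" for the comparison (following condition S2) are illustrative assumptions.

```python
def entries_over_threshold(root_totals, threshold=0.8):
    """Condition S2: root-level names whose share f(i) = w(i) / total >= threshold."""
    total = sum(root_totals.values())
    if total == 0:
        return []  # nothing to compare in an empty or zero-sized package
    return [name for name, size in root_totals.items() if size / total >= threshold]


# Hypothetical root-level sizes in MB.
root_totals = {"drama": 3072, "stills": 2, "player.exe": 23}
shares = {name: size / sum(root_totals.values()) for name, size in root_totals.items()}
matches = entries_over_threshold(root_totals)
```

With a threshold above 50% at most one entry can qualify; with a lower threshold the function may return several names, which is the multiple-candidate case the description addresses separately.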
It should be noted that, in embodiments of the invention, files of different formats compress differently, so the size of each file after compression is not reduced in proportion to its original size. Therefore, if the file package is a compressed file package, the file size information in the above embodiments is preferably the size of the file or folder before compression rather than the size after compression. In addition, where more than one name qualifies as file package resource identification information, any one of them may be chosen as the resource identification information of the file package, another rule may be applied to choose one, or all of the qualifying names may be used directly as the resource identification information of the file package; all of these are feasible, and the invention does not limit this.
In embodiments of the invention, the attribute information may also be name information, in which case the preset condition in step 102 may be:
Condition S3: the name information of the file or folder does not match a preset invalid keyword;
And/or,
Condition S4: the name information of the file or folder matches a preset valid keyword.
In practice, the name information of some file packages also reflects the main content of the file package resource truthfully and effectively. Preferably, therefore, the candidate objects may also include the file package itself, in which case the preset condition may be:
Condition S3': the name information of the file package, or the name information of a file or folder in the file package, does not match a preset invalid keyword;
And/or,
Condition S4': the name information of the file package, or the name information of a file or folder in the file package, matches a preset valid keyword.
Referring to Fig. 4, a flowchart of method embodiment four of the present invention is shown, in which conditions S3' and S4' are applied to determine the file package resource identification information. It may comprise the following steps:
Step 401: obtain the name information of a file package and the name information of each file or folder contained in the file package;
Step 402: match these names against the preset invalid keywords and filter out any invalid name information that matches.
Through this step, invalid name information can be filtered out. In the present invention, invalid name information can be understood as name information that does not correctly identify the main content of the file package, for example "123.rar", "new folder.zip", "20071021.rar", "test.zip", etc. Corresponding invalid keywords, such as purely numeric names, "new folder", etc., can be set by collecting data on invalid names. When the name information of the file package, or of a file or folder it contains, fully or partially matches an invalid keyword, that name information is removed. If, after this removal, only one name or several identical names remain, that name can be used directly as the resource identification information of the current file package. Otherwise, the following steps can be carried out.
Step 403: match the valid name information remaining after the above filtering against the preset valid keywords, and select the valid name information that matches.
For example, the valid keywords may be set to "software", "formal version", "beta version" and so on (such valid keywords usually indicate that the file package has been named in a standardized way). If one name, or several identical names, fully or partially match a valid keyword, that name can be determined to be the resource identification information of the current file package. If several different valid names still remain, the following steps can be carried out to determine the best file package resource identification.
Preferably, the valid keywords may include a software version number keyword. This setting is aimed mainly at the resource identification information of software. Most software in practice carries its version number in its name, for example "PowerWord 5.7.6.426", "fleeing hare V3.0.108", "V2008 Build 1890 seen repeatedly in Chinese idiom", etc. For such names, the software version number can be extracted and then matched. More preferably, the software version number may be extracted by judging whether the name contains a run of consecutive character strings separated by periods ".", or a string that begins with 'V' or 'v' followed by digits; if so, that string is the software version number.
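One possible reading of this extraction rule is sketched below in Python with a regular expression. The pattern and the function name are assumptions made for illustration; the specification does not prescribe a particular pattern, and the word-boundary check before 'V' is an added safeguard so that a letter 'v' inside a word is not treated as a version prefix.

```python
import re

# Either 'V'/'v' followed by digits (optionally dotted), or digits separated by periods.
VERSION_PATTERN = re.compile(r"\b[Vv]\d+(?:\.\d+)*|\d+(?:\.\d+)+")


def extract_version(name):
    """Return the first version-like substring of name, or None if there is none."""
    match = VERSION_PATTERN.search(name)
    return match.group(0) if match else None


print(extract_version("PowerWord 5.7.6.426"))    # 5.7.6.426
print(extract_version("fleeing hare V3.0.108"))  # V3.0.108
print(extract_version("Foundation of Software Engineering"))  # None
```

A name for which `extract_version` returns a value can then be treated as matching the version-number keyword and weighted accordingly.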
To further ensure that the determined file package resource identification best represents the content of the file package, a preset condition S5 may be added: the name information that matches a preset valid keyword is weighted, and the name information with the highest weight is extracted as the resource identification information of the current file package. In this case, this embodiment may further comprise the following step:
Step 404: weight the valid name information that matches a valid keyword, and extract the valid name information with the highest weight as the resource identification information of the file package.
This embodiment is further illustrated below with a concrete example.
Suppose a file package with the file name "test.zip" contains a folder named "Foundation of Software Engineering", the preset invalid keyword is "test", and the preset valid keyword is "software". When determining the file package resource identification information, one way of processing is: since the file name of the file package matches the invalid keyword "test", the file name "test.zip" is filtered out, and the remaining "Foundation of Software Engineering" is used directly as the resource identification information of the file package. Another way is: continue by matching the name "Foundation of Software Engineering" against the valid keyword "software"; they match, so the name "Foundation of Software Engineering" is weighted. After this processing, the weight of "Foundation of Software Engineering" is clearly higher than that of the file name "test.zip", so the resource identification information of the file package is determined to be "Foundation of Software Engineering".
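The filtering and weighting in this example can be sketched in Python as follows. The keyword lists, the helper names and the weight of one point per matched keyword are assumptions for illustration only; the specification leaves the weighting scheme open. Matching is by substring, which corresponds to the "partial match" mentioned above and can over-match in practice (for example, "latest" contains "test").

```python
import re

INVALID_KEYWORDS = ["test", "new folder"]  # hypothetical preset invalid keywords
VALID_KEYWORDS = ["software"]              # hypothetical preset valid keywords


def base_name(name):
    """Drop a trailing archive extension such as .zip or .rar."""
    return re.sub(r"\.(zip|rar)$", "", name, flags=re.IGNORECASE)


def is_invalid(name):
    """A name is invalid if it is purely numeric or contains an invalid keyword."""
    base = base_name(name).lower()
    return base.isdigit() or any(k in base for k in INVALID_KEYWORDS)


def pick_identifier(names):
    """Filter out invalid names, then return the name with the highest weight."""
    remaining = [n for n in names if not is_invalid(n)]
    if not remaining:
        return None

    def weight(n):
        # One point per valid keyword found in the name.
        return sum(1 for k in VALID_KEYWORDS if k in n.lower())

    return max(remaining, key=weight)


print(pick_identifier(["test.zip", "Foundation of Software Engineering"]))
# Foundation of Software Engineering
```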
In embodiments of the present invention, the attribute information may also comprise both file size information and name information. In this case, those skilled in the art can combine the above preset conditions in any way, for example by combining conditions S2, S3', S4' and S5.
Referring to Fig. 5, a flowchart of method embodiment five for determining file package resource identification information according to the present invention is shown. It may comprise the following steps:
Step 501: obtain the attribute information of the candidate objects;
The candidate objects comprise the files and/or folders contained in a file package; the attribute information comprises file size information and name information.
Step 502: if the attribute information of one of the candidate objects satisfies a preset condition, determine that the name information of the candidate object satisfying the preset condition is the resource identification information of the file package.
This step may comprise the following substeps:
Substep 5021: select the candidate files and/or folders whose size, as a proportion of the total size of all files in the file package, is greater than or equal to a threshold;
Substep 5022: match the name information of the file package and the name information of the candidate files or folders against the preset invalid keywords, and filter out the invalid name information that matches;
Substep 5023: match the name information remaining after filtering against the preset valid keywords, and weight the valid name information that matches;
Substep 5024: extract the name information with the highest weight as the resource identification information of the file package.
This embodiment is further illustrated below with a concrete example.
Suppose the file package "17173 mht37.rar" contains three files: "software interface.JPG", "dreamlike Journey to the West practical tool kit V3.7.exe" and "update notes.txt". The total size of these files is calculated as total = 5733902, and the f(i) of each file is then f(1) = 1.2%, f(2) = 98.6%, f(3) = 0.2%. If the preset ratio threshold is 80%, then since f(2) > 80%, "dreamlike Journey to the West practical tool kit V3.7" can be used as candidate resource identification information. Because this name also matches the valid keyword "tool kit" and carries the software version number "V3.7", it is further weighted. In this case, the candidate resource identification information "dreamlike Journey to the West practical tool kit V3.7" is determined to be the resource identification information of the current file package.
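Substeps 5021 to 5024 can be sketched end to end in Python with the figures from this example. The individual file sizes below are hypothetical values chosen only so that they sum to 5733902 and reproduce the stated ratios; the keyword lists, the function name, the English file names and the one-point-per-feature weighting are likewise assumptions, not part of the specification.

```python
import os
import re

VERSION = re.compile(r"\b[Vv]\d+(?:\.\d+)*|\d+(?:\.\d+)+")


def identify(package_name, sizes, invalid, valid, threshold=0.8):
    """Return a resource identifier for the package, or None if no name survives."""
    total = sum(sizes.values())
    # Substep 5021: keep entries whose share of the total size meets the threshold.
    candidates = [n for n, s in sizes.items() if total and s / total >= threshold]
    # Substep 5022: add the package name, strip extensions, drop invalid names.
    # (Simplification: splitext assumes every name ends in a real file extension.)
    names = [os.path.splitext(n)[0] for n in [package_name] + candidates]
    names = [n for n in names if not any(k in n.lower() for k in invalid)]

    # Substep 5023: one point per valid keyword, one more for a version number.
    def weight(n):
        score = sum(1 for k in valid if k in n.lower())
        return score + (1 if VERSION.search(n) else 0)

    # Substep 5024: the highest-weighted name wins.
    return max(names, key=weight) if names else None


sizes = {  # hypothetical sizes in bytes, summing to 5733902
    "software interface.JPG": 68_807,
    "dreamlike Journey to the West practical tool kit V3.7.exe": 5_653_627,
    "update notes.txt": 11_468,
}
result = identify("17173 mht37.rar", sizes,
                  invalid=["test", "new folder"], valid=["tool kit"])
print(result)  # dreamlike Journey to the West practical tool kit V3.7
```

Here the package name "17173 mht37" survives the invalid-keyword filter but receives a weight of 0, while the large file's name receives a weight of 2, so the latter is selected.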
Preferably, in embodiments of the present invention, the determined file package resource identification information can also be used to replace the original file name of the file package; in the example above, the original package name "17173 mht37" is replaced with "dreamlike Journey to the West practical tool kit V3.7". It can also be saved in a database to serve needs such as searching for and downloading file package resources; the present invention does not limit this either.
Those skilled in the art will readily appreciate that any combination of the embodiments of the present invention is also an embodiment of the present invention; for reasons of space, this specification does not describe each combination in detail.
For simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions, but those skilled in the art should know that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
Referring to Fig. 6, a structural block diagram of a device embodiment for determining file package resource identification information according to the present invention is shown. It may comprise the following units:
A file information acquiring unit 601, used to obtain the attribute information of the candidate objects, the candidate objects comprising the files and/or folders contained in a file package; preferably, the files and/or folders contained in the file package may be the files and/or folders contained in the root directory layer of the file package;
A judging unit 602, used to judge whether the attribute information of the candidate objects satisfies a preset condition;
A determining unit 603, used to determine, when the attribute information of one of the candidate objects satisfies the preset condition, that the name information of the candidate object satisfying the preset condition is the resource identification information of the file package.
In this embodiment, preferably, the attribute information may be file size information; in this case, the preset condition may be:
The size of the file or folder is the largest in the file package; or
The size of the file or folder as a proportion of the size of the file package is greater than or equal to a threshold.
In practice, the file package may be a compressed file package. To ensure that the file package resource identification information is determined accurately, the file size information may be the size of the file or folder before compression.
In this case, the process of determining file package resource identification information with the first preferred embodiment shown in Fig. 6 may comprise the following steps:
Step B1: the file information acquiring unit obtains the size information of the files and/or folders contained in a file package;
As another embodiment, this step may also be: the file information acquiring unit obtains the size information of each file and/or folder in the root directory layer of a file package;
Step B2: the judging unit judges whether the size of a given file or folder is the largest among all files and folders in the root directory layer of the file package, or whether its size as a proportion of the total size of all files and folders in the root directory layer of the file package is greater than or equal to a threshold; if so, it triggers the determining unit to perform step B3;
Step B3: the determining unit determines that the name information of a file or folder satisfying the above condition is the resource identification information of the file package.
As another embodiment, the attribute information may be name information; in this case, the preset condition may be:
The name information of the file or folder does not match a preset invalid keyword;
And/or
The name information of the file or folder matches a preset valid keyword.
Preferably, the preset condition may further comprise:
Weighting the name information that matches a preset valid keyword, and extracting the name information with the highest weight.
In this case, the process of determining file package resource identification information with the second preferred embodiment shown in Fig. 6 may comprise the following steps:
Step C1: the file information acquiring unit obtains the name information of the files and/or folders contained in a file package;
As another embodiment, this step may also be: the file information acquiring unit obtains the name information of each file and/or folder in the root directory layer of a file package;
Step C2: the judging unit matches these names against the preset invalid keywords, filters out the invalid name information that matches, then matches the remaining valid name information against the preset valid keywords and weights the valid name information that matches;
Step C3: the valid name information with the highest weight is extracted as the resource identification information of the file package.
As another embodiment, the attribute information may be name information and the candidate objects may also include the file package itself. In this case, the preset condition may be:
The name information of the file package, or the name information of a file or folder in the file package, does not match a preset invalid keyword; and/or,
The name information of the file package, or the name information of a file or folder in the file package, matches a preset valid keyword.
Preferably, the preset condition may further comprise:
Weighting the name information that matches a preset valid keyword, and extracting the name information with the highest weight.
In this case, the process of determining file package resource identification information with the third preferred embodiment shown in Fig. 6 may comprise the following steps:
Step D1: the file information acquiring unit obtains the name information of a file package and the name information of the files and/or folders contained in the file package;
As another embodiment, this step may also be: the file information acquiring unit obtains the name information of a file package and the name information of each file and/or folder in the root directory layer of the file package;
Step D2: the judging unit matches these names against the preset invalid keywords, filters out the invalid name information that matches, then matches the remaining valid name information against the preset valid keywords and weights the valid name information that matches;
Step D3: the valid name information with the highest weight is extracted as the resource identification information of the file package.
In the above device embodiments, the device may further comprise a renaming unit, used to replace the original file name of the file package with the resource identification information.
Those skilled in the art will readily appreciate that the first embodiment above may be combined with the second embodiment, or with the third embodiment, and that any combination of the above embodiments is therefore also an embodiment of the present invention; for reasons of space, this specification does not describe each combination in detail.
Because the device embodiments for determining file package resource identification information correspond substantially to the foregoing method embodiments, they are described relatively briefly; for the relevant parts, refer to the description of the method embodiments.
Referring to Fig. 7, a flowchart of an embodiment of a method for generating a file package resource database according to the present invention is shown. It may comprise the following steps:
Step 701: obtain the URL of a file package in a web page, and its content signature ID;
The content signature ID may be obtained by applying a predefined algorithm to the content data of the file package, and the predefined algorithm may be one that produces different results for the content data of different binary files;
Step 702: obtain the resource identification information of the file package;
The resource identification information may be the name information of a candidate object whose attribute information satisfies a preset condition, and the candidate objects may comprise the files and/or folders contained in the file package.
Preferably, the files and/or folders contained in the file package may be the files and/or folders contained in the root directory layer of the file package.
Step 703: record the content signature ID together with the corresponding file package URL and file package resource identification information to form a database.
In practice, the file package resources in a page and their URLs can be fetched by a web crawler (spider). As is well known, a web crawler may work on the principle of breadth-first search: starting from the URL (Uniform Resource Locator) of one or several initial pages, it obtains the URLs on the initial pages and, while fetching pages, continually extracts new URLs from the current page and puts them into a queue until a stop condition of the system is met. Alternatively, it may act as a program that downloads web pages automatically: according to some web page analysis algorithm, it filters out links irrelevant to the topic, keeps the useful links and puts them into the queue of URLs waiting to be fetched; then, according to some search strategy, it selects from the queue the URL of the next page to fetch and repeats the above process, stopping when a condition of the system is reached.
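The breadth-first traversal described above can be sketched in Python as follows. To keep the sketch self-contained and runnable without network access, pages are represented by a hypothetical in-memory dictionary rather than fetched over HTTP; the function names, the example URLs and the page limit used as the stop condition are all assumptions.

```python
from collections import deque


def crawl(seeds, get_links, is_package, max_pages=100):
    """Breadth-first crawl from the seed URLs; return the package URLs found."""
    queue = deque(seeds)
    seen = set(seeds)
    packages = []
    visited = 0
    while queue and visited < max_pages:  # stop condition: queue empty or page limit
        url = queue.popleft()
        visited += 1
        for link in get_links(url):
            if link in seen:
                continue  # never enqueue or record the same URL twice
            seen.add(link)
            if is_package(link):
                packages.append(link)  # a file package resource, not a page
            else:
                queue.append(link)     # a page to visit later
    return packages


# Hypothetical link graph standing in for real web pages.
PAGES = {
    "http://example.com/": ["http://example.com/a", "http://example.com/x.zip"],
    "http://example.com/a": ["http://example.com/", "http://example.com/y.rar"],
}
found = crawl(["http://example.com/"],
              get_links=lambda u: PAGES.get(u, []),
              is_package=lambda u: u.endswith((".zip", ".rar")))
print(found)  # ['http://example.com/x.zip', 'http://example.com/y.rar']
```

In a real crawler, `get_links` would download and parse the page, and a topic filter could be applied before a link is enqueued.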
To ensure the uniqueness of the fetched file package resources and avoid processing file package resources with identical content more than once, a content signature ID can be calculated to uniquely identify the corresponding file package. The content signature ID may be obtained by applying a predefined algorithm to the content data of the binary file. The predefined algorithm may be one that produces different results for the content data of different binary files, or one whose results collide with extremely low probability. For example, a hash operation is performed on the content data of each binary file to obtain a hash value of the file content, and this hash value can serve as the content signature ID that uniquely identifies the corresponding file package.
Specifically, one method of calculating the content signature ID is as follows: take 32 KB of data from each of the beginning, middle and end of the file package resource (other parts of the content data of the file package may of course be chosen; this is only an example), apply a hash algorithm (for example a message-digest algorithm such as MD5 or MD4, or the Secure Hash Algorithm) to each of these three parts, concatenate the three resulting values in order, apply the same algorithm again to the concatenated data, and use the final value as the content signature ID of the file package. Several file packages with the same content signature ID can be considered to have the same content. When generating the database, the content signature ID can then be used as the primary key under which the corresponding file package resource identification information and URL information are stored.
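This calculation can be sketched in Python with the standard `hashlib` module. MD5 is used because the text names it as one option; the function name, the way the middle block is positioned and the handling of files shorter than 32 KB are assumptions, since the specification does not fix these details.

```python
import hashlib

CHUNK = 32 * 1024  # 32 KB, as in the example above


def content_signature(data: bytes) -> str:
    """Hash the first, middle and last 32 KB, then hash the joined digests."""
    n = len(data)
    mid = max((n - CHUNK) // 2, 0)  # start of a block centred in the data
    parts = [data[:CHUNK], data[mid:mid + CHUNK], data[-CHUNK:]]
    digests = b"".join(hashlib.md5(p).digest() for p in parts)
    return hashlib.md5(digests).hexdigest()


a = bytes(100_000)              # 100,000 zero bytes
b = bytes(99_999) + b"\x01"     # same length, last byte differs
print(content_signature(a) == content_signature(bytes(100_000)))  # True
print(content_signature(a) == content_signature(b))               # False
```

Because only three sampled blocks are hashed, two packages that differ solely outside those blocks would receive the same ID; this is the trade-off for not reading the whole file, and a full-file hash could be substituted where that matters.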
After the file package resource database has been generated, it can further be used to provide services such as search or download.
Referring to Fig. 8, a structural block diagram of an embodiment of a device for generating a file package resource database according to the present invention is shown. It may comprise the following units:
A resource URL crawling unit 801, used to obtain the URL of a file package in a web page;
A content signature ID calculating unit 802, used to obtain the content signature ID of the file package by applying a predefined algorithm to the content data of the file package;
Preferably, the predefined algorithm may be one that produces different results for the content data of different binary files.
A resource identification information acquiring unit 803, used to obtain the identification information of the file package resource;
The identification information may be the name information of a candidate object whose attribute information satisfies a preset condition, and the candidate objects may comprise the files and/or folders contained in the file package; preferably, the files and/or folders contained in the file package may be the files and/or folders contained in the root directory layer of the file package.
A recording unit 804, used to record the content signature ID together with the corresponding file package resource URL and file package resource identification information to form a database.
The process of generating a file package resource database with the preferred embodiment shown in Fig. 8 may comprise:
Step E1: the resource URL crawling unit obtains the URL of a file package in a web page, and the content signature ID calculating unit obtains the content signature ID of the file package by applying a predefined algorithm to the content data of the file package;
The predefined algorithm may be one that produces different results for the content data of different binary files.
Step E2: the resource identification information acquiring unit obtains the resource identification information of the file package;
The resource identification information may be the name information of a candidate object whose attribute information satisfies a preset condition, and the candidate objects may comprise the files and/or folders contained in the file package; preferably, the files and/or folders contained in the file package may be the files and/or folders contained in the root directory layer of the file package.
Step E3: the recording unit records the content signature ID together with the corresponding file package resource URL and file package resource identification information to form a database.
Referring to Fig. 9, a flowchart of an embodiment of a method for searching file package resources according to the present invention is shown. It may comprise the following steps:
Step 901: initialize the database;
Preferably, the database may comprise content signature IDs and the corresponding file package URLs and file package resource identification information. The file package resource URL is obtained by fetching the file package resources in web pages. The content signature ID is obtained by applying a predefined algorithm to the content data of the file package, the predefined algorithm being one that produces different results for the content data of different binary files. The file package resource identification information is the name information of a candidate object whose attribute information satisfies a preset condition, and the candidate objects comprise the files and/or folders contained in the file package. Preferably, the files and/or folders contained in the file package may be the files and/or folders contained in the root directory layer of the file package.
Step 902: match the corresponding file package resource identification information in the database according to the search keyword entered by the user;
Step 903: look up the corresponding file package URL by the matched file package resource identification information; or look up the corresponding content signature ID and obtain the corresponding file package URL according to that content signature ID.
Preferably, the database may be located on the server side and stores the correspondence between content signature IDs, file package resource identification information and URL information.
A data search using this database may proceed as follows: receive the search keyword submitted by the user and match it against the file package resource identification information stored in the database; if a match is found, find the corresponding content signature ID through that file package resource identification information, and then extract the corresponding URL according to that content signature ID.
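The lookup described above can be sketched in Python with an in-memory dictionary standing in for the server-side database, keyed by content signature ID as suggested earlier. The record layout, the sample entries and the case-insensitive substring matching are assumptions for illustration; a real deployment would more likely use an indexed database table.

```python
def search(db, keyword):
    """Return (signature ID, identification info, URL) for each matching record."""
    kw = keyword.lower()
    return [(sig, rec["name"], rec["url"])
            for sig, rec in db.items()
            if kw in rec["name"].lower()]


# Hypothetical database: content signature ID -> identification info and URL.
DB = {
    "sig-001": {"name": "Foundation of Software Engineering",
                "url": "http://example.com/test.zip"},
    "sig-002": {"name": "practical tool kit V3.7",
                "url": "http://example.com/17173_mht37.rar"},
}
print(search(DB, "software"))
# [('sig-001', 'Foundation of Software Engineering', 'http://example.com/test.zip')]
```

Because the records are keyed by content signature ID, the same dictionary also supports the reverse lookup from a signature to its identification information.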
Preferably, this embodiment may further comprise step 904: download the data of the file package resource from the URL.
After the user has downloaded a file package at the client, a rename operation can also be performed. Specifically, the client calculates the content signature ID of the file package and submits it to the server; the server looks up the file package resource identification information corresponding to this content signature ID in the database and, if it exists, returns the identification information to the client. After receiving the information returned by the server, the client asks the user whether to rename the file package; if the user confirms, the client automatically changes the name of the file package resource to the resource identification information returned by the server.
Referring to Fig. 10, a structural block diagram of an embodiment of a device for searching file package resources according to the present invention is shown. It may comprise the following units:
A database 1001, the database comprising content signature IDs and the corresponding file package resource URLs and file package resource identification information;
The file package resource URL may be obtained by fetching the file package resources in web pages; the content signature ID may be obtained by applying a predefined algorithm to the content data of the file package, and the predefined algorithm may be one that produces different results for the content data of different binary files; the file package resource identification information may be the name information of a candidate object whose attribute information satisfies a preset condition, and the candidate objects may comprise the files and/or folders contained in the file package; preferably, the files and/or folders contained in the file package may be the files and/or folders contained in the root directory layer of the file package.
A matching unit 1002, used to match the corresponding file package resource identification information in the database according to the search keyword entered by the user;
A search unit 1003, used to look up the corresponding file package URL by the matched file package resource identification information; or to look up the corresponding content signature ID and obtain the corresponding file package URL according to that content signature ID.
Preferably, this embodiment may further comprise:
A download unit, used to download the data of the file package resource from the URL.
The process of searching file package resources with the preferred embodiment shown in Fig. 10 may comprise:
Step F1: initialize the database, the database comprising content signature IDs and the corresponding file package resource URLs and file package resource identification information;
The file package resource URL may be obtained by fetching the file package resources in web pages; the content signature ID may be obtained by applying a predefined algorithm to the content data of the file package, and the predefined algorithm may be one that produces different results for the content data of different binary files; the file package resource identification information may be the name information of a candidate object whose attribute information satisfies a preset condition, and the candidate objects may comprise the files and/or folders contained in the file package; preferably, the files and/or folders contained in the file package may be the files and/or folders contained in the root directory layer of the file package.
Step F2: the matching unit matches the corresponding file package resource identification information in the database according to the search keyword entered by the user;
Step F3: the search unit looks up the corresponding file package URL by the matched file package resource identification information; or looks up the corresponding content signature ID and obtains the corresponding file package URL according to that content signature ID.
It should be noted that the description of each of the above embodiments has its own emphasis; for parts not described in detail in one embodiment, refer to the related description of the other embodiments. In addition, because each device embodiment corresponds substantially to its method embodiment, it is described relatively briefly; for the relevant parts, refer to the description of the method embodiment.
In summary, the embodiments of the present invention select, according to the size of each file and folder in the root directory layer of a file package, the name information of the file or folder that is the largest, or whose size meets a certain threshold, as the resource identification information of the current file package. That is, the name of the file or folder that carries the greatest weight in the whole file package resource is chosen as the identification information of the file package, which yields a more representative file package resource identification. The embodiments of the present invention can further select, according to preset rules, a more accurate name from among the name information of the largest file or folder (or of those whose size meets the threshold) and the name information of the current file package, as the identification information of the file package, thereby further improving the accuracy with which the file package resource identification information is determined. Moreover, the embodiments of the present invention can generate a database by recording the file package resource identification information; in practice, this database can be used for searching and downloading file package resources. Because the accuracy of the file package resource identification information is high, the accuracy of the search results can be improved; because the identification information is recorded against the content signature ID of the file package, and this content signature ID prevents file packages with identical content from being processed repeatedly, search efficiency can be improved effectively.
The method and device for obtaining file package resource identification information, the method and device for generating a file package resource database, and the method and device for searching file package resources provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principle and embodiments of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. At the same time, those of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this description should not be construed as limiting the present invention.

Claims (26)

1. A method for determining file package resource identification information, characterized by comprising:
Obtaining attribute information of candidate objects, the candidate objects being the files and/or folders contained in a file package;
If the attribute information of one of the candidate objects satisfies a preset condition, determining that the name information of the candidate object whose attribute information satisfies the preset condition is the resource identification information of the file package.
2. The method of claim 1, characterized in that the files and/or folders contained in the file package are:
The files and/or folders contained in the root directory layer of the file package.
3, method as claimed in claim 1 or 2 is characterized in that, described attribute information is a document size information.
4, method as claimed in claim 3 is characterized in that, describedly pre-conditionedly is:
The size of file or folder is maximum in the described literature kit; Or,
The size of file or folder accounts for the magnitude proportion of described literature kit more than or equal to a threshold value.
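The two size conditions of claim 4 can be illustrated with a minimal sketch. The function name `select_by_size`, the `ratio_threshold` parameter, and the sample entries are hypothetical and not taken from the patent; the sketch also assumes that the size of each root-level entry has already been computed.

```python
from typing import Dict, List, Optional


def select_by_size(sizes: Dict[str, int],
                   ratio_threshold: Optional[float] = None) -> List[str]:
    """Return names of root-level entries that satisfy the size condition.

    sizes maps each root-level file or folder name to its size in bytes.
    With ratio_threshold=None the largest entry is selected (first condition);
    otherwise every entry whose share of the total package size is at least
    ratio_threshold is selected (second condition).
    """
    if not sizes:
        return []
    if ratio_threshold is None:
        largest = max(sizes.values())
        return [name for name, size in sizes.items() if size == largest]
    total = sum(sizes.values())
    if total == 0:
        return []
    return [name for name, size in sizes.items()
            if size / total >= ratio_threshold]


# Hypothetical root-level entries of a file package (sizes in bytes).
entries = {
    "readme.txt": 2_000,
    "Movie.Title.2008": 700_000_000,
    "sample.avi": 20_000_000,
}
print(select_by_size(entries))
print(select_by_size(entries, ratio_threshold=0.9))
```

A threshold close to 1 behaves much like the "largest" rule, while a lower threshold can return several candidates that a later name-based rule may choose between.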
5. The method of claim 1, 2 or 4, characterized in that the file package is a compressed file package, and the file size information is the size of the file or folder before compression.
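For a compressed package, the size before compression can usually be read from the archive's own directory without extracting anything. The following sketch assumes a ZIP archive and uses Python's standard `zipfile` module; the helper name `root_level_sizes` and the sample archive contents are hypothetical, and the patent does not prescribe any particular archive format.

```python
import io
import zipfile
from collections import defaultdict
from typing import Dict


def root_level_sizes(archive: zipfile.ZipFile) -> Dict[str, int]:
    """Sum the uncompressed sizes per root-level file or folder of a ZIP archive."""
    sizes: Dict[str, int] = defaultdict(int)
    for info in archive.infolist():
        # The first path component is the root-level file or folder.
        root = info.filename.split("/", 1)[0]
        if root:
            sizes[root] += info.file_size  # file_size is the size before compression
    return dict(sizes)


# Build a small in-memory archive for demonstration purposes.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("album/track01.mp3", b"a" * 5000)
    zf.writestr("album/track02.mp3", b"b" * 3000)
    zf.writestr("readme.txt", b"c" * 100)

with zipfile.ZipFile(buffer) as zf:
    sizes = root_level_sizes(zf)
print(sizes)
```

Using the uncompressed size keeps the comparison independent of how well each entry happens to compress.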
6. The method of claim 1 or 2, characterized in that the attribute information is name information.
7. The method of claim 6, characterized in that the preset condition is:
the name information of the file or folder does not match a preset invalid keyword; and/or
the name information of the file or folder matches a preset valid keyword.
8. The method of claim 6, characterized in that the candidate objects further include the file package itself, and the preset condition is:
the name information of the file package, or of a file or folder in the file package, does not match a preset invalid keyword; and/or
the name information of the file package, or of a file or folder in the file package, matches a preset valid keyword.
9. The method of claim 7 or 8, characterized in that the preset condition further comprises:
weighting the name information that matches the preset valid keywords, and extracting the name information with the highest weight.
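The name-based conditions of claims 7 to 9 can be sketched as a filter followed by a weighted ranking. Everything specific here is an assumption made for illustration: the function name `pick_name`, case-insensitive substring matching, summing the weights of matched keywords, and the sample keyword lists. The claims only require that invalid keywords be excluded and that the highest-weighted name be extracted.

```python
from typing import Dict, Iterable, Optional, Set


def pick_name(candidates: Iterable[str],
              invalid_keywords: Set[str],
              keyword_weights: Dict[str, float]) -> Optional[str]:
    """Drop names that match an invalid keyword, then return the name whose
    matched valid keywords have the highest total weight (None if none match)."""
    best_name: Optional[str] = None
    best_score = 0.0
    for name in candidates:
        lowered = name.lower()
        # Exclude names that contain any preset invalid keyword.
        if any(bad in lowered for bad in invalid_keywords):
            continue
        # Weight the name by the valid keywords it contains.
        score = sum(weight for keyword, weight in keyword_weights.items()
                    if keyword in lowered)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical keyword lists and candidate names.
invalid = {"new folder", "setup", "crack"}
weights = {"photoshop": 2.0, "cs3": 1.0}
names = ["setup.exe", "New Folder", "Photoshop CS3 Extended", "Photoshop crack"]
print(pick_name(names, invalid, weights))
```

Including the package's own name in `candidates` corresponds to the variant of claim 8.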
10. The method of claim 1, 2, 4, 7 or 8, characterized by further comprising:
replacing the name information of the file package with the resource identification information.
11. A method for generating a file package resource database, characterized by comprising:
obtaining the URL of a file package in a web page and its content sig ID, the content sig ID being obtained by computing over the content data of the file package with a predetermined algorithm, the predetermined algorithm being an algorithm that produces different results for the content data of different binary files;
obtaining the resource identification information of the file package, the resource identification information being the name information of a candidate object whose attribute information satisfies a preset condition, the candidate objects including the files and/or folders contained in the file package;
recording the content sig ID together with the corresponding file package URL and file package resource identification information, to form a database.
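The database generation of claim 11 can be sketched with Python's standard `hashlib` and `sqlite3` modules. The choice of SHA-1 as the predetermined algorithm, the table layout, the function names, and the example URLs are all assumptions made for illustration; the claim only requires an algorithm that yields different results for different binary content.

```python
import hashlib
import sqlite3


def content_sig_id(data: bytes) -> str:
    """Assumed signature: the SHA-1 hex digest of the package's content bytes."""
    return hashlib.sha1(data).hexdigest()


def create_database() -> sqlite3.Connection:
    """Create an in-memory table keyed by content sig ID and URL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE packages ("
        " sig_id TEXT NOT NULL,"
        " url TEXT NOT NULL,"
        " identification TEXT NOT NULL,"
        " PRIMARY KEY (sig_id, url))"
    )
    return conn


def record_package(conn: sqlite3.Connection, url: str, data: bytes,
                   identification: str) -> str:
    """Record one crawled package and return its content sig ID."""
    sig = content_sig_id(data)
    conn.execute("INSERT OR IGNORE INTO packages VALUES (?, ?, ?)",
                 (sig, url, identification))
    conn.commit()
    return sig


# Two hypothetical URLs that serve byte-identical packages share one sig ID.
conn = create_database()
payload = b"identical package bytes"
sig_a = record_package(conn, "http://example.com/a.zip", payload, "Photoshop CS3 Extended")
sig_b = record_package(conn, "http://mirror.example.org/a.zip", payload, "Photoshop CS3 Extended")
print(sig_a == sig_b)
```

Because identical content maps to the same sig ID, a crawler could check for an existing sig ID before determining the identification information again.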
12. A method for searching file package resources, characterized by comprising:
initializing a database, the database comprising content sig IDs and the corresponding file package URLs and file package resource identification information, wherein the file package URL is obtained by crawling file package resources in web pages; the content sig ID is obtained by computing over the content data of the file package with a predetermined algorithm, the predetermined algorithm being an algorithm that produces different results for the content data of different binary files; and the file package resource identification information is the name information of a candidate object whose attribute information satisfies a preset condition, the candidate objects including the files and/or folders contained in the file package;
matching, in the database, the corresponding file package resource identification information against the search keyword entered by a user;
looking up the corresponding file package URL through the matched file package resource identification information; or looking up the corresponding content sig ID and obtaining the corresponding file package URL according to that content sig ID.
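The second lookup path of claim 12 (keyword, then content sig ID, then URLs) can be sketched as follows. The table layout, the function name `search_urls`, the case-insensitive substring match, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3
from typing import List


def search_urls(conn: sqlite3.Connection, keyword: str) -> List[str]:
    """Match the keyword against identification information, then collect
    every URL recorded under each matching content sig ID."""
    sig_rows = conn.execute(
        "SELECT DISTINCT sig_id FROM packages "
        "WHERE instr(lower(identification), lower(?)) > 0",
        (keyword,),
    ).fetchall()
    urls: List[str] = []
    for (sig_id,) in sig_rows:
        rows = conn.execute(
            "SELECT url FROM packages WHERE sig_id = ? ORDER BY url", (sig_id,))
        urls.extend(url for (url,) in rows)
    return urls


# Hypothetical database contents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (sig_id TEXT, url TEXT, identification TEXT)")
conn.executemany(
    "INSERT INTO packages VALUES (?, ?, ?)",
    [
        ("sig-a", "http://example.com/a.zip", "Photoshop CS3 Extended"),
        ("sig-a", "http://mirror.example.org/a.zip", "Photoshop CS3 Extended"),
        ("sig-b", "http://example.com/b.rar", "Holiday Pictures"),
    ],
)
print(search_urls(conn, "photoshop"))
```

Going through the sig ID returns every known source of the same content, which is what makes the signature useful for download as well as for search.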
13. The method of claim 12, characterized by further comprising:
downloading the data of the file resource from the URL.
14. A device for determining file package resource identification information, characterized by comprising:
a file information acquiring unit, configured to obtain attribute information of candidate objects, the candidate objects being the files and/or folders contained in a file package;
a judging unit, configured to judge whether the attribute information of the candidate objects satisfies a preset condition;
a determining unit, configured to determine, when the attribute information of one of the candidate objects satisfies the preset condition, the name information of that candidate object as the resource identification information of the file package.
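One way to picture the three units of claim 14 is as small cooperating classes. The class names below mirror the claim wording, but their interfaces, the `Candidate` record, and the pluggable condition are hypothetical; a real device could equally well be implemented in hardware or as a single module.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Candidate:
    name: str
    size: int


class FileInfoAcquiringUnit:
    """Obtains attribute information (here: name and size) of candidate objects."""

    def acquire(self, package: Dict[str, int]) -> List[Candidate]:
        return [Candidate(name, size) for name, size in package.items()]


class JudgingUnit:
    """Judges whether a candidate's attributes satisfy the preset condition."""

    def __init__(self, condition: Callable[[Candidate, List[Candidate]], bool]):
        self.condition = condition

    def satisfied(self, candidate: Candidate, everyone: List[Candidate]) -> bool:
        return self.condition(candidate, everyone)


class DeterminingUnit:
    """Determines the identification information from the first satisfying candidate."""

    def determine(self, candidates: List[Candidate],
                  judge: JudgingUnit) -> Optional[str]:
        for candidate in candidates:
            if judge.satisfied(candidate, candidates):
                return candidate.name
        return None


# Assumed preset condition: the candidate is the largest entry in the package.
def is_largest(candidate: Candidate, everyone: List[Candidate]) -> bool:
    return candidate.size == max(c.size for c in everyone)


candidates = FileInfoAcquiringUnit().acquire({"readme.txt": 100, "album": 8000})
identification = DeterminingUnit().determine(candidates, JudgingUnit(is_largest))
print(identification)
```

Passing the condition in as a callable keeps the judging unit independent of whether size-based or name-based rules are used.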
15. The device of claim 14, characterized in that the files and/or folders contained in the file package are:
the files and/or folders contained in the root directory layer of the file package.
16. The device of claim 14 or 15, characterized in that the attribute information is file size information.
17. The device of claim 14, characterized in that the preset condition is:
the size of the file or folder is the largest in the file package; or
the ratio of the size of the file or folder to the size of the file package is greater than or equal to a threshold.
18. The device of claim 14, 15 or 17, characterized in that the file package is a compressed file package, and the file size information is the size of the file or folder before compression.
19. The device of claim 14 or 15, characterized in that the attribute information is name information.
20. The device of claim 14, characterized in that the preset condition is:
the name information of the file or folder does not match a preset invalid keyword; and/or
the name information of the file or folder matches a preset valid keyword.
21. The device of claim 14, characterized in that the candidate objects further include the file package itself, and the preset condition is:
the name information of the file package, or of a file or folder in the file package, does not match a preset invalid keyword; and/or
the name information of the file package, or of a file or folder in the file package, matches a preset valid keyword.
22. The device of claim 20 or 21, characterized in that the preset condition further comprises:
weighting the name information that matches the preset valid keywords, and extracting the name information with the highest weight.
23. The device of claim 14, 15, 17, 20 or 21, characterized by further comprising:
a renaming unit, configured to replace the name information of the file package with the resource identification information.
24. A device for generating a file package resource database, characterized by comprising:
a resource URL crawling unit, configured to obtain the URL of a file package in a web page;
a content sig ID computing unit, configured to obtain the content sig ID of the file package by computing over the content data of the file package with a predetermined algorithm, the predetermined algorithm being an algorithm that produces different results for the content data of different binary files;
a resource identification information acquiring unit, configured to obtain the identification information of the file package resource, the identification information being the name information of a candidate object whose attribute information satisfies a preset condition, the candidate objects including the files and/or folders contained in the file package;
a recording unit, configured to record the content sig ID together with the corresponding file package resource URL and file package resource identification information, to form a database.
25. A device for searching file package resources, characterized by comprising:
a database, the database comprising content sig IDs and the corresponding file package resource URLs and file package resource identification information, wherein the file package resource URL is obtained by crawling file package resources in web pages; the content sig ID is obtained by computing over the content data of the file package with a predetermined algorithm, the predetermined algorithm being an algorithm that produces different results for the content data of different binary files; and the file package resource identification information is the name information of a candidate object whose attribute information satisfies a preset condition, the candidate objects including the files and/or folders contained in the file package;
a matching unit, configured to match, in the database, the corresponding file package resource identification information against the search keyword entered by a user;
a lookup unit, configured to look up the corresponding file package URL through the matched file package resource identification information; or to look up the corresponding content sig ID and obtain the corresponding file package URL according to that content sig ID.
26. The device of claim 25, characterized by further comprising:
a download unit, configured to download the data of the file resource from the URL.
Application CN200810134724XA, priority date 2008-07-23, filing date 2008-07-23: Method and apparatus for determining file bag resource identification information. Status: Active. Publication: CN101354718B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810134724XA CN101354718B (en) 2008-07-23 2008-07-23 Method and apparatus for determining file bag resource identification information

Publications (2)

Publication Number Publication Date
CN101354718A true CN101354718A (en) 2009-01-28
CN101354718B CN101354718B (en) 2012-02-08

Family

ID=40307527

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810134724XA Active CN101354718B (en) 2008-07-23 2008-07-23 Method and apparatus for determining file bag resource identification information

Country Status (1)

Country Link
CN (1) CN101354718B (en)

Also Published As

Publication number Publication date
CN101354718B (en) 2012-02-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20170620

Address after: A District No. 9018 building 518000 Guangdong Han innovation city of Shenzhen province Nanshan District high tech park, North Central Avenue, 4 floor 401

Patentee after: Shenzhen thunder network culture Co., Ltd.

Address before: 518057 Guangdong, Shenzhen, Nanshan District science and technology in the road, Shenzhen, No. 11, software park, building 7, level 8, two

Patentee before: Xunlei Network Technology Co., Ltd., Shenzhen

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20180206

Address after: Nanshan District Guangdong streets of science and technology of Shenzhen city in Guangdong province 518057 two Road No. 11 Shenzhen Software Park Building 7, 8 floor

Patentee after: Xunlei Network Technology Co., Ltd., Shenzhen

Address before: A District No. 9018 building 518000 Guangdong Han innovation city of Shenzhen province Nanshan District high tech park, North Central Avenue, 4 floor 401

Patentee before: Shenzhen thunder network culture Co., Ltd.

TR01 Transfer of patent right