CN108255972A - Text retrieval method and system - Google Patents

Text retrieval method and system

Info

Publication number
CN108255972A
CN108255972A CN201711441728.8A CN201711441728A CN108255972A CN 108255972 A CN108255972 A CN 108255972A CN 201711441728 A CN201711441728 A CN 201711441728A CN 108255972 A CN108255972 A CN 108255972A
Authority
CN
China
Prior art keywords
file
index
description information
retrieval
determined
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711441728.8A
Other languages
Chinese (zh)
Inventor
张迪
崔俊啸
臧德波
蔺川
景长超
张鹏
褚波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur General Software Co Ltd
Original Assignee
Inspur General Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur General Software Co Ltd
Priority to CN201711441728.8A
Publication of CN108255972A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a text retrieval method and system. The method includes: obtaining at least one piece of file data and determining the description information corresponding to each piece of file data; building, according to the description information, a file index corresponding to each piece of file data; obtaining retrieval information input by a user; parsing at least one retrieval keyword from the retrieval information; determining, from the file indexes, a target file index corresponding to the at least one retrieval keyword; and determining the target description information corresponding to the target file index and displaying it. This scheme can improve data retrieval efficiency.

Description

Text retrieval method and system
Technical field
The present invention relates to the field of computer technology, and in particular to a text retrieval method and system.
Background
With the development of computer technology, data volumes are growing explosively, and how quickly target data can be retrieved from a file system has a major influence on data-processing efficiency.
The distributed file system provided by Hadoop can store a large amount of data, with the data dispersed across different storage devices, for example across individual disks. To retrieve target data, a user has to inspect the storage devices one by one to determine whether each of them holds the target data.
Because the file system stores a large volume of data and each item is stored in a different location, retrieving target data by searching the storage devices one by one is inefficient.
Summary of the invention
Embodiments of the present invention provide a text retrieval method and system that can improve the efficiency of data retrieval.
In a first aspect, an embodiment of the present invention provides a text retrieval method, including:
obtaining at least one piece of file data, and determining the description information corresponding to each piece of file data;
building, according to the description information, the file index corresponding to each piece of file data;
obtaining retrieval information input by a user;
parsing at least one retrieval keyword from the retrieval information;
determining, from the file indexes, the target file index corresponding to the at least one retrieval keyword;
determining the target description information corresponding to the target file index, and displaying the target description information.
Preferably,
after the retrieval information input by the user is obtained, the method further includes:
obtaining a retrieval condition input by the user;
and determining, from the file indexes, the target file index corresponding to the at least one retrieval keyword includes:
determining the target file index according to the retrieval condition and each retrieval keyword.
Preferably,
determining the target file index according to the retrieval condition and each retrieval keyword includes:
determining, from the file indexes, the candidate file indexes corresponding to the retrieval time, according to the retrieval time carried in the retrieval condition and the creation time in the description information corresponding to each file index;
determining, from the candidate file indexes thus determined, the target file index corresponding to the retrieval keyword.
Preferably,
determining the target file index according to the retrieval condition and each retrieval keyword includes:
determining, from the file indexes, the candidate file indexes corresponding to the retrieval file type, according to the retrieval file type carried in the retrieval condition and the file type in the description information corresponding to each file index;
determining, from the candidate file indexes thus determined, the target file index corresponding to the retrieval keyword.
Preferably,
determining the target file index according to the retrieval condition and each retrieval keyword includes:
combining the retrieval keywords according to the combination relationship carried in the retrieval condition;
determining the target file index according to the combined retrieval keywords.
Preferably,
the method further includes: building an index library at a preset storage location;
and building, according to the description information, the file index corresponding to each piece of file data includes:
tokenizing the file content in the description information with a preset tokenizer to obtain at least one content keyword;
processing the at least one content keyword with the dictionary corresponding to the preset tokenizer, and writing the processed content keywords into the description information;
storing the description information in the index library with a preset index creator to form the file index.
Preferably,
the method further includes:
receiving a file deletion request input by the user;
determining, according to the file deletion request, the file data to be deleted from the at least one piece of file data;
determining the description information to be deleted and the file index to be deleted that correspond to the file data to be deleted;
deleting the description information to be deleted and the file index to be deleted from the index library with the index creator.
In a second aspect, an embodiment of the present invention provides a text retrieval system, including an index construction unit, an acquiring unit, and a retrieval unit, wherein:
the index construction unit is configured to obtain at least one piece of file data, determine the description information corresponding to each piece of file data, and build, according to the description information, the file index corresponding to each piece of file data;
the acquiring unit is configured to obtain retrieval information input by a user and parse at least one retrieval keyword from the retrieval information;
the retrieval unit is configured to determine, from the file indexes, the target file index corresponding to the at least one retrieval keyword, determine the target description information corresponding to the target file index, and display the target description information.
Preferably,
the acquiring unit is further configured to obtain a retrieval condition input by the user;
the retrieval unit is configured to determine the target file index according to the retrieval condition and each retrieval keyword.
Preferably,
the retrieval unit is configured to determine, from the file indexes, the candidate file indexes corresponding to the retrieval time, according to the retrieval time carried in the retrieval condition and the creation time in the description information corresponding to each file index, and to determine, from the candidate file indexes thus determined, the target file index corresponding to the retrieval keyword.
Preferably,
the retrieval unit is configured to determine, from the file indexes, the candidate file indexes corresponding to the retrieval file type, according to the retrieval file type carried in the retrieval condition and the file type in the description information corresponding to each file index, and to determine, from the candidate file indexes thus determined, the target file index corresponding to the retrieval keyword.
Preferably,
the retrieval unit is configured to combine the retrieval keywords according to the combination relationship carried in the retrieval condition, and to determine the target file index according to the combined retrieval keywords.
Preferably,
the system further includes a setting unit, wherein:
the setting unit is configured to build an index library at a preset storage location;
the index construction unit is configured to tokenize the file content in the description information with a preset tokenizer to obtain at least one content keyword, process the at least one content keyword with the dictionary corresponding to the preset tokenizer, write the processed content keywords into the description information, and store the description information in the index library with a preset index creator to form the file index.
Preferably,
the system further includes an index deletion unit, wherein:
the acquiring unit is further configured to receive a file deletion request input by the user;
the index deletion unit is configured to determine, according to the file deletion request, the file data to be deleted from the at least one piece of file data, determine the description information to be deleted and the file index to be deleted that correspond to the file data to be deleted, and delete them from the index library with the index creator.
Embodiments of the present invention provide a text retrieval method and system in which a file index is generated for each piece of file data from the description information of the obtained file data. When retrieval information input by a user is obtained, retrieval keywords are parsed from it, the target file index corresponding to the retrieval keywords is determined, and the target description information corresponding to the target file index is displayed. Each piece of file data can thus be retrieved automatically, without searching the storage devices one by one for the target data, which improves the efficiency of data retrieval.
Description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a text retrieval method provided by one embodiment of the present invention;
Fig. 2 is a structural diagram of a text retrieval system provided by one embodiment of the present invention;
Fig. 3 is a structural diagram of a text retrieval system provided by another embodiment of the present invention;
Fig. 4 is a structural diagram of a text retrieval system provided by yet another embodiment of the present invention.
Detailed description of the embodiments
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides a text retrieval method, which may include the following steps:
Step 101: obtain at least one piece of file data, and determine the description information corresponding to each piece of file data;
Step 102: build, according to the description information, the file index corresponding to each piece of file data;
Step 103: obtain retrieval information input by a user;
Step 104: parse at least one retrieval keyword from the retrieval information;
Step 105: determine, from the file indexes, the target file index corresponding to the at least one retrieval keyword;
Step 106: determine the target description information corresponding to the target file index, and display the target description information.
In the above embodiment, a file index is generated for each piece of file data from the description information of the obtained file data. When retrieval information input by a user is obtained, retrieval keywords are parsed from it, the target file index corresponding to the retrieval keywords is determined, and the target description information corresponding to the target file index is displayed. Each piece of file data can thus be retrieved automatically, without searching the storage devices one by one for the target data, which improves the efficiency of data retrieval.
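Steps 101 to 106 can be sketched with a small in-memory inverted index. This is only an illustration under stated assumptions: the class `SimpleFileIndex`, its method names, and the whitespace tokenizer are hypothetical stand-ins, not the Lucene-based implementation the description refers to later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the index library (not the patent's implementation).
class SimpleFileIndex {
    // fileID -> description information (attribute name -> value)
    private final Map<String, Map<String, String>> descriptions = new HashMap<>();
    // keyword -> IDs of the files whose content contains it (a tiny inverted index)
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Steps 101-102: record the description information and index the file content.
    void addFile(String fileId, String fileName, String content) {
        Map<String, String> description = new HashMap<>();
        description.put("fileID", fileId);
        description.put("fileName", fileName);
        description.put("content", content);
        descriptions.put(fileId, description);
        for (String keyword : tokenize(content)) {
            postings.computeIfAbsent(keyword, k -> new LinkedHashSet<>()).add(fileId);
        }
    }

    // Steps 103-106: parse keywords, find matching files, return their descriptions.
    List<Map<String, String>> search(String retrievalInfo) {
        Set<String> hits = new LinkedHashSet<>();
        for (String keyword : tokenize(retrievalInfo)) {
            Set<String> ids = postings.get(keyword);
            if (ids != null) {
                hits.addAll(ids);
            }
        }
        List<Map<String, String>> results = new ArrayList<>();
        for (String id : hits) {
            results.add(descriptions.get(id));
        }
        return results;
    }

    // Assumption: whitespace splitting stands in for a real tokenizer such as IK.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

A retrieval then touches only the keyword map rather than every stored file, which is the efficiency gain the embodiment describes.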
In one embodiment of the present invention, the method may further include: building an index library at a preset storage location.
A specific implementation of step 102 may then include:
tokenizing the file content in the description information with a preset tokenizer to obtain at least one content keyword;
processing the at least one content keyword with the dictionary corresponding to the preset tokenizer, and writing the processed content keywords into the description information;
storing the description information in the index library with a preset index creator to form the file index.
In this embodiment, the storage location for the index files is determined in the local file system; for example, disk A is chosen as the storage location of the index library, and the index library is built there. An index creator is then constructed. The index creator can create file indexes and store them at the index library's location, and it is set to append mode. A tokenizer, such as the IK tokenizer, can then be configured. Several dictionaries can be built, such as an extension dictionary, a stop-word dictionary, and a synonym dictionary, and the tokenizer's own dictionary (for example, that of IKAnalyzer) is adjusted using them. When a file index is created, a corresponding file description is created according to the file type and the content of each attribute field is set, forming the description information of the file data. The specific contents are shown in Table 1.
Table 1
Attribute name    Value
fileName          File name
fileDataName      Name of the uploaded file object
content           File content
path              File path
type              File type
fileID            File identifier
category          Category
createTime        Creation time
top_directory     Parent directory
versionID         Version number
The tokenizer splits the file content in the description information into multiple content keywords, and the adjusted dictionaries are then used to process them. For example, if the content keywords include two fragments that together form the word "happy", the extension dictionary can be used to merge them into "happy", and the synonym dictionary can be used to determine synonyms of "happy", such as "joyful" and "cheerful". The processed content keywords are then written into the description information, replacing the original file content, and the index creator stores the updated description information in the index library, forming the file index corresponding to the file data. All file indexes are thus stored together in the index library, so a retrieval only has to search the storage location of the index library. This avoids the complexity of searching every disk and further improves the efficiency of data retrieval.
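The dictionary processing described above can be sketched as follows. The class `DictionaryProcessor` and its merge rule (joining two adjacent fragments when their concatenation appears in the extension dictionary) are assumptions made for illustration; the IK tokenizer applies its dictionaries internally and differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical post-processor for content keywords; not the IK tokenizer's own logic.
class DictionaryProcessor {
    private final Set<String> extensionWords;          // words that fragments may be merged into
    private final Set<String> stopWords;               // words dropped from the index
    private final Map<String, List<String>> synonyms;  // word -> its synonyms

    DictionaryProcessor(Set<String> extensionWords, Set<String> stopWords,
                        Map<String, List<String>> synonyms) {
        this.extensionWords = extensionWords;
        this.stopWords = stopWords;
        this.synonyms = synonyms;
    }

    List<String> process(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String token = tokens.get(i);
            // Merge two adjacent fragments if together they form an extension-dictionary word.
            if (i + 1 < tokens.size() && extensionWords.contains(token + tokens.get(i + 1))) {
                token = token + tokens.get(i + 1);
                i += 2;
            } else {
                i += 1;
            }
            if (stopWords.contains(token)) {
                continue; // stop words are not indexed
            }
            if (!out.contains(token)) {
                out.add(token);
            }
            // Add synonyms so that a query for any of them can match this file.
            for (String synonym : synonyms.getOrDefault(token, Collections.<String>emptyList())) {
                if (!out.contains(synonym)) {
                    out.add(synonym);
                }
            }
        }
        return out;
    }
}
```

With an extension dictionary containing "happy", the fragments "hap" and "py" would be merged into one keyword and indexed together with its configured synonyms.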
In one embodiment of the present invention, the method may further include:
receiving a file deletion request input by the user;
determining, according to the file deletion request, the file data to be deleted from the at least one piece of file data;
determining the description information to be deleted and the file index to be deleted that correspond to the file data to be deleted;
deleting the description information to be deleted and the file index to be deleted from the index library with the index creator.
Here, when a file deletion request input by the user is received, the corresponding file index also has to be deleted. Specifically, the file data to be deleted that corresponds to the request is determined from the obtained file data and deleted, the description information and file index corresponding to that file data are determined, and the index creator then deletes both. Deleting the file index together with the file data prevents a retrieval from returning a file index whose file data can no longer be obtained, which improves retrieval accuracy.
It is worth noting that when file data is moved or modified, its file index and description information can first be deleted, new description information is then generated from the modified file data, and a file index is rebuilt for it. In this way a new file index is created automatically whenever file data changes, keeping the index synchronized with the file data, which ensures the accuracy of the file index and thereby improves retrieval accuracy.
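The delete-then-rebuild behaviour can be sketched with a map keyed by file identifier. The class `IndexLibrary` is a hypothetical stand-in. If the index creator is a Lucene `IndexWriter` (an assumption; the source does not name the class), the corresponding calls would be `deleteDocuments(Term...)` and `updateDocument(Term, ...)`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the index library, keyed by fileID.
class IndexLibrary {
    private final Map<String, Map<String, String>> entries = new HashMap<>();

    // Store a copy of the description information as the file's index entry.
    void add(String fileId, Map<String, String> description) {
        entries.put(fileId, new HashMap<>(description));
    }

    // Remove the entry when its file data is deleted; returns false if nothing was indexed.
    boolean delete(String fileId) {
        return entries.remove(fileId) != null;
    }

    // Moved or modified file: delete the stale entry first, then index the new description.
    void update(String fileId, Map<String, String> newDescription) {
        delete(fileId);
        add(fileId, newDescription);
    }

    Map<String, String> get(String fileId) {
        return entries.get(fileId);
    }

    int size() {
        return entries.size();
    }
}
```

Keeping deletion and re-indexing in one `update` call means a moved file never leaves a stale path behind in the index.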
In one embodiment of the present invention, after step 103, the method further includes:
obtaining a retrieval condition input by the user.
A specific implementation of step 105 may then include:
determining the target file index according to the retrieval condition and each retrieval keyword.
Here, the user can customize the retrieval condition, for example the retrieval time, the retrieval file type, and the combination relationship between the retrieval keywords. Before the user's retrieval information is obtained, the weights that the file name and the file content carry when the retrieval results are ranked can be set in advance. For example, if the weight of the file name is set higher than the weight of the file content, then after multiple pieces of file data matching the retrieval information have been retrieved, they are sorted by their weighted relevance to the retrieval information, so that a match in the file name counts for more than a match in the content and file data with a higher weighted score ranks nearer the top. In addition, the IK tokenizer can be configured so that the retrieval keywords are processed with the pre-built extension dictionary, stop-word dictionary, and synonym dictionary, which helps further improve retrieval accuracy.
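The weighted ranking can be sketched as below. The weight values 2.0 and 1.0, the class `WeightedRanker`, and the simple substring match are all assumptions for illustration; the source only states that the file-name weight may be set higher than the content weight.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical ranking by field weights; a real engine would use its own scoring model.
class WeightedRanker {
    static final double FILE_NAME_WEIGHT = 2.0; // assumed value
    static final double CONTENT_WEIGHT = 1.0;   // assumed value

    static class Doc {
        final String fileName;
        final String content;

        Doc(String fileName, String content) {
            this.fileName = fileName;
            this.content = content;
        }
    }

    // A match in the file name contributes more than a match in the content.
    static double score(Doc doc, String keyword) {
        double score = 0.0;
        if (doc.fileName.contains(keyword)) {
            score += FILE_NAME_WEIGHT;
        }
        if (doc.content.contains(keyword)) {
            score += CONTENT_WEIGHT;
        }
        return score;
    }

    // Keep only matching documents and order them by descending score.
    static List<Doc> rank(List<Doc> docs, String keyword) {
        List<Doc> matched = new ArrayList<>();
        for (Doc doc : docs) {
            if (score(doc, keyword) > 0) {
                matched.add(doc);
            }
        }
        matched.sort(Comparator.comparingDouble((Doc d) -> score(d, keyword)).reversed());
        return matched;
    }
}
```

A file whose name and content both contain the keyword ranks first, followed by name-only matches and then content-only matches.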
Specifically, in one embodiment of the present invention, determining the target file index according to the retrieval condition and each retrieval keyword includes:
determining, from the file indexes, the candidate file indexes corresponding to the retrieval time, according to the retrieval time carried in the retrieval condition and the creation time in the description information corresponding to each file index;
determining, from the candidate file indexes thus determined, the target file index corresponding to the retrieval keyword.
In this embodiment, the retrieval condition input by the user limits the retrieval time range, so the file indexes can be screened by createTime in the description information of each piece of file data, that is, by creation time. For example, if the retrieval time input by the user is 2017.10.1-2017.11.1, the file indexes whose creation time falls within that period are taken as candidate file indexes, and the target file index corresponding to the retrieval keyword is then determined from these candidates, which further improves retrieval accuracy.
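This can be implemented, for example, with the following program code:
// Lucene classes: Term (org.apache.lucene.index); Query, TermRangeQuery, BooleanClause.Occur (org.apache.lucene.search).
// Build an inclusive range query on the createTime attribute from Table 1.
Term begin = new Term("createTime", dateBegin);
Term end = new Term("createTime", dateEnd);
Query rangeQuery = new TermRangeQuery("createTime", begin.bytes(), end.bytes(), true, true);
// MUST: only file indexes whose createTime falls within the range remain candidates.
booleanQuery.add(rangeQuery, Occur.MUST);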
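A term range query compares terms in byte order, so a date range only behaves as expected if the stored createTime values sort lexicographically in date order. The following sketch shows one way to normalize user input such as 2017.10.1 before it is used as a range bound. The class `SortableDates`, the `yyyyMMdd` storage format, and the helper names are assumptions; the source does not say how createTime is encoded.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: converts dates to a fixed-width form whose string order equals date order.
class SortableDates {
    // Input format as written in the example range "2017.10.1-2017.11.1".
    private static final DateTimeFormatter INPUT = DateTimeFormatter.ofPattern("yyyy.M.d");
    // Assumed storage format for createTime: fixed width, most significant field first.
    private static final DateTimeFormatter SORTABLE = DateTimeFormatter.ofPattern("yyyyMMdd");

    static String toSortable(String userDate) {
        return LocalDate.parse(userDate, INPUT).format(SORTABLE);
    }

    // Inclusive lexicographic range check, mirroring an inclusive term range query.
    static boolean inRange(String createTime, String begin, String end) {
        return createTime.compareTo(begin) >= 0 && createTime.compareTo(end) <= 0;
    }
}
```

Without normalization, the raw string "2017.9.30" sorts after "2017.10.1", so a range built from unpadded dates would select the wrong files.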
In one embodiment of the present invention, determining the target file index according to the retrieval condition and each retrieval keyword includes:
determining, from the file indexes, the candidate file indexes corresponding to the retrieval file type, according to the retrieval file type carried in the retrieval condition and the file type in the description information corresponding to each file index;
determining, from the candidate file indexes thus determined, the target file index corresponding to the retrieval keyword.
Besides the retrieval time range, the user can also set the retrieval file type. For example, if the retrieval file type set by the user is Word, only files of type doc and docx are searched; other file types work the same way. This further improves retrieval accuracy. When the user makes no specific setting for the retrieval file type, all file types are retrieved by default. The correspondence between the retrieval file type set by the user and the formats of the file data is shown in Table 2.
Table 2
Retrieval file type    File formats
All                    All formats
Word                   doc, docx
PDF                    pdf
Excel                  xls, xlsx
TXT                    txt
PPT                    ppt, pptx
PICTURE                bmp, jpg, jpeg, png, gif
VEDIO                  avi, wma, rmvb, mp4, flash, mp3, wav
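The correspondence in Table 2 can be expressed as a lookup from retrieval file type to permitted extensions. This is a minimal sketch: the class `FileTypeFilter` and the choice to match on the file-name extension are assumptions, and the type names are spelled exactly as they appear in Table 2.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical filter built from Table 2; type names are spelled as in the table.
class FileTypeFilter {
    private static final Map<String, List<String>> EXTENSIONS = new LinkedHashMap<>();

    static {
        EXTENSIONS.put("Word", Arrays.asList("doc", "docx"));
        EXTENSIONS.put("PDF", Arrays.asList("pdf"));
        EXTENSIONS.put("Excel", Arrays.asList("xls", "xlsx"));
        EXTENSIONS.put("TXT", Arrays.asList("txt"));
        EXTENSIONS.put("PPT", Arrays.asList("ppt", "pptx"));
        EXTENSIONS.put("PICTURE", Arrays.asList("bmp", "jpg", "jpeg", "png", "gif"));
        EXTENSIONS.put("VEDIO", Arrays.asList("avi", "wma", "rmvb", "mp4", "flash", "mp3", "wav"));
    }

    // Returns true if the file may be a candidate for the given retrieval file type.
    static boolean matches(String retrievalType, String fileName) {
        // No setting, or "All": every file type is retrieved by default.
        if (retrievalType == null || "All".equals(retrievalType)) {
            return true;
        }
        List<String> allowed = EXTENSIONS.getOrDefault(retrievalType, Collections.<String>emptyList());
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return false; // no extension, so no specific type can match
        }
        return allowed.contains(fileName.substring(dot + 1).toLowerCase());
    }
}
```

Filtering candidates this way before keyword matching keeps the keyword search confined to the file types the user asked for.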
In one embodiment of the present invention, determining the target file index according to the retrieval condition and each retrieval keyword includes:
combining the retrieval keywords according to the combination relationship carried in the retrieval condition;
determining the target file index according to the combined retrieval keywords.
Here, besides setting the retrieval time and the retrieval file type, the user can also use advanced retrieval, selecting "and", "or", or "not containing" from a drop-down box to combine the retrieval keywords and thereby assemble the query condition. "And" is the AND operation, which retrieves the file indexes that satisfy all conditions at once; "or" is the OR operation, for which satisfying any one condition is enough; "not containing" is the NOT operation, which excludes the file indexes that satisfy the condition following it. When the retrieval condition set by the user includes a retrieval time and a retrieval file type, the combination relationship can likewise be set to determine how they are combined, so that different retrieval conditions can be assembled. The user can thus define custom retrieval conditions, which helps retrieve exactly the file indexes the user needs and improves the user experience.
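The three combination relationships can be sketched as set operations over the file identifiers matched by each keyword. The class `QueryCombiner` and its enum are hypothetical. In a Lucene-based implementation they would plausibly map to the `Occur.MUST`, `Occur.SHOULD`, and `Occur.MUST_NOT` clause types, although the source only shows `Occur.MUST`.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical combiner: each keyword's matches are a set of file IDs.
class QueryCombiner {
    enum Op { AND, OR, NOT }

    // Combine the running result with the matches of the next keyword.
    static Set<String> combine(Set<String> current, Op op, Set<String> next) {
        Set<String> result = new LinkedHashSet<>(current);
        switch (op) {
            case AND:
                result.retainAll(next); // must satisfy both conditions
                break;
            case OR:
                result.addAll(next);    // satisfying either condition is enough
                break;
            case NOT:
                result.removeAll(next); // exclude files matching the following condition
                break;
        }
        return result;
    }
}
```

Chaining `combine` from left to right over the user's selections yields the final set of target file indexes.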
It is worth noting that after the corresponding target description information is displayed, the user can also be offered preview and download of the corresponding file data. For example, if the retrieval results include files of types such as Word, PDF, and TXT, then when the user clicks a file, the file can be located through the file information in its attribute fields and cached to the browser for preview. The user can also click the download button below a file, whereupon the file is located through the file information in its attribute fields and downloaded. This makes it easy for the user to obtain the corresponding file data and further improves the user experience.
In addition, Lucene is an open-source library for full-text indexing and search, supported and provided by the Apache Software Foundation. Lucene offers a simple yet powerful application interface for full-text indexing and searching. As a full-text search engine it has the following notable advantages:
1. The index file format is independent of the application platform. Lucene defines an index file format based on 8-bit bytes, so that applications on compatible systems or on different platforms can share the index files that have been built.
2. On top of the inverted index of a traditional full-text search engine, it implements segmented indexing: a small index file can be built for new files, which speeds up indexing, and it is then merged with the existing index for optimization.
3. Its well-designed object-oriented architecture lowers the learning curve for extending Lucene and makes it easy to add new functions.
4. It provides a text analysis interface that is independent of language and file format. The indexer builds the index files from the Token stream it receives, so to support a new language or file format the user only has to implement the text analysis interface.
5. It ships with a powerful query engine by default, so the system gains strong query capability without the user writing any code. Lucene's default query implementation supports Boolean operations, fuzzy queries (Fuzzy Search), grouped queries, and so on.
Moreover, Lucene is a mature, free, open-source tool in the Java development environment. It is platform-independent and gives software developers an easy-to-use toolkit on which a full-text search engine suited to the application at hand can be built. A retrieval system for the Hadoop file system can therefore be built on Lucene.
As shown in Fig. 2, an embodiment of the present invention provides a text retrieval system, including an index construction unit 201, an acquiring unit 202, and a retrieval unit 203, wherein:
the index construction unit 201 is configured to obtain at least one piece of file data, determine the description information corresponding to each piece of file data, and build, according to the description information, the file index corresponding to each piece of file data;
the acquiring unit 202 is configured to obtain retrieval information input by a user and parse at least one retrieval keyword from the retrieval information;
the retrieval unit 203 is configured to determine, from the file indexes, the target file index corresponding to the at least one retrieval keyword, determine the target description information corresponding to the target file index, and display the target description information.
In one embodiment of the present invention, the acquiring unit 202 is further configured to obtain a retrieval condition input by the user;
the retrieval unit 203 is configured to determine the target file index according to the retrieval condition and each retrieval keyword.
In one embodiment of the present invention, the retrieval unit 203 is configured to determine, from the file indexes, the candidate file indexes corresponding to the retrieval time, according to the retrieval time carried in the retrieval condition and the creation time in the description information corresponding to each file index, and to determine, from the candidate file indexes thus determined, the target file index corresponding to the retrieval keyword.
In one embodiment of the present invention, the retrieval unit 203 is configured to determine, from the file indexes, the candidate file indexes corresponding to the retrieval file type, according to the retrieval file type carried in the retrieval condition and the file type in the description information corresponding to each file index, and to determine, from the candidate file indexes thus determined, the target file index corresponding to the retrieval keyword.
In one embodiment of the present invention, the retrieval unit 203 is configured to combine the retrieval keywords according to the combination relationship carried in the retrieval condition, and to determine the target file index according to the combined retrieval keywords.
As shown in Fig. 3, in one embodiment of the present invention, the system may further include a setting unit 301, wherein:
the setting unit 301 is configured to build an index library at a preset storage location;
the index construction unit 201 is configured to tokenize the file content in the description information with a preset tokenizer to obtain at least one content keyword, process the at least one content keyword with the dictionary corresponding to the preset tokenizer, write the processed content keywords into the description information, and store the description information in the index library with a preset index creator to form the file index.
As shown in Fig. 4, in one embodiment of the present invention, the system may further include an index deletion unit 401, wherein:
the acquiring unit 202 is further configured to receive a file deletion request input by the user;
the index deletion unit 401 is configured to determine, according to the file deletion request, the file data to be deleted from the at least one piece of file data, determine the description information to be deleted and the file index to be deleted that correspond to the file data to be deleted, and delete them from the index library with the index creator.
The information exchange between the units of the above system and their execution processes are based on the same concept as the method embodiments of the present invention; for details, refer to the description of the method embodiments, which is not repeated here.
An embodiment of the present invention further provides a readable medium containing execution instructions. When the processor of a storage controller executes the execution instructions, the storage controller performs the method provided by any of the above embodiments of the present invention.
An embodiment of the present invention further provides a storage controller, including a processor, a memory, and a bus. The memory is used to store execution instructions, and the processor is connected to the memory through the bus. When the storage controller runs, the processor executes the execution instructions stored in the memory, so that the storage controller performs the method provided by any of the above embodiments of the present invention.
In conclusion the above each embodiment of the present invention at least has the advantages that:
1. In the embodiments of the present invention, the file index corresponding to each file data is generated from the description information of the acquired file data. When retrieval information input by the user is obtained, search keywords are parsed from the retrieval information, the target file index corresponding to the search keywords is determined, and the target description information corresponding to the target file index is displayed. Each file data can thus be retrieved automatically, without searching the storage devices one by one for the target data, which improves retrieval efficiency.
2. In the embodiments of the present invention, an index database is built at a preset storage location, and the description information is then stored in the index database using the index creator to form the file index. All file indexes are thus stored together in the index database, so retrieval only needs to search the storage location of the index database and avoids the complexity of searching each disk, which further improves retrieval efficiency.
3. In the embodiments of the present invention, when a file deletion request input by the user is received, the file data to be deleted that corresponds to the request is determined from the acquired file data and deleted; the description information to be deleted and the file index to be deleted that correspond to that file data are determined and then deleted using the index creator. Thus, when file data is deleted, its file index is deleted with it, which avoids the situation in which a file index remains but the specific file data can no longer be obtained through it, and thereby improves retrieval accuracy.
4. In the embodiments of the present invention, when file data is moved or modified, the file index and description information corresponding to the file are first deleted, new description information is generated from the modified file data, and the file index of the modified file data is re-established. A new file index is thus created automatically whenever file data changes, keeping the index synchronized with the file data, which ensures the accuracy of the file index and improves retrieval accuracy.
5. In the embodiments of the present invention, the user can define search conditions, including the retrieval time, the retrieval file type, and the splicing relationship among the search conditions and search keywords. This helps to retrieve accurately the file indexes that meet the user's needs and improves the user experience.
6. In the embodiments of the present invention, after the corresponding target description information is displayed, functions for previewing and downloading the corresponding file data can also be provided to the user. This makes it convenient for the user to obtain the corresponding file data and further improves the user experience.
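The user-defined search conditions mentioned in the fifth item can be sketched as a filter applied before keyword matching. This is an illustrative example under stated assumptions, not the patented implementation: the description records, their field names (`type`, `created`, `keywords`), the `search` function, and the use of "AND"/"OR" to represent the splicing relationship are all hypothetical.

```python
from datetime import date

# Hypothetical description records; the field names are assumptions.
DESCRIPTIONS = {
    "f1": {"name": "budget.xlsx", "type": "xlsx", "created": date(2017, 3, 1),
           "keywords": {"budget", "sales"}},
    "f2": {"name": "plan.docx", "type": "docx", "created": date(2017, 9, 15),
           "keywords": {"budget", "plan"}},
    "f3": {"name": "notes.docx", "type": "docx", "created": date(2016, 5, 20),
           "keywords": {"plan", "meeting"}},
}


def search(descriptions, keywords, created_from=None, created_to=None,
           file_type=None, relation="AND"):
    """Return the sorted ids of files that satisfy the conditions and keywords."""
    if relation not in ("AND", "OR"):
        raise ValueError("relation must be 'AND' or 'OR'")
    if not keywords:
        return []
    combine = all if relation == "AND" else any
    hits = []
    for file_id, desc in descriptions.items():
        # Narrow down to candidate files by creation time and file type first.
        if created_from is not None and desc["created"] < created_from:
            continue
        if created_to is not None and desc["created"] > created_to:
            continue
        if file_type is not None and desc["type"] != file_type:
            continue
        # Then match the keywords, combined according to the splicing relation.
        if combine(kw in desc["keywords"] for kw in keywords):
            hits.append(file_id)
    return sorted(hits)


print(search(DESCRIPTIONS, ["budget", "plan"]))                 # prints ['f2']
print(search(DESCRIPTIONS, ["budget", "plan"], relation="OR"))  # prints ['f1', 'f2', 'f3']
print(search(DESCRIPTIONS, ["plan"], file_type="docx",
             created_from=date(2017, 1, 1)))                    # prints ['f2']
```

Filtering by time and type before matching keywords follows the order in which the candidate file indexes are determined first and the target file index is then selected from them.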
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a" does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above are merely preferred embodiments of the present invention, intended only to illustrate its technical solutions and not to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims (10)

1. A text retrieval method, characterized by comprising:
Obtaining at least one file data, and determining the description information corresponding to each file data;
Building, according to the description information, the file index corresponding to each file data;
Obtaining retrieval information input by a user;
Parsing at least one search keyword from the retrieval information;
Determining, from the file indexes, the target file index corresponding to the at least one search keyword;
Determining the target description information corresponding to the target file index, and displaying the target description information.
2. The method according to claim 1, characterized in that,
After the obtaining of the retrieval information input by the user, the method further comprises:
Obtaining a search condition input by the user;
The determining, from the file indexes, of the target file index corresponding to the at least one search keyword comprises:
Determining the target file index according to the search condition and each search keyword.
3. The method according to claim 2, characterized in that,
The determining of the target file index according to the search condition and each search keyword comprises:
Determining, from the file indexes, candidate file indexes corresponding to the retrieval time, according to the retrieval time carried in the search condition and the creation time in the description information corresponding to each file index;
Determining, from the determined candidate file indexes, the target file index corresponding to the search keyword;
And/or
The determining of the target file index according to the search condition and each search keyword comprises:
Determining, from the file indexes, candidate file indexes corresponding to the retrieval file type, according to the retrieval file type carried in the search condition and the created file type in the description information corresponding to each file index;
Determining, from the determined candidate file indexes, the target file index corresponding to the search keyword;
And/or
The determining of the target file index according to the search condition and each search keyword comprises:
Combining the search keywords according to the splicing relationship carried in the search condition;
Determining the target file index according to the combined search keywords.
4. The method according to claim 1, characterized in that,
The method further comprises: building an index database at a preset storage location;
The building, according to the description information, of the file index corresponding to each file data comprises:
Segmenting the file content in the description information using a preset word segmenter to obtain at least one content keyword;
Processing the at least one content keyword using the dictionary corresponding to the preset word segmenter, and writing the processed content keyword into the description information;
Storing the description information in the index database using a preset index creator to form the file index.
5. The method according to claim 4, characterized in that,
The method further comprises:
Receiving a file deletion request input by the user;
Determining, according to the file deletion request, the file data to be deleted from the at least one file data;
Determining the description information to be deleted and the file index to be deleted that correspond to the file data to be deleted;
Deleting the description information to be deleted and the file index to be deleted from the index database using the index creator.
6. A text retrieval system, characterized by comprising: an index construction unit, an acquiring unit, and a retrieval unit; wherein,
The index construction unit is configured to obtain at least one file data, determine the description information corresponding to each file data, and build, according to the description information, the file index corresponding to each file data;
The acquiring unit is configured to obtain retrieval information input by a user and parse at least one search keyword from the retrieval information;
The retrieval unit is configured to determine, from the file indexes, the target file index corresponding to the at least one search keyword; and to determine the target description information corresponding to the target file index and display the target description information.
7. The system according to claim 6, characterized in that,
The acquiring unit is further configured to obtain a search condition input by the user;
The retrieval unit is configured to determine the target file index according to the search condition and each search keyword.
8. The system according to claim 7, characterized in that,
The retrieval unit is configured to determine, from the file indexes, candidate file indexes corresponding to the retrieval time, according to the retrieval time carried in the search condition and the creation time in the description information corresponding to each file index; and to determine, from the determined candidate file indexes, the target file index corresponding to the search keyword;
And/or
The retrieval unit is configured to determine, from the file indexes, candidate file indexes corresponding to the retrieval file type, according to the retrieval file type carried in the search condition and the created file type in the description information corresponding to each file index; and to determine, from the determined candidate file indexes, the target file index corresponding to the search keyword;
And/or
The retrieval unit is configured to combine the search keywords according to the splicing relationship carried in the search condition, and to determine the target file index according to the combined search keywords.
9. The system according to claim 6, characterized in that,
The system further comprises a setting unit; wherein,
The setting unit is configured to build an index database at a preset storage location;
The index construction unit is configured to segment the file content in the description information using a preset word segmenter to obtain at least one content keyword; to process the at least one content keyword using the dictionary corresponding to the preset word segmenter and write the processed content keyword into the description information; and to store the description information in the index database using a preset index creator to form the file index.
10. The system according to claim 9, characterized in that,
The system further comprises an index deletion unit; wherein,
The acquiring unit is further configured to receive a file deletion request input by the user;
The index deletion unit is configured to determine, according to the file deletion request, the file data to be deleted from the at least one file data; to determine the description information to be deleted and the file index to be deleted that correspond to the file data to be deleted; and to delete the description information to be deleted and the file index to be deleted from the index database using the index creator.
CN201711441728.8A 2017-12-27 2017-12-27 A kind of text searching method and system Pending CN108255972A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711441728.8A CN108255972A (en) 2017-12-27 2017-12-27 A kind of text searching method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711441728.8A CN108255972A (en) 2017-12-27 2017-12-27 A kind of text searching method and system

Publications (1)

Publication Number Publication Date
CN108255972A true CN108255972A (en) 2018-07-06

Family

ID=62724110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711441728.8A Pending CN108255972A (en) 2017-12-27 2017-12-27 A kind of text searching method and system

Country Status (1)

Country Link
CN (1) CN108255972A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104391941A (en) * 2014-11-25 2015-03-04 浪潮电子信息产业股份有限公司 Method for rapidly establishing full-text retrieval tool for common files
CN105279150A (en) * 2015-10-27 2016-01-27 江苏电力信息技术有限公司 Lucene full-text retrieval based Chinese word segmentation method
CN105574062A (en) * 2015-07-01 2016-05-11 宇龙计算机通信科技(深圳)有限公司 File retrieval method and apparatus and terminal


Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299466B (en) * 2018-10-22 2023-07-07 中国船舶工业综合技术经济研究院 Document retrieval method and system oriented to national defense science and technology field
CN109299466A (en) * 2018-10-22 2019-02-01 中国船舶工业综合技术经济研究院 A kind of document retrieval method and system towards science and techniques of defence field
CN109902150A (en) * 2019-02-25 2019-06-18 南京庚商网络信息技术有限公司 Unstructured digital resource text searching method and system
CN110399339A (en) * 2019-06-18 2019-11-01 平安科技(深圳)有限公司 File classifying method, device, equipment and the storage medium of knowledge base management system
CN110516157A (en) * 2019-08-30 2019-11-29 盈盛智创科技(广州)有限公司 A kind of document retrieval method, equipment and storage medium
CN110598009A (en) * 2019-09-12 2019-12-20 北京达佳互联信息技术有限公司 Method and device for searching works, electronic equipment and storage medium
CN110598009B (en) * 2019-09-12 2022-04-22 北京达佳互联信息技术有限公司 Method and device for searching works, electronic equipment and storage medium
CN111026712A (en) * 2019-11-04 2020-04-17 厦门天锐科技股份有限公司 File uploading method and device, file querying method and device and electronic equipment
CN111680072A (en) * 2020-05-07 2020-09-18 国家计算机网络与信息安全管理中心 Social information data-based partitioning system and method
CN111680072B (en) * 2020-05-07 2023-12-08 国家计算机网络与信息安全管理中心 System and method for dividing social information data
CN111581410A (en) * 2020-05-29 2020-08-25 上海依图网络科技有限公司 Image retrieval method, apparatus, medium, and system thereof
CN111581410B (en) * 2020-05-29 2023-11-14 上海依图网络科技有限公司 Image retrieval method, device, medium and system thereof
CN113553354A (en) * 2021-07-23 2021-10-26 中信银行股份有限公司 Row number fuzzy query method and system based on specific word bank
CN113987146A (en) * 2021-10-22 2022-01-28 国网江苏省电力有限公司镇江供电分公司 Dedicated novel intelligence of electric power intranet system of asking for answering
CN113987146B (en) * 2021-10-22 2023-01-31 国网江苏省电力有限公司镇江供电分公司 Dedicated intelligent question-answering system of electric power intranet
CN117033307A (en) * 2023-10-07 2023-11-10 北京天信瑞安信息技术有限公司 File indexing method, device, electronic equipment and computer readable storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180706
