CN108255972A - A kind of text searching method and system - Google Patents
- Publication number
- CN108255972A (CN201711441728.8A)
- Authority
- CN
- China
- Prior art keywords
- file
- index
- description information
- retrieval
- determined
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a text searching method and system. The method includes: obtaining at least one piece of file data and determining the description information corresponding to each piece of file data; building, according to the description information, the file index corresponding to each piece of file data; obtaining retrieval information input by a user; parsing at least one search keyword from the retrieval information; determining, from the file indexes, the target file index corresponding to the at least one search keyword; and determining the target description information corresponding to the target file index and displaying the target description information. This scheme can improve data retrieval efficiency.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a text searching method and system.
Background
With the development of computer technology, data volumes are growing explosively. How quickly target data can be retrieved from a file system has a significant influence on data-processing efficiency.
The distributed file system provided by Hadoop can store a large amount of data, and the data are dispersed across different storage devices, for example across individual disks. When searching for target data, the user has to inspect the storage devices one by one to determine whether the target data exist in a given storage device.
Because the volume of data stored in the file system is large and the storage locations of the individual pieces of data differ, searching the storage devices one by one for target data is inefficient.
Summary of the invention
Embodiments of the present invention provide a text searching method and system that can improve the retrieval efficiency of data.
In a first aspect, an embodiment of the present invention provides a text searching method, including:
Obtaining at least one piece of file data, and determining the description information corresponding to each piece of file data;
Building, according to the description information, the file index corresponding to each piece of file data;
Obtaining retrieval information input by a user;
Parsing at least one search keyword from the retrieval information;
Determining, from the file indexes, the target file index corresponding to the at least one search keyword;
Determining the target description information corresponding to the target file index, and displaying the target description information.
Preferably,
After the retrieval information input by the user is obtained, the method further includes:
Obtaining a retrieval condition input by the user;
Determining, from the file indexes, the target file index corresponding to the at least one search keyword includes:
Determining the target file index according to the retrieval condition and each search keyword.
Preferably,
Determining the target file index according to the retrieval condition and each search keyword includes:
Determining, from the file indexes, the candidate file indexes corresponding to the retrieval time, according to the retrieval time carried in the retrieval condition and the creation time in the description information corresponding to each file index;
Determining, from the determined candidate file indexes, the target file index corresponding to the search keyword.
Preferably,
Determining the target file index according to the retrieval condition and each search keyword includes:
Determining, from the file indexes, the candidate file indexes corresponding to the retrieval file type, according to the retrieval file type carried in the retrieval condition and the file type in the description information corresponding to each file index;
Determining, from the determined candidate file indexes, the target file index corresponding to the search keyword.
Preferably,
Determining the target file index according to the retrieval condition and each search keyword includes:
Combining the search keywords according to the splicing relationship carried in the retrieval condition;
Determining the target file index according to the combined search keywords.
Preferably,
The method further includes: building an index database at a preset storage location;
Building the file index corresponding to each piece of file data according to the description information includes:
Segmenting the file content in the description information with a preset word segmenter to obtain at least one content keyword;
Processing the at least one content keyword with the dictionary corresponding to the preset word segmenter, and writing the processed content keyword into the description information;
Storing the description information in the index database with a preset index creator to form the file index.
Preferably,
The method further includes:
Receiving a file deletion request input by the user;
Determining, according to the file deletion request, the file data to be deleted from the at least one piece of file data;
Determining the description information to be deleted and the file index to be deleted that correspond to the file data to be deleted;
Deleting the description information to be deleted and the file index to be deleted from the index database with the index creator.
In a second aspect, an embodiment of the present invention provides a text retrieval system, including: an index construction unit, an acquiring unit and a retrieval unit; wherein,
The index construction unit is configured to obtain at least one piece of file data, determine the description information corresponding to each piece of file data, and build, according to the description information, the file index corresponding to each piece of file data;
The acquiring unit is configured to obtain retrieval information input by a user and parse at least one search keyword from the retrieval information;
The retrieval unit is configured to determine, from the file indexes, the target file index corresponding to the at least one search keyword, determine the target description information corresponding to the target file index, and display the target description information.
Preferably,
The acquiring unit is further configured to obtain a retrieval condition input by the user;
The retrieval unit is configured to determine the target file index according to the retrieval condition and each search keyword.
Preferably,
The retrieval unit is configured to determine, from the file indexes, the candidate file indexes corresponding to the retrieval time, according to the retrieval time carried in the retrieval condition and the creation time in the description information corresponding to each file index, and to determine, from the determined candidate file indexes, the target file index corresponding to the search keyword.
Preferably,
The retrieval unit is configured to determine, from the file indexes, the candidate file indexes corresponding to the retrieval file type, according to the retrieval file type carried in the retrieval condition and the file type in the description information corresponding to each file index, and to determine, from the determined candidate file indexes, the target file index corresponding to the search keyword.
Preferably,
The retrieval unit is configured to combine the search keywords according to the splicing relationship carried in the retrieval condition, and to determine the target file index according to the combined search keywords.
Preferably,
The system further includes: a setting unit; wherein,
The setting unit is configured to build an index database at a preset storage location;
The index construction unit is configured to segment the file content in the description information with a preset word segmenter to obtain at least one content keyword; process the at least one content keyword with the dictionary corresponding to the preset word segmenter and write the processed content keyword into the description information; and store the description information in the index database with a preset index creator to form the file index.
Preferably,
The system further includes: an index deletion unit; wherein,
The acquiring unit is further configured to receive a file deletion request input by the user;
The index deletion unit is configured to determine, according to the file deletion request, the file data to be deleted from the at least one piece of file data; determine the description information to be deleted and the file index to be deleted that correspond to the file data to be deleted; and delete the description information to be deleted and the file index to be deleted from the index database with the index creator.
Embodiments of the present invention provide a text searching method and system in which the file index corresponding to each piece of file data is generated according to the description information of the obtained file data. When retrieval information input by a user is obtained, search keywords are parsed from the retrieval information, the target file index corresponding to the search keywords is determined, and the target description information corresponding to the target file index is then displayed. Each piece of file data can thus be retrieved automatically, without searching the storage devices one by one for the target data, which improves the retrieval efficiency of data.
Description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative work.
Fig. 1 is a flow chart of a text searching method provided by one embodiment of the present invention;
Fig. 2 is a structural diagram of a text retrieval system provided by one embodiment of the present invention;
Fig. 3 is a structural diagram of a text retrieval system provided by another embodiment of the present invention;
Fig. 4 is a structural diagram of a text retrieval system provided by yet another embodiment of the present invention.
Detailed description
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative work fall within the protection scope of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides a text searching method, which may include the following steps:
Step 101: Obtain at least one piece of file data, and determine the description information corresponding to each piece of file data;
Step 102: Build, according to the description information, the file index corresponding to each piece of file data;
Step 103: Obtain retrieval information input by a user;
Step 104: Parse at least one search keyword from the retrieval information;
Step 105: Determine, from the file indexes, the target file index corresponding to the at least one search keyword;
Step 106: Determine the target description information corresponding to the target file index, and display the target description information.
In the above embodiment, the file index corresponding to each piece of file data is generated according to the description information of the obtained file data. When retrieval information input by a user is obtained, search keywords are parsed from the retrieval information, the target file index corresponding to the search keywords is determined, and the target description information corresponding to the target file index is then displayed. Each piece of file data can thus be retrieved automatically, without searching the storage devices one by one for the target data, which improves the retrieval efficiency of data.
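Steps 101 to 106 can be pictured with a small in-memory sketch. It is only an illustration under stated assumptions, not the patented implementation: the class name `FileIndexSketch`, the whitespace-based keyword parsing and the "match any keyword" rule are all hypothetical choices, and a real system would use a full-text library instead of hash maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of steps 101-106 using plain collections.
class FileIndexSketch {
    // keyword -> IDs of the files whose description contains it (the "file index")
    private final Map<String, Set<String>> postings = new HashMap<>();
    // file ID -> description information that is shown to the user
    private final Map<String, String> descriptions = new HashMap<>();

    // Steps 101-102: store the description and index every word in it.
    void addFile(String fileId, String description) {
        descriptions.put(fileId, description);
        for (String word : parseKeywords(description)) {
            postings.computeIfAbsent(word, k -> new LinkedHashSet<>()).add(fileId);
        }
    }

    // Step 104: split text into lower-case keywords (assumes whitespace-separated words).
    static List<String> parseKeywords(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Steps 103, 105, 106: return the descriptions of files matching any keyword.
    List<String> search(String retrievalInfo) {
        Set<String> hits = new LinkedHashSet<>();
        for (String keyword : parseKeywords(retrievalInfo)) {
            hits.addAll(postings.getOrDefault(keyword, Set.of()));
        }
        List<String> shown = new ArrayList<>();
        for (String id : hits) {
            shown.add(descriptions.get(id));
        }
        return shown;
    }

    public static void main(String[] args) {
        FileIndexSketch index = new FileIndexSketch();
        index.addFile("f1", "quarterly sales report");
        index.addFile("f2", "holiday photo album");
        System.out.println(index.search("sales"));
    }
}
```

The lookup touches only the index, never the stored files, which is the efficiency argument made in the embodiment.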
In one embodiment of the present invention, the method may further include: building an index database at a preset storage location.
A specific implementation of step 102 may then include:
Segmenting the file content in the description information with a preset word segmenter to obtain at least one content keyword;
Processing the at least one content keyword with the dictionary corresponding to the preset word segmenter, and writing the processed content keyword into the description information;
Storing the description information in the index database with a preset index creator to form the file index.
In this embodiment, the storage location for the index files is determined in the local file system; for example, disk A is determined as the storage location of the index database, and the index database is built at that location. An index creator is then constructed. The index creator can create file indexes and store them at that location in the index database, and it is set to append mode. A word segmenter, such as the IK segmenter, can then be configured. Multiple dictionaries can be built, such as an extension dictionary, a stop-word dictionary and a synonym dictionary, and the dictionary used by the segmenter, for example that of IKAnalyzer, is adjusted by means of these dictionaries. When a file index is created, a corresponding document description is created according to the file type and the content of each attribute field is set, forming the description information of the file data. The specific content is shown in Table 1.
Table 1
Property Name | Value |
fileName | File name |
fileDataName | Name of the uploaded file object |
content | File content |
path | File path |
type | File type |
fileID | File identifier |
category | Category |
createTime | Creation time |
top_directory | Parent directory |
versionID | Version number |
The file content in description information is segmented using segmenter, forms multiple content keywords, and utilize tune
Dictionary after whole handles content keyword, for example, content keyword includes " high " and " Xing Xing " two words, it can profit
It is merged into " happy " with extension dictionary, and the synonym of " happy " is determined using thesaurus, such as determined
Go out " happiness " and " happy ".Then by treated, description information is written in content keyword, replaces original file content, and profit
Replaced description information is stored in index database by index of reference creator, forms the corresponding file index of this document data.As a result, will
Each file index is unified in index database and is stored, and need to only be retrieved in retrieval for storage location residing for index database,
The complexity that each disk is avoided to search, so as to further improve the recall precision of data.
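The dictionary processing described above (merge fragments with the extension dictionary, drop stop words, add synonyms) can be sketched as follows. This is a simplified illustration that assumes the segmenter has already produced a token list; it does not use the IK segmenter's API, and the class name, method name and sample words are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: post-process segmenter output with three dictionaries.
class DictionaryProcessingSketch {
    static List<String> process(List<String> tokens,
                                Set<String> extensionWords,
                                Set<String> stopWords,
                                Map<String, List<String>> synonyms) {
        // 1. Merge two adjacent tokens when their concatenation is an extension-dictionary word.
        List<String> merged = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            boolean hasNext = i + 1 < tokens.size();
            if (hasNext && extensionWords.contains(tokens.get(i) + tokens.get(i + 1))) {
                merged.add(tokens.get(i) + tokens.get(i + 1));
                i += 2;
            } else {
                merged.add(tokens.get(i));
                i += 1;
            }
        }
        // 2. Drop stop words; 3. append the synonyms of every remaining keyword.
        List<String> result = new ArrayList<>();
        for (String token : merged) {
            if (stopWords.contains(token)) {
                continue;
            }
            result.add(token);
            result.addAll(synonyms.getOrDefault(token, List.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keywords = process(
                List.of("very", "hap", "py", "the"),
                Set.of("happy"),
                Set.of("the"),
                Map.of("happy", List.of("glad", "joyful")));
        System.out.println(keywords);
    }
}
```

Writing the synonyms into the indexed keywords at build time means a later query for any of them finds the file without extra work at search time.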
In one embodiment of the present invention, the method may further include:
Receiving a file deletion request input by the user;
Determining, according to the file deletion request, the file data to be deleted from the at least one piece of file data;
Determining the description information to be deleted and the file index to be deleted that correspond to the file data to be deleted;
Deleting the description information to be deleted and the file index to be deleted from the index database with the index creator.
Here, when a file deletion request input by the user is received, the corresponding file index also has to be deleted. Specifically, the file data to be deleted that corresponds to the file deletion request can be determined from the obtained file data and deleted; the description information to be deleted and the file index to be deleted that correspond to this file data are determined; and the index creator is then used to delete that file index and description information. When file data are deleted, the corresponding file index is therefore deleted with them, which avoids the situation in which a file index is found but the actual file data can no longer be obtained, and so improves retrieval accuracy.
It should be noted that when file data are moved or modified, the file index and description information corresponding to the file can first be deleted, new description information can then be generated from the modified file data, and the file index of the modified file data can be rebuilt. When file data change, a new file index is thus created automatically and stays synchronized with the file data, which ensures the accuracy of the file index and thereby improves retrieval accuracy.
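The delete-then-rebuild behaviour can be sketched in a few lines. The sketch below is a minimal in-memory illustration, assuming one description string per file ID; the class `IndexSyncSketch` and its methods are hypothetical and stand in for whatever the index creator actually does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: keep description information and index entries in step with the files.
class IndexSyncSketch {
    private final Map<String, String> descriptions = new HashMap<>();
    private final Map<String, Set<String>> keywordsByFile = new HashMap<>();

    void add(String fileId, String description) {
        descriptions.put(fileId, description);
        String[] words = description.toLowerCase(Locale.ROOT).split("\\s+");
        keywordsByFile.put(fileId, new HashSet<>(Arrays.asList(words)));
    }

    // Deleting a file removes both its description information and its index entry.
    boolean delete(String fileId) {
        keywordsByFile.remove(fileId);
        return descriptions.remove(fileId) != null;
    }

    // A moved or modified file is handled as "delete the old entry, then index the new state".
    void update(String fileId, String newDescription) {
        delete(fileId);
        add(fileId, newDescription);
    }

    // Returns the IDs of files whose index entry contains the keyword, sorted for stable output.
    List<String> search(String keyword) {
        String key = keyword.toLowerCase(Locale.ROOT);
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : keywordsByFile.entrySet()) {
            if (entry.getValue().contains(key)) {
                ids.add(entry.getKey());
            }
        }
        Collections.sort(ids);
        return ids;
    }

    public static void main(String[] args) {
        IndexSyncSketch index = new IndexSyncSketch();
        index.add("f1", "draft budget");
        index.update("f1", "final budget");
        System.out.println(index.search("final"));
        index.delete("f1");
        System.out.println(index.search("budget"));
    }
}
```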
In one embodiment of the present invention, after step 103, the method further includes:
Obtaining a retrieval condition input by the user;
A specific implementation of step 105 may include:
Determining the target file index according to the retrieval condition and each search keyword.
Here, the user can customize the retrieval condition, for example the retrieval time, the retrieval file type and the splicing relationship between the search keywords. Before the user's retrieval information is obtained, the weights that the file name and the file content carry in the ranking of search results can be preset. For example, if the weight of the file name is set higher than the weight of the file content, then after multiple pieces of file data corresponding to the retrieval information have been retrieved, they are sorted according to the relevance between the file name and the retrieval information, that is, file data with a higher weighted value rank higher. In addition, the IK segmenter can also be configured so that the search keywords are processed with the pre-built extension dictionary, stop-word dictionary and synonym dictionary, which helps to further improve retrieval accuracy.
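One way to read the weighting described above is as a per-field score that is summed and used for sorting. The sketch below assumes that reading; the weight values, the substring-based matching and the class name `WeightedRankingSketch` are hypothetical and chosen only to keep the example short.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: rank hits so that file-name matches outweigh content matches.
class WeightedRankingSketch {
    static final double FILE_NAME_WEIGHT = 2.0; // assumed value, higher than the content weight
    static final double CONTENT_WEIGHT = 1.0;   // assumed value

    static double score(String fileName, String content, List<String> keywords) {
        double total = 0;
        for (String keyword : keywords) {
            String key = keyword.toLowerCase(Locale.ROOT);
            if (fileName.toLowerCase(Locale.ROOT).contains(key)) {
                total += FILE_NAME_WEIGHT;
            }
            if (content.toLowerCase(Locale.ROOT).contains(key)) {
                total += CONTENT_WEIGHT;
            }
        }
        return total;
    }

    // Each element of files is {fileName, content}. Returns matching file names, best score first.
    static List<String> rank(List<String[]> files, List<String> keywords) {
        List<String[]> hits = new ArrayList<>();
        for (String[] file : files) {
            if (score(file[0], file[1], keywords) > 0) {
                hits.add(file);
            }
        }
        hits.sort(Comparator.comparingDouble(
                (String[] file) -> score(file[0], file[1], keywords)).reversed());
        List<String> names = new ArrayList<>();
        for (String[] file : hits) {
            names.add(file[0]);
        }
        return names;
    }

    public static void main(String[] args) {
        List<String[]> files = List.of(
                new String[] {"notes.txt", "budget figures"},
                new String[] {"misc.txt", "nothing relevant"},
                new String[] {"budget.docx", "meeting notes"});
        System.out.println(rank(files, List.of("budget")));
    }
}
```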
Specifically, in one embodiment of the present invention, determining the target file index according to the retrieval condition and each search keyword includes:
Determining, from the file indexes, the candidate file indexes corresponding to the retrieval time, according to the retrieval time carried in the retrieval condition and the creation time in the description information corresponding to each file index;
Determining, from the determined candidate file indexes, the target file index corresponding to the search keyword.
In this embodiment, the retrieval condition input by the user limits the retrieval time range. Screening can then be performed according to createTime in the description information of each piece of file data, that is, the creation time of the file index. For example, if the retrieval time input by the user is 2017.10.1-2017.11.1, the file indexes whose creation time falls within this period are taken as candidate file indexes, and the target file index corresponding to the search keyword is then determined from these candidate file indexes, which further improves retrieval accuracy.
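This step can be implemented, for example, with the following code:
// Assumes the Lucene classes Term, Query, TermRangeQuery, BooleanQuery and BooleanClause.Occur are imported.
// Keep only file indexes whose createTime lies between dateBegin and dateEnd (both inclusive).
Term begin = new Term("createTime", dateBegin);
Term end = new Term("createTime", dateEnd);
Query rangequery = new TermRangeQuery("createTime", begin.bytes(), end.bytes(), true, true);
booleanQuery.add(rangequery, Occur.MUST);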
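The same inclusive range screening can be shown without any search library. The sketch below is a plain-Java illustration only: the class name, the use of `LocalDate` and the map from file ID to creation date are assumptions. One point worth keeping in mind for the Lucene version is that `TermRangeQuery` compares terms as byte strings, so date values stored as text would need a fixed-width, sortable format for the range to behave as expected.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: select candidate file indexes by creation time.
class CreateTimeFilterSketch {
    // Returns the IDs whose creation date lies in [from, to], sorted for stable output.
    static List<String> candidates(Map<String, LocalDate> createTimeByFile,
                                   LocalDate from, LocalDate to) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, LocalDate> entry : createTimeByFile.entrySet()) {
            LocalDate created = entry.getValue();
            // Inclusive on both ends, like the two "true" flags in the range query.
            if (!created.isBefore(from) && !created.isAfter(to)) {
                ids.add(entry.getKey());
            }
        }
        Collections.sort(ids);
        return ids;
    }

    public static void main(String[] args) {
        Map<String, LocalDate> created = new LinkedHashMap<>();
        created.put("f1", LocalDate.of(2017, 9, 30));
        created.put("f2", LocalDate.of(2017, 10, 1));
        created.put("f3", LocalDate.of(2017, 10, 15));
        created.put("f4", LocalDate.of(2017, 11, 2));
        System.out.println(candidates(created,
                LocalDate.of(2017, 10, 1), LocalDate.of(2017, 11, 1)));
    }
}
```

Keyword matching would then run only over the returned candidates, as the embodiment describes.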
In one embodiment of the present invention, determining the target file index according to the retrieval condition and each search keyword includes:
Determining, from the file indexes, the candidate file indexes corresponding to the retrieval file type, according to the retrieval file type carried in the retrieval condition and the file type in the description information corresponding to each file index;
Determining, from the determined candidate file indexes, the target file index corresponding to the search keyword.
In addition to the retrieval time range, the user can also set the retrieval file type. For example, when the retrieval file type set by the user is Word, only files of type doc and docx are searched during retrieval; other file types are handled in the same way. This can further improve retrieval accuracy. It can be understood that when the user makes no special setting for the retrieval file type, all file types can be searched by default. The correspondence between the retrieval file type set by the user and the formats of the file data is shown in Table 2.
Table 2
File type | Value |
All | All formats |
Word | doc、docx |
Excel | xls、xlsx |
TXT | txt |
PPT | ppt、pptx |
PICTURE | bmp、jpg、jpeg、png、gif |
VEDIO | avi、wma、rmvb、mp4、flash、mp3、wav |
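Table 2 amounts to a lookup from retrieval file type to a set of extensions. A minimal sketch of that lookup follows; the class and method names are hypothetical, the type labels are spelled exactly as in Table 2, and treating a missing setting as "All" follows the default described above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: decide whether a file belongs to the retrieval file type from Table 2.
class FileTypeFilterSketch {
    static final Map<String, Set<String>> EXTENSIONS = new HashMap<>();
    static {
        EXTENSIONS.put("Word", Set.of("doc", "docx"));
        EXTENSIONS.put("Excel", Set.of("xls", "xlsx"));
        EXTENSIONS.put("TXT", Set.of("txt"));
        EXTENSIONS.put("PPT", Set.of("ppt", "pptx"));
        EXTENSIONS.put("PICTURE", Set.of("bmp", "jpg", "jpeg", "png", "gif"));
        // Label and values copied from Table 2 as printed.
        EXTENSIONS.put("VEDIO", Set.of("avi", "wma", "rmvb", "mp4", "flash", "mp3", "wav"));
    }

    static boolean matches(String retrievalType, String fileName) {
        // No setting, or "All", means every file type is searched.
        if (retrievalType == null || retrievalType.equals("All")) {
            return true;
        }
        Set<String> allowed = EXTENSIONS.get(retrievalType);
        int dot = fileName.lastIndexOf('.');
        if (allowed == null || dot < 0) {
            return false;
        }
        return allowed.contains(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(matches("Word", "report.docx"));
        System.out.println(matches("Word", "notes.txt"));
    }
}
```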
In one embodiment of the present invention, determining the target file index according to the retrieval condition and each search keyword includes:
Combining the search keywords according to the splicing relationship carried in the retrieval condition;
Determining the target file index according to the combined search keywords.
Here, in addition to setting the retrieval time and the retrieval file type, the user can also use advanced search, that is, select "and", "or" or "not containing" from a drop-down box to combine the search keywords and thereby splice together the query conditions. "And" is the AND operation, which retrieves the file indexes that satisfy all conditions at the same time; "or" is the OR operation, for which satisfying one condition is enough; "not containing" is the NOT operation, which removes the file indexes that satisfy the condition following "not containing". It can be understood that when the retrieval condition set by the user includes a retrieval time and a retrieval file type, the combination used for the retrieval can also be determined by setting their splicing relationship, so that different retrieval conditions are spliced together. The user can thus define the retrieval condition freely, which helps to retrieve precisely the file indexes that meet the user's needs and improves the user experience.
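The three splicing relationships map naturally onto set operations over the file indexes matched by each keyword. The sketch below assumes that each keyword has already been resolved to a set of file IDs; the class name and the left-to-right evaluation are hypothetical simplifications. In Lucene, the roughly corresponding clause types are `BooleanClause.Occur.MUST`, `SHOULD` and `MUST_NOT`.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: combine per-keyword result sets with "and", "or" and "not containing".
class KeywordCombinationSketch {
    enum Op { AND, OR, NOT }

    // Applies one splicing relationship to the running result and returns a new set.
    static Set<String> combine(Set<String> current, Op op, Set<String> matches) {
        Set<String> result = new TreeSet<>(current);
        switch (op) {
            case AND:
                result.retainAll(matches); // keep only files that satisfy both conditions
                break;
            case OR:
                result.addAll(matches);    // keep files that satisfy either condition
                break;
            case NOT:
                result.removeAll(matches); // drop files that satisfy the excluded condition
                break;
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> budget = Set.of("f1", "f2", "f3");
        Set<String> report = Set.of("f2", "f3", "f4");
        Set<String> draft = Set.of("f3");
        Set<String> both = combine(budget, Op.AND, report);
        System.out.println(combine(both, Op.NOT, draft));
    }
}
```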
It should be noted that after the corresponding target description information has been displayed, the user can also be offered functions for previewing and downloading the corresponding file data. For example, the search results may contain files of types such as Word, PDF and TXT. After the user clicks a file, the file can be located through the file information in the attribute fields and cached to the browser for preview. The user can also click the download button below a file, in which case the file is located through the file information in the attribute fields and downloaded. This makes it convenient for the user to obtain the corresponding file data and further improves the user experience.
In addition, Lucene is an open-source library for full-text indexing and search, supported and provided by the Apache Software Foundation. Lucene offers a simple yet powerful application programming interface for full-text indexing and search. As a full-text search engine it has the following prominent advantages: 1. The index file format is independent of the application platform. Lucene defines an index file format based on 8-bit bytes, so that applications on compatible systems or on different platforms can share the index files that have been built. 2. On top of the inverted index of a traditional full-text search engine, it implements segmented indexing: a small index can be built for new files, which speeds up indexing, and is then merged with the existing index for optimization. 3. Its object-oriented system architecture lowers the learning difficulty of extending Lucene and makes it easy to add new functions. 4. It provides a text-analysis interface that is independent of language and file format. The indexer builds index files by receiving a stream of tokens, so to support a new language or file format the user only needs to implement the text-analysis interface. 5. It implements a powerful query engine by default. Users obtain powerful query capability without writing code themselves; Lucene's default query implementation supports Boolean operations, fuzzy queries (Fuzzy Search), grouped queries and so on. Furthermore, Lucene is a mature, free, open-source tool in the Java development environment and is platform-independent. It gives software developers an easy-to-use toolkit on which a full-text search engine better suited to the current application can be built. A retrieval system for the Hadoop file system can therefore be built on Lucene.
As shown in Fig. 2, an embodiment of the present invention provides a text retrieval system, including: an index construction unit 201, an acquiring unit 202 and a retrieval unit 203; wherein,
The index construction unit 201 is configured to obtain at least one piece of file data, determine the description information corresponding to each piece of file data, and build, according to the description information, the file index corresponding to each piece of file data;
The acquiring unit 202 is configured to obtain retrieval information input by a user and parse at least one search keyword from the retrieval information;
The retrieval unit 203 is configured to determine, from the file indexes, the target file index corresponding to the at least one search keyword, determine the target description information corresponding to the target file index, and display the target description information.
In one embodiment of the present invention, the acquiring unit 202 is further configured to obtain a retrieval condition input by the user;
The retrieval unit 203 is configured to determine the target file index according to the retrieval condition and each search keyword.
In one embodiment of the present invention, the retrieval unit 203 is configured to determine, from the file indexes, the candidate file indexes corresponding to the retrieval time, according to the retrieval time carried in the retrieval condition and the creation time in the description information corresponding to each file index, and to determine, from the determined candidate file indexes, the target file index corresponding to the search keyword.
In one embodiment of the present invention, the retrieval unit 203 is configured to determine, from the file indexes, the candidate file indexes corresponding to the retrieval file type, according to the retrieval file type carried in the retrieval condition and the file type in the description information corresponding to each file index, and to determine, from the determined candidate file indexes, the target file index corresponding to the search keyword.
In one embodiment of the present invention, the retrieval unit 203 is configured to combine the search keywords according to the splicing relationship carried in the retrieval condition, and to determine the target file index according to the combined search keywords.
As shown in Fig. 3, in one embodiment of the present invention, the system may further include: a setting unit 301; wherein,
The setting unit 301 is configured to build an index database at a preset storage location;
The index construction unit 201 is configured to segment the file content in the description information with a preset word segmenter to obtain at least one content keyword; process the at least one content keyword with the dictionary corresponding to the preset word segmenter and write the processed content keyword into the description information; and store the description information in the index database with a preset index creator to form the file index.
As shown in Fig. 4, in one embodiment of the present invention, the system may further include: an index deletion unit 401; wherein,
The acquiring unit 202 is further configured to receive a file deletion request input by the user;
The index deletion unit 401 is configured to determine, according to the file deletion request, the file data to be deleted from the at least one piece of file data; determine the description information to be deleted and the file index to be deleted that correspond to the file data to be deleted; and delete the description information to be deleted and the file index to be deleted from the index database with the index creator.
The information exchange between the units of the above system and their execution processes are based on the same concept as the method embodiments of the present invention; for details, refer to the description of the method embodiments, which is not repeated here.
An embodiment of the present invention further provides a readable medium including execution instructions. When a processor of a storage controller executes the execution instructions, the storage controller performs the method provided by any of the above embodiments of the present invention.
An embodiment of the present invention further provides a storage controller, including: a processor, a memory and a bus. The memory is used to store execution instructions, and the processor is connected to the memory through the bus. When the storage controller runs, the processor executes the execution instructions stored in the memory, so that the storage controller performs the method provided by any of the above embodiments of the present invention.
In conclusion the above each embodiment of the present invention at least has the advantages that:
1st, in embodiments of the present invention, each file data is generated according to the description information of the file data got to correspond to
File index.When getting retrieval information input by user, search key is parsed from retrieval information, and determine with
The corresponding file destination index of search key, then indexes corresponding goal description information to file destination and is shown.By
This realizes the automatically retrieval to each file data, and need not be by the way of storage device is searched one by one come searched targets number
According to so as to improve the recall precision of data.
2. In the embodiments of the present invention, an index library is built at a preset storage location, and an index creator is then used to store the description information in the index library to form the file indexes. The file indexes are thus stored together in the index library, so a retrieval only needs to search the storage location of the index library. This avoids the complexity of searching each disk and further improves data retrieval efficiency.
3. In the embodiments of the present invention, when a file deletion request input by a user is received, the file data to be deleted that corresponds to the request is determined from the acquired file data and is deleted. The description information to be deleted and the file index to be deleted that correspond to that file data are also determined, and the index creator is then used to delete them. Thus, when file data is deleted, its corresponding file index is deleted with it, which avoids a file index pointing to file data that can no longer be obtained and thereby improves retrieval accuracy.
4. In the embodiments of the present invention, when file data is moved or modified, the file index and description information corresponding to the file are deleted first, new description information is generated from the modified file data, and the file index of the modified file data is re-established. Thus, when file data changes, a new file index is created automatically and stays synchronized with the file data, which ensures the accuracy of the file index and improves retrieval accuracy.
5. In the embodiments of the present invention, the user can customize search conditions, including the retrieval time, the retrieval file type, and the splicing relationship between each search condition and the search keywords. This helps to accurately retrieve the file indexes that meet the user's needs and improves the user experience.
6. In the embodiments of the present invention, after the corresponding target description information is displayed, functions for previewing and downloading the corresponding file data can also be provided to the user. This makes it convenient for the user to obtain the corresponding file data and further improves the user experience.
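The index lifecycle described in the beneficial effects above (build, retrieve, delete, and re-index on move or modification) can be illustrated with a minimal Python sketch. This is not the implementation of the invention: the names `Description` and `IndexLibrary` are hypothetical, the index library is a plain in-memory dictionary rather than a store at a preset storage location, whitespace splitting stands in for the preset word segmenter, and multiple keywords are simply intersected.

```python
from dataclasses import dataclass


@dataclass
class Description:
    """Hypothetical description information for one piece of file data."""
    path: str
    file_type: str
    content: str


class IndexLibrary:
    """Toy in-memory stand-in for the index library (an assumption)."""

    def __init__(self):
        self.descriptions = {}  # path -> Description
        self.postings = {}      # keyword -> set of paths

    def add(self, desc):
        # Whitespace splitting stands in for the preset word segmenter.
        self.descriptions[desc.path] = desc
        for word in set(desc.content.lower().split()):
            self.postings.setdefault(word, set()).add(desc.path)

    def delete(self, path):
        # Remove the description and every posting that refers to it,
        # so no index entry outlives its file data.
        desc = self.descriptions.pop(path, None)
        if desc is None:
            return
        for word in set(desc.content.lower().split()):
            paths = self.postings.get(word, set())
            paths.discard(path)
            if not paths:
                self.postings.pop(word, None)

    def update(self, old_path, new_desc):
        # Move or modify: delete the stale entry first, then re-index.
        self.delete(old_path)
        self.add(new_desc)

    def search(self, query):
        # Parse keywords from the retrieval information and keep only
        # the files that contain every keyword.
        keywords = query.lower().split()
        if not keywords:
            return []
        hits = set(self.postings.get(keywords[0], set()))
        for word in keywords[1:]:
            hits &= self.postings.get(word, set())
        return [self.descriptions[p] for p in sorted(hits)]
```

In this sketch a retrieval touches only the two dictionaries of the library, never the files themselves, which mirrors the point that only the index library's storage location needs to be searched.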
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a" does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by hardware related to program instructions. The program may be stored in a computer-readable storage medium, and when the program is executed, it performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above are only preferred embodiments of the present invention, intended merely to illustrate its technical solutions and not to limit its scope of protection. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention are included in the scope of protection of the present invention.
Claims (10)
1. A text retrieval method, characterized by comprising:
obtaining at least one piece of file data, and determining the description information corresponding to each piece of file data;
building, according to the description information, the file index corresponding to each piece of file data;
obtaining retrieval information input by a user;
parsing at least one search keyword from the retrieval information;
determining, from the file indexes, a target file index corresponding to the at least one search keyword; and
determining the target description information corresponding to the target file index, and displaying the target description information.
2. The method according to claim 1, characterized in that,
after the obtaining of the retrieval information input by the user, the method further comprises:
obtaining a search condition input by the user;
and the determining, from the file indexes, of the target file index corresponding to the at least one search keyword comprises:
determining the target file index according to the search condition and each search keyword.
3. The method according to claim 2, characterized in that
the determining of the target file index according to the search condition and each search keyword comprises:
determining, from the file indexes, candidate file indexes corresponding to the retrieval time, according to the retrieval time carried in the search condition and the creation time in the description information corresponding to each file index; and
determining, from the determined candidate file indexes, the target file index corresponding to the search keyword;
and/or
the determining of the target file index according to the search condition and each search keyword comprises:
determining, from the file indexes, candidate file indexes corresponding to the retrieval file type, according to the retrieval file type carried in the search condition and the created file type in the description information corresponding to each file index; and
determining, from the determined candidate file indexes, the target file index corresponding to the search keyword;
and/or
the determining of the target file index according to the search condition and each search keyword comprises:
combining the search keywords according to the splicing relationship carried in the search condition; and
determining the target file index according to the combined search keywords.
4. The method according to claim 1, characterized in that
the method further comprises: building an index library at a preset storage location;
and the building, according to the description information, of the file index corresponding to each piece of file data comprises:
segmenting the file content in the description information using a preset word segmenter to obtain at least one content keyword;
processing the at least one content keyword using the dictionary corresponding to the preset word segmenter, and writing the processed content keyword into the description information; and
storing the description information in the index library using a preset index creator to form the file index.
5. The method according to claim 4, characterized in that
the method further comprises:
receiving a file deletion request input by the user;
determining, according to the file deletion request, the file data to be deleted from the at least one piece of file data;
determining the description information to be deleted and the file index to be deleted that correspond to the file data to be deleted; and
deleting the description information to be deleted and the file index to be deleted from the index library using the index creator.
6. A text retrieval system, characterized by comprising an index construction unit, an acquiring unit, and a retrieval unit, wherein:
the index construction unit is configured to obtain at least one piece of file data, determine the description information corresponding to each piece of file data, and build, according to the description information, the file index corresponding to each piece of file data;
the acquiring unit is configured to obtain retrieval information input by a user, and parse at least one search keyword from the retrieval information; and
the retrieval unit is configured to determine, from the file indexes, a target file index corresponding to the at least one search keyword, determine the target description information corresponding to the target file index, and display the target description information.
7. The system according to claim 6, characterized in that
the acquiring unit is further configured to obtain a search condition input by the user; and
the retrieval unit is configured to determine the target file index according to the search condition and each search keyword.
8. The system according to claim 7, characterized in that
the retrieval unit is configured to determine, from the file indexes, candidate file indexes corresponding to the retrieval time, according to the retrieval time carried in the search condition and the creation time in the description information corresponding to each file index, and to determine, from the determined candidate file indexes, the target file index corresponding to the search keyword;
and/or
the retrieval unit is configured to determine, from the file indexes, candidate file indexes corresponding to the retrieval file type, according to the retrieval file type carried in the search condition and the created file type in the description information corresponding to each file index, and to determine, from the determined candidate file indexes, the target file index corresponding to the search keyword;
and/or
the retrieval unit is configured to combine the search keywords according to the splicing relationship carried in the search condition, and to determine the target file index according to the combined search keywords.
9. The system according to claim 6, characterized in that
the system further comprises a setting unit, wherein:
the setting unit is configured to build an index library at a preset storage location; and
the index construction unit is configured to segment the file content in the description information using a preset word segmenter to obtain at least one content keyword; process the at least one content keyword using the dictionary corresponding to the preset word segmenter, and write the processed content keyword into the description information; and store the description information in the index library using a preset index creator to form the file index.
10. The system according to claim 9, characterized in that
the system further comprises an index deletion unit, wherein:
the acquiring unit is further configured to receive a file deletion request input by the user; and
the index deletion unit is configured to determine, according to the file deletion request, the file data to be deleted from the at least one piece of file data; determine the description information to be deleted and the file index to be deleted that correspond to the file data to be deleted; and delete the description information to be deleted and the file index to be deleted from the index library using the index creator.
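As an informal illustration of the three search conditions recited in claims 3 and 8 (retrieval time, retrieval file type, and splicing relationship), the following Python sketch narrows the file indexes to candidates first and then matches the combined keywords. It is a sketch under stated assumptions, not the claimed implementation: `FileIndex`, `find_targets`, and all parameter names are hypothetical, the retrieval time is modeled as an optional date range, and the splicing relationship is limited to "AND" and "OR".

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FileIndex:
    """Hypothetical file index entry with its description information."""
    name: str
    created: date
    file_type: str
    keywords: frozenset


def find_targets(indexes, keywords, start=None, end=None,
                 file_type=None, splice="AND"):
    """Narrow by time and file type, then match the spliced keywords."""
    if splice not in ("AND", "OR"):
        raise ValueError("splice must be 'AND' or 'OR'")

    # Candidate file indexes: creation time within the retrieval time range.
    candidates = list(indexes)
    if start is not None:
        candidates = [i for i in candidates if i.created >= start]
    if end is not None:
        candidates = [i for i in candidates if i.created <= end]

    # Candidate file indexes: file type equal to the retrieval file type.
    if file_type is not None:
        candidates = [i for i in candidates if i.file_type == file_type]

    # Splicing relationship: require all keywords (AND) or any keyword (OR).
    combine = all if splice == "AND" else any
    return [i for i in candidates
            if combine(k in i.keywords for k in keywords)]
```

Filtering before keyword matching keeps the keyword comparison confined to the candidate file indexes, which is the order the claims describe; a real index library would normally push these filters into the index query itself.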
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711441728.8A CN108255972A (en) | 2017-12-27 | 2017-12-27 | A kind of text searching method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108255972A true CN108255972A (en) | 2018-07-06 |
Family
ID=62724110
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711441728.8A Pending CN108255972A (en) | 2017-12-27 | 2017-12-27 | A kind of text searching method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108255972A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104391941A (en) * | 2014-11-25 | 2015-03-04 | 浪潮电子信息产业股份有限公司 | Method for rapidly establishing full-text retrieval tool for common files |
CN105279150A (en) * | 2015-10-27 | 2016-01-27 | 江苏电力信息技术有限公司 | Lucene full-text retrieval based Chinese word segmentation method |
CN105574062A (en) * | 2015-07-01 | 2016-05-11 | 宇龙计算机通信科技(深圳)有限公司 | File retrieval method and apparatus and terminal |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109299466B (en) * | 2018-10-22 | 2023-07-07 | 中国船舶工业综合技术经济研究院 | Document retrieval method and system oriented to national defense science and technology field |
CN109299466A (en) * | 2018-10-22 | 2019-02-01 | 中国船舶工业综合技术经济研究院 | A kind of document retrieval method and system towards science and techniques of defence field |
CN109902150A (en) * | 2019-02-25 | 2019-06-18 | 南京庚商网络信息技术有限公司 | Unstructured digital resource text searching method and system |
CN110399339A (en) * | 2019-06-18 | 2019-11-01 | 平安科技(深圳)有限公司 | File classifying method, device, equipment and the storage medium of knowledge base management system |
CN110516157A (en) * | 2019-08-30 | 2019-11-29 | 盈盛智创科技(广州)有限公司 | A kind of document retrieval method, equipment and storage medium |
CN110598009A (en) * | 2019-09-12 | 2019-12-20 | 北京达佳互联信息技术有限公司 | Method and device for searching works, electronic equipment and storage medium |
CN110598009B (en) * | 2019-09-12 | 2022-04-22 | 北京达佳互联信息技术有限公司 | Method and device for searching works, electronic equipment and storage medium |
CN111026712A (en) * | 2019-11-04 | 2020-04-17 | 厦门天锐科技股份有限公司 | File uploading method and device, file querying method and device and electronic equipment |
CN111680072A (en) * | 2020-05-07 | 2020-09-18 | 国家计算机网络与信息安全管理中心 | Social information data-based partitioning system and method |
CN111680072B (en) * | 2020-05-07 | 2023-12-08 | 国家计算机网络与信息安全管理中心 | System and method for dividing social information data |
CN111581410A (en) * | 2020-05-29 | 2020-08-25 | 上海依图网络科技有限公司 | Image retrieval method, apparatus, medium, and system thereof |
CN111581410B (en) * | 2020-05-29 | 2023-11-14 | 上海依图网络科技有限公司 | Image retrieval method, device, medium and system thereof |
CN113553354A (en) * | 2021-07-23 | 2021-10-26 | 中信银行股份有限公司 | Row number fuzzy query method and system based on specific word bank |
CN113987146A (en) * | 2021-10-22 | 2022-01-28 | 国网江苏省电力有限公司镇江供电分公司 | Dedicated novel intelligence of electric power intranet system of asking for answering |
CN113987146B (en) * | 2021-10-22 | 2023-01-31 | 国网江苏省电力有限公司镇江供电分公司 | Dedicated intelligent question-answering system of electric power intranet |
CN117033307A (en) * | 2023-10-07 | 2023-11-10 | 北京天信瑞安信息技术有限公司 | File indexing method, device, electronic equipment and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108255972A (en) | A kind of text searching method and system | |
US11163957B2 (en) | Performing semantic graph search | |
US11126647B2 (en) | System and method for hierarchically organizing documents based on document portions | |
US10169471B2 (en) | Generating and executing query language statements from natural language | |
US20230018582A1 (en) | Identifying relevant information within a document hosting system | |
US9251130B1 (en) | Tagging annotations of electronic books | |
US20160098405A1 (en) | Document Curation System | |
US20140201203A1 (en) | System, method and device for providing an automated electronic researcher | |
US10678820B2 (en) | System and method for computerized semantic indexing and searching | |
US11086860B2 (en) | Predefined semantic queries | |
US9619570B2 (en) | Searching content based on transferrable user search contexts | |
US11630833B2 (en) | Extract-transform-load script generation | |
US11544306B2 (en) | System and method for concept-based search summaries | |
US11886477B2 (en) | System and method for quote-based search summaries | |
US20130159222A1 (en) | Interactive interface for object search | |
CN115757689A (en) | Information query system, method and equipment | |
Schaffert et al. | The linked media framework: Integrating and interlinking enterprise media content and data | |
KR101272656B1 (en) | Method of file management based on tag and system of the same | |
US11861321B1 (en) | Systems and methods for structure discovery and structure-based analysis in natural language processing models | |
US11940953B2 (en) | Assisted updating of electronic documents | |
Holzmann et al. | ABCDEF: The 6 key features behind scalable, multi-tenant web archive processing with ARCH: Archive, Big Data, Concurrent, Distributed, Efficient, Flexible | |
US9886497B2 (en) | Indexing presentation slides | |
US9342586B2 (en) | Managing and using shareable search lists | |
US20160085850A1 (en) | Knowledge brokering and knowledge campaigns | |
Mashwani et al. | 360 semantic file system: augmented directory navigation for nonhierarchical retrieval of files |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20180706 |