CN112784014B - Safe full-text retrieval system and method based on multi-source heterogeneous system - Google Patents

Safe full-text retrieval system and method based on multi-source heterogeneous system

Info

Publication number
CN112784014B
Authority
CN
China
Prior art keywords
retrieval
user
data
index information
service module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110054652.3A
Other languages
Chinese (zh)
Other versions
CN112784014A (en)
Inventor
何腾蛟
张婷
韩飞
李庆
曾辉
幸阳文
邹瑞璋
吴斌
汪冉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nuclear Power Institute of China
Original Assignee
Nuclear Power Institute of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nuclear Power Institute of China
Priority to CN202110054652.3A
Publication of CN112784014A
Application granted
Publication of CN112784014B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a secure full-text retrieval system based on a multi-source heterogeneous system. An application system module stores source data, the source data comprising original data and processed data, where the processed data is the original data after recognition by an OCR service module; the application system module establishes index information for the source data and transmits the index information to a retrieval service module. The retrieval service module stores the index information and extracts a retrieval result list according to a retrieval request of a user, and the user accesses the source data corresponding to the retrieval result list in the application system. System identity authentication is carried out when the original data is sent to the OCR service module for processing and when the index information is transmitted to the retrieval service module; user identity authentication is carried out when the source data corresponding to the retrieval result list is accessed in the application system. The invention can broadly support the establishment of secure full-text retrieval for multi-source heterogeneous systems within a unit, and in particular meets the need of units with high document confidentiality and security requirements for unified full-text retrieval across their systems.

Description

Safe full-text retrieval system and method based on multi-source heterogeneous system
Technical Field
The invention relates to the technical field of full-text retrieval, in particular to a safe full-text retrieval system and method based on a multi-source heterogeneous system.
Background
With the development of information technology, document management systems are growing in number and their data volumes are growing ever larger. At the same time, data access authority control has been brought into the scope of system management according to the importance of the data. In this environment, fast, comprehensive, and accurate secure full-text retrieval across systems is an urgent requirement of users. Full-text retrieval technology is now mature; for cross-system retrieval, the usual process is to crawl the text out of each system, analyze it in a unified manner, and build indexes. For unstructured documents in each system, an OCR image-text recognition tool is used to meet the full-text retrieval requirement. Because the OCR recognition tool is an open, undifferentiated application tool, no system-level identity authentication or data isolation management is performed when a system sends an internal unstructured document to it for recognition, so the security of the data cannot be guaranteed.
Disclosure of Invention
The technical problems to be solved by the invention are that the prior art cannot provide secure full-text retrieval that conforms to user access authority control and does not comprehensively consider data security control in the process of establishing the full-text retrieval service. The invention aims to provide a secure full-text retrieval system and method based on a multi-source heterogeneous system that solve the boundary security and access control security problems of each application system's data during index establishment, data transmission, and OCR (optical character recognition) management, and that meet the need of users within a unit for fast, accurate, comprehensive, and effective full-text retrieval of important documents in the systems.
The invention is realized by the following technical scheme:
A secure full-text retrieval system based on a multi-source heterogeneous system comprises an application system module, a retrieval service module, and an OCR service module. The application system module stores source data, the source data comprising original data and processed data, where the processed data is the original data after recognition by the OCR service module; the application system module establishes index information for the source data and transmits the index information to the retrieval service module. The retrieval service module stores the index information and extracts a retrieval result list for a user according to the user's retrieval request, and the user accesses the source data corresponding to the retrieval result list in the application system according to that list. The OCR service module receives the original data from the application system module and performs OCR recognition on it. System identity authentication is carried out when the application system delivers the original data to the OCR service module for processing and when the application system transmits the index information to the retrieval service module; user identity authentication is carried out when the user accesses the source data corresponding to the retrieval result list in the application system.
In the prior art, full-text retrieval is performed directly against every application system, which imposes security risks and a processing burden on those systems. The invention establishes an independent retrieval service module that provides retrieval service for all application systems: every application system transfers the index information of its source data to the retrieval service module for storage, the retrieval service module interfaces with users, and a user obtains a retrieval result list from the retrieval service module and then jumps from that list to the application system to access the source data. In this process the application system performs user identity authentication on the user, and the retrieval service module performs system identity authentication on the index information transmitted by the application system, which safeguards the authority and security of each system and user.
Meanwhile, the source data in the application system of the invention consists of two parts: the original data and the original data after OCR (that is, after character recognition processing). The OCR service module provides a character recognition service, namely the OCR service, for the original data. Before processing, the OCR service module performs system identity authentication on the original data transmitted by the application system, and the application system performs system identity authentication on the processed data returned by the OCR service module. System identity authentication on the data transmitted before and after OCR recognition ensures boundary security and access control security in the OCR recognition management process. The OCR service provides a complete security control mechanism for the data that each application system submits for recognition, preventing data access across systems and by system administrators. The details of an item in the retrieval result list are viewed by jumping to the application system that holds the source data, that is, the source system, so the user's viewing authority for the document remains under the control of the source system, and a multi-system identity authentication mechanism ensures security control of data transmission between systems.
By providing a retrieval service that carries authority information, the invention achieves secure full-text retrieval that conforms to user access authority control and comprehensively considers data security control in the process of establishing the full-text retrieval service. It can broadly support the establishment of secure full-text retrieval for multi-source heterogeneous systems within a unit, and in particular meets the need of units with high document confidentiality and security requirements for unified full-text retrieval across their systems.
Further, when the source data of the application system is handed to the OCR service module for processing, the OCR service module performs identity authentication and source data isolation on the plurality of application systems, respectively.
Further, the isolating comprises: document security level management, different path storage management and encryption and decryption key management.
Further, different encryption and decryption keys are used for the plurality of application systems, respectively.
Further, the application system module integrates an index plug-in, establishes index information for the source data through the index plug-in, encrypts the index information, and transmits the encrypted index information to the retrieval service module. Because index establishment is carried out inside each application system by means of the index plug-in, system service data is prevented from crossing system boundaries.
Further, the retrieval service module comprises an index database; storing the encrypted index information into the index database; and responding to a retrieval service request from a user, extracting encrypted index information matched with the retrieval service request from the index database, decrypting the encrypted index information to obtain a query result list, and returning the query result list to the user.
Further, a retrieval portal and retrieval management are arranged between a user and the index database, keywords and user information in the retrieval service request are obtained through the retrieval portal, the keywords and the user information are encrypted through the retrieval management, the encrypted index information matched with the retrieval service request is extracted from the index database according to the encrypted keywords and the user information, the encrypted index information is decrypted through the retrieval management, and the decrypted index information is presented to the user through the retrieval portal in a form of a query result list.
Further, the user identity authentication comprises a user real-time viewing authority.
Further, the index information includes: index key, system identification, document name, context, user rights information, and security level. Because the index information contains user authority, a user's query result list is prevented from exceeding that user's need-to-know scope.
The invention also discloses a secure full-text retrieval method based on the multi-source heterogeneous system, which uses any one of the systems described above and comprises the following steps. Step S1: a unified retrieval service module is provided for the application systems, an index plug-in is integrated in each application system, the index plug-in establishes index information for the source data in the application system, and the retrieval service module performs system identity authentication on the encrypted index information and stores it. Step S2: according to a retrieval request initiated by a user, the retrieval service module extracts and decrypts the encrypted index information corresponding to the retrieval request and feeds the index information back to the user in the form of a query result list. Step S3: according to the query result list, the user jumps to the application system corresponding to an item in the list, and the application system feeds back the corresponding source data after performing user identity authentication on the user. The source data comprises original data and OCR-recognized data, the OCR-recognized data being the original data after processing by an OCR service module; the OCR service module performs system identity authentication on the original data before processing it, and the application system performs system identity authentication on the processed OCR-recognized data.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. a uniform safe full-text retrieval portal is provided, so that a user can perform keyword retrieval at one entrance, and the retrieval experience of the user is improved;
2. providing an index information establishment standard specification containing user authority and document security level;
3. the data privacy risk control method in the processes of establishing index information, managing index information, identifying unstructured documents, checking document details and the like in an information system is provided, so that the data boundary of an application system is effectively prevented from being damaged, and the document authority is controlled by a source system;
4. a multi-system identity authentication mechanism is established, and effective control on data in the system data exchange process is ensured;
5. the method provides an effective data confidentiality scheme and a construction method for the safe full-text retrieval of the multi-source heterogeneous system in the military confidentiality unit, and has good social benefit and economic benefit.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a general architecture of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Example 1
A secure full-text retrieval system based on a multi-source heterogeneous system comprises an application system module, a retrieval service module, and an OCR service module. The application system module stores source data, the source data comprising original data and processed data, where the processed data is the original data after recognition by the OCR service module; the application system module establishes index information for the source data and transmits the index information to the retrieval service module. The retrieval service module stores the index information and extracts a retrieval result list for the user according to the user's retrieval request, and the user accesses the source data corresponding to the retrieval result list in the application system according to that list. The OCR service module receives the original data from the application system module and performs OCR recognition on it. System identity authentication is carried out when the application system sends the original data to the OCR service module for processing and when the application system transmits the index information to the retrieval service module; user identity authentication is carried out when the user accesses the source data corresponding to the retrieval result list in the application system.
In the prior art, full-text retrieval is performed directly against every application system, which imposes security risks and a processing burden on those systems. In embodiment 1, a separate retrieval service module is established to provide retrieval service for all application systems: every application system delivers the index information of its source data to the retrieval service module for storage, the retrieval service module interfaces with users, and a user obtains a retrieval result list from the retrieval service module and then jumps from that list to the application system to access the source data. In this process the application system performs user identity authentication on the user, and the retrieval service module performs system identity authentication on the index information transmitted by the application system, which safeguards the authority and security of each system and user.
Meanwhile, the source data in the application system in embodiment 1 consists of two parts: the original data and the OCR-processed original data (that is, the original data after character recognition processing). The OCR service module provides a character recognition service, namely the OCR service, for the original data. Before processing, the OCR service module performs system identity authentication on the original data transmitted by the application system, and the application system performs system identity authentication on the processed data returned by the OCR service module. System identity authentication on the data transmitted before and after OCR recognition ensures boundary security and access control security in the OCR recognition management process. The OCR service provides a complete security control mechanism for the data that each application system submits for recognition, preventing data access across systems and by system administrators. The details of an item in the retrieval result list are viewed by jumping to the application system that holds the source data, that is, the source system, so the user's viewing authority for the document remains under the control of the source system, and a multi-system identity authentication mechanism ensures security control of data transmission between systems.
By providing a retrieval service that carries authority information, embodiment 1 achieves secure full-text retrieval that conforms to user access authority control and comprehensively considers data security control in the process of establishing the full-text retrieval service. It can broadly support the establishment of secure full-text retrieval for multi-source heterogeneous systems within a unit, and in particular meets the need of units with high document confidentiality and security requirements for unified full-text retrieval across their systems.
When the source data of the plurality of application systems are processed by the OCR service module, the OCR service module respectively performs identity authentication and source data isolation on the plurality of application systems. The isolation includes: document security level management, different path storage management and encryption and decryption key management.
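The three isolation measures named above (security level management, separate storage paths, and separate keys) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `TenantContext` and `OcrIsolationRegistry`, the `/ocr-data` base directory, and the numeric security levels are all hypothetical.

```python
import secrets
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class TenantContext:
    """Isolation settings for one application system (hypothetical structure)."""
    system_id: str
    data_key: bytes               # encryption/decryption key unique to this system
    storage_root: PurePosixPath   # storage path unique to this system
    max_security_level: int       # highest document security level accepted


class OcrIsolationRegistry:
    """Keeps one isolated context per application system (illustrative only)."""

    def __init__(self, base_dir="/ocr-data"):
        self._base = PurePosixPath(base_dir)
        self._tenants = {}

    def register(self, system_id, max_security_level):
        if system_id in self._tenants:
            raise ValueError(f"{system_id} is already registered")
        ctx = TenantContext(
            system_id=system_id,
            data_key=secrets.token_bytes(32),      # never shared between systems
            storage_root=self._base / system_id,   # separate directory per system
            max_security_level=max_security_level,
        )
        self._tenants[system_id] = ctx
        return ctx

    def job_path(self, system_id, doc_id, security_level):
        """Return where a document submitted for OCR would be stored."""
        ctx = self._tenants[system_id]   # KeyError for an unregistered system
        if security_level > ctx.max_security_level:
            raise PermissionError("security level exceeds this system's limit")
        return ctx.storage_root / f"level-{security_level}" / doc_id


registry = OcrIsolationRegistry()
archive = registry.register("archive", max_security_level=2)
design = registry.register("design", max_security_level=3)
print(registry.job_path("archive", "doc-001.pdf", 1))
```

Because each context carries its own key and root directory, data belonging to one application system cannot be decrypted or located using another system's context.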
In embodiment 1, different encryption/decryption keys are used for a plurality of application systems. The application system module is integrated with an index plug-in, index information is established for the source data through the index plug-in, the index information is encrypted, and the encrypted index information is transmitted to the retrieval service module. Index establishment is carried out inside each application system in an index plug-in mode, and system service data are prevented from crossing system boundaries.
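A rough sketch of what an index plug-in running inside an application system might emit is given below. The keyword selection (most frequent words of four or more letters), the function name `build_index_entries`, and the field names are assumptions made for illustration; the source does not specify how keywords are extracted. The point shown is that only keywords and a short context snippet leave the system, never the document body. Encryption before transmission is omitted here.

```python
import re
from collections import Counter


def build_index_entries(system_id, doc_id, document_name, text,
                        top_n=3, context_chars=30):
    """Build index entries inside the application system (naive sketch)."""
    lowered = text.lower()   # assumes ASCII text so offsets match the original
    words = re.findall(r"[a-z]{4,}", lowered)
    entries = []
    for keyword, _count in Counter(words).most_common(top_n):
        pos = lowered.find(keyword)
        start = max(0, pos - context_chars)
        end = min(len(text), pos + len(keyword) + context_chars)
        entries.append({
            "index_key": keyword,
            "system_id": system_id,
            "doc_id": doc_id,
            "document_name": document_name,
            "context": text[start:end],   # short snippet, not the whole document
        })
    return entries


sample = "Reactor coolant pump. Reactor coolant. Reactor."
for entry in build_index_entries("archive", "doc-1", "Pump manual", sample,
                                 top_n=2, context_chars=5):
    print(entry["index_key"], "|", entry["context"])
```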
The retrieval service module comprises an index database; storing the encrypted index information into an index database; responding to a retrieval service request from a user, extracting encrypted index information matched with the retrieval service request from an index database, decrypting the encrypted index information to obtain a query result list, and returning the query result list to the user; a retrieval portal and retrieval management are arranged between a user and an index database, a keyword and user information in a retrieval service request are obtained through the retrieval portal, the keyword and the user information are encrypted through the retrieval management, encrypted index information matched with the retrieval service request is extracted from the index database according to the encrypted keyword and the user information, the encrypted index information is decrypted through the retrieval management, and the decrypted index information is presented to the user through the retrieval portal in a form of a query result list.
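The query path described above (encrypt the keyword and user information, match against encrypted index information, decrypt the hits) can be sketched as follows. Several things here are assumptions not found in the source: keyed HMAC tokens are used so that encrypted keywords can be matched by equality, the class name `RetrievalManagement` and its methods are hypothetical, and `toy_encrypt` is a stand-in stream cipher built from SHA-256 purely so the example runs with the standard library; a real deployment would use a vetted authenticated cipher. For brevity the sketch also performs the index encryption inside the retrieval service, whereas the text has the application system encrypt before transmission.

```python
import hashlib
import hmac
import json
import secrets


def blind_token(key, label, value):
    """Deterministic keyed token: equal inputs match without storing plaintext."""
    message = f"{label}:{value.lower()}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def _keystream(key, nonce, length):
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key, plaintext):
    """Stand-in cipher for illustration only; not for production use."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def toy_decrypt(key, blob):
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


class RetrievalManagement:
    """Hypothetical retrieval management layer in front of the index database."""

    def __init__(self):
        self._token_key = secrets.token_bytes(32)
        self._data_key = secrets.token_bytes(32)
        self.index_db = {}   # keyword token -> list of (user tokens, encrypted entry)

    def store(self, keyword, entry, allowed_users):
        users = frozenset(
            blind_token(self._token_key, "user", u) for u in allowed_users
        )
        blob = toy_encrypt(self._data_key, json.dumps(entry).encode("utf-8"))
        token = blind_token(self._token_key, "kw", keyword)
        self.index_db.setdefault(token, []).append((users, blob))

    def search(self, keyword, user_id):
        kw = blind_token(self._token_key, "kw", keyword)
        user = blind_token(self._token_key, "user", user_id)
        return [
            json.loads(toy_decrypt(self._data_key, blob))
            for users, blob in self.index_db.get(kw, [])
            if user in users   # only entries this user is listed for
        ]


manager = RetrievalManagement()
manager.store("coolant",
              {"system_id": "archive", "document_name": "Pump manual"},
              ["alice"])
print(manager.search("coolant", "alice"))
print(manager.search("coolant", "bob"))
```

In this sketch the index database never holds a plaintext keyword or user name, and a user who is not listed for an entry gets an empty result rather than a filtered view of it.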
The user identity authentication comprises the real-time viewing authority of the user. The index information includes: index key, system identification, document name, context, user rights information, and security level. Because the index information contains user authority, a user's query result list is prevented from exceeding that user's need-to-know scope.
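The six index fields listed above map naturally onto a small record type. The sketch below is one possible shape, shown unencrypted for readability; the field names, the use of a user list for the rights information, and the idea of comparing a numeric user clearance with the document security level are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexRecord:
    index_key: str             # index keyword
    system_id: str             # identifies the source application system
    document_name: str
    context: str               # short text surrounding the keyword
    allowed_users: frozenset   # user rights information as a user list
    security_level: int        # document security level (higher is stricter)


def visible_results(records, user_id, user_clearance):
    """Keep only records the user is listed for and cleared to see."""
    return [
        r for r in records
        if user_id in r.allowed_users and user_clearance >= r.security_level
    ]


records = [
    IndexRecord("coolant", "archive", "Pump manual", "primary coolant pump",
                frozenset({"alice", "bob"}), 1),
    IndexRecord("coolant", "design", "Loop layout", "coolant loop drawing",
                frozenset({"alice"}), 3),
]
for record in visible_results(records, "alice", user_clearance=2):
    print(record.system_id, record.document_name)
```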
Example 2
In embodiment 2, on the basis of embodiment 1, a secure full-text retrieval method based on a multi-source heterogeneous system is established. The method comprehensively considers the data security of important document data throughout the full-text retrieval service process, including index establishment, index management, index query, OCR recognition management, and viewing of retrieval results; it ensures the boundary security and access control security of each system's data during index establishment, data transmission, and OCR recognition management; and it meets the need of users within a unit for fast, accurate, comprehensive, and effective full-text retrieval of important documents in the systems.
Establishing index information standard specifications at least comprising index keywords, system identification, document name, context, user authority information, security level and the like; and developing an index plug-in, integrating the plug-ins in each application system, completing the establishment of index information, and simultaneously encrypting the index information and sending the encrypted index information to a unified retrieval service module for unified storage management.
Establishing an OCR service, and realizing the whole security management process of the identity authentication of an application system needing OCR identification, system data isolation management, data security management, system data encryption and decryption and file scheduling management in the data identification process, wherein all system data encryption and decryption keys are different.
Establishing a uniform safe full-text retrieval service portal, and providing user login and application system identity authentication services; and establishing a retrieval management module to realize encryption of retrieval keywords input by a user, query in an index database, decryption of a query result list and storage of index information of an application system.
Source system data security access control: a user identity authentication service is provided when the user jumps to the source system to view details, and a system identity authentication service is provided when the OCR service returns the recognized data to the application system.
Embodiment 2 can broadly support the establishment of secure full-text retrieval for multi-source heterogeneous systems within a unit, and in particular meets the need of units with high document confidentiality and security requirements for unified full-text retrieval across their systems. It provides a standard specification for establishing index information that contains user authority and document security level. Its data security risk control method, covering the establishment of index information, index information management, recognition of unstructured documents, and viewing of document details, effectively ensures that the data boundary of each application system is not breached and that document authority remains under the control of the source system. It establishes a multi-system identity authentication mechanism that ensures effective control of data during data exchange between systems. It provides an effective data confidentiality scheme and construction method for secure full-text retrieval across multi-source heterogeneous systems in military confidentiality units, and has good social and economic benefits.
Example 3
In embodiment 3, on the basis of embodiment 2, secure full-text retrieval is divided into three main parts: the retrieval service module, the OCR service module, and index establishment in the application systems.
(I) Basic principles
1. Index establishment is carried out inside each application system in an index plug-in mode, and system service data are prevented from crossing system boundaries;
2. the index information contains user authority, which prevents a user's query result list from exceeding that user's need-to-know scope;
3. the OCR service provides a complete safety control mechanism for the data to be identified of each application system, different encryption and decryption keys and different storage paths of different system data are realized, and data access by crossing systems and system administrators is avoided;
4. the detail of the query result jumps to the source system for viewing, and the viewing authority of the user on the document is still controlled by the source system;
5. the multiple system identity authentication mechanism ensures the safety control of data transmission between systems.
(II) Main process
The first step: the standard specification of the index information is confirmed, and an index plug-in for establishing the index information is developed;
The second step: each application system integrates the index plug-in and builds an index over the text documents in the system, forming index information that contains index keywords, system identification, document name, context, user authority information, security level, and the like; the index information is encrypted and sent to the retrieval server;
The third step: a data security management mechanism is established for the OCR service module, which performs system identity authentication for each application system accessing the OCR service and manages the OCR recognition data of each application system in isolation, specifically including document security management and encryption and decryption key management, so that the data of each application system is stored independently and cross-system encryption and decryption are avoided;
The fourth step: the OCR service encrypts the recognized data of each system with that system's key and sends the encrypted data to the source system, and the source system performs identity authentication on the data transmission request of the OCR service;
after receiving the text document recognized by OCR, the source system performs the second step.
The fifth step: using the retrieval management module, the full-text retrieval service encrypts the user's retrieval keywords with the key, queries the database, and returns the encrypted list of data in each application system for which the user has viewing authority;
The sixth step: the encrypted data list of the query result is decrypted and returned to the user display interface;
The seventh step: when the user clicks a data entry, the user jumps back to the source system corresponding to the entry, and the source system displays the details according to the user's real-time viewing authority.
The index information contains the viewing authority of the user to the corresponding document, namely a user list corresponding to the document, the document security level and the like.
Index plug-ins are integrated in each application system, so that document entities cannot cross system boundaries in the process of establishing index information.
Packaging the OCR recognition tool, establishing an OCR recognition service, and realizing the safety management of data needing OCR recognition of each application system, wherein the safety management comprises security management, different path storage management and different encryption and decryption keys.
Cross-system data transmission requires system-level identity authentication; this covers transmission from an application system to the index storage service, from an application system to the OCR recognition service, and of the OCR-recognized document back to the application system.
The index information is stored in the index database in encrypted form; during retrieval, the retrieval keywords must be encrypted before the query is run, and the returned results are decrypted before being displayed to the user;
when the user asks for the details of a retrieval result entry, the user must jump to the source system and pass identity authentication, and the user's specific viewing authority remains under the control of the source system.
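The point that viewing authority stays with the source system can be illustrated with a small sketch: even if a result entry was indexed while the user still had access, the source system rechecks its own current access list at view time. The class `SourceSystem` and its methods are hypothetical, and the `authenticated` flag stands in for whatever login mechanism the source system actually uses.

```python
class SourceSystem:
    """Illustrative application system that owns documents and their access lists."""

    def __init__(self, system_id: str) -> None:
        self.system_id = system_id
        self._documents: dict[str, str] = {}
        self._acl: dict[str, set] = {}

    def store(self, doc_id: str, text: str, allowed_users: set) -> None:
        self._documents[doc_id] = text
        self._acl[doc_id] = set(allowed_users)

    def revoke(self, doc_id: str, user: str) -> None:
        self._acl.get(doc_id, set()).discard(user)

    def view(self, doc_id: str, user: str, authenticated: bool) -> str:
        # User identity authentication happens at the source system, not in the index.
        if not authenticated:
            raise PermissionError("user identity not verified")
        # Real-time check against the current access list, not the indexed copy.
        if user not in self._acl.get(doc_id, set()):
            raise PermissionError(f"{user} may not view {doc_id}")
        return self._documents[doc_id]


source = SourceSystem("docs-A")
source.store("d1", "Design note body", {"alice"})
print(source.view("d1", "alice", authenticated=True))

source.revoke("d1", "alice")  # permission changes after the entry was indexed
try:
    source.view("d1", "alice", authenticated=True)
except PermissionError as err:
    print("denied:", err)
```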
In an embodiment of the present invention, a "source system" refers to an application system that stores target source data retrieved by a user.
In the present invention, OCR refers to optical character recognition, specifically a processing method for converting the character information in a PDF, picture, or other non-text format file into an editable text document.
In the invention, the user identity authentication comprises the authentication of user authority, and the system identity authentication comprises the authentication of system authority.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (9)

1. A secure full-text retrieval system based on a multi-source heterogeneous system, comprising: an application system module, a retrieval service module and an OCR service module;
the application system module stores source data, wherein the source data comprises original data and processed data, the processed data being the data obtained from the source data through recognition by the OCR service module, and the application system module establishes index information for the source data and transmits the index information to the retrieval service module;
the retrieval service module stores the index information, extracts a retrieval result list for a user according to a retrieval request of the user, and accesses source data corresponding to the retrieval result list in an application system according to the retrieval result list;
the OCR service module receives the original data from the application system module and performs OCR recognition on the original data;
when the application system delivers the original data to the OCR service module for processing, system identity authentication is carried out; when the application system transmits the index information to the retrieval service module, system identity authentication is carried out; when the user accesses the source data corresponding to the retrieval result list in the application system, user identity authentication is carried out;
when the source data of the application system is delivered to the OCR service module for processing, the OCR service module respectively performs identity authentication and source data isolation on the plurality of application systems.
2. The secure full-text retrieval system based on a multi-source heterogeneous system according to claim 1, wherein the isolating comprises: document security level management, different path storage management and encryption and decryption key management.
3. The secure full-text retrieval system based on the multi-source heterogeneous system according to claim 2, wherein the plurality of application systems respectively use different encryption and decryption keys.
4. The secure full-text retrieval system based on the multi-source heterogeneous system according to claim 1, wherein the application system module integrates an index plug-in, establishes index information for the source data through the index plug-in, encrypts the index information, and transmits the encrypted index information to the retrieval service module.
5. The secure full-text retrieval system based on a multi-source heterogeneous system according to claim 4, wherein the retrieval service module comprises an index database; storing the encrypted index information into the index database; and responding to a retrieval service request from a user, extracting encrypted index information matched with the retrieval service request from the index database, decrypting the encrypted index information to obtain a query result list, and returning the query result list to the user.
6. The secure full-text retrieval system based on the multi-source heterogeneous system according to claim 5, wherein a retrieval portal and retrieval management are provided between a user and the index database, a keyword and user information in the retrieval service request are acquired through the retrieval portal, the keyword and the user information are encrypted through the retrieval management, encrypted index information matched with the retrieval service request is extracted from the index database according to the encrypted keyword and user information, the encrypted index information is decrypted through the retrieval management, and the decrypted index information is presented to the user through the retrieval portal in the form of a query result list.
7. The secure full-text retrieval system based on the multi-source heterogeneous system according to claim 1, wherein the user identity authentication comprises a user real-time viewing right.
8. The secure full-text retrieval system based on the multi-source heterogeneous system according to claim 1, wherein the index information comprises: index key, system identification, document name, context, user rights information, and security level.
9. A secure full-text retrieval method based on a multi-source heterogeneous system, characterized in that the system of any one of claims 1 to 8 is adopted, and the method comprises the following steps:
step S1: providing a uniform retrieval service module for each application system, integrating an index plug-in the application system, establishing index information for source data in the application system by the index plug-in, and performing system identity authentication on the encrypted index information by the retrieval service module and storing the encrypted index information;
step S2: the retrieval service module extracts and decrypts encrypted index information corresponding to a retrieval request according to the retrieval request initiated by a user, and feeds the index information back to the user in a form of a query result list;
step S3: the user jumps to an application system corresponding to the content of the query result list according to the query result list, and feeds back source data corresponding to the content of the query result list to the user after performing user identity authentication on the user;
the source data comprises original data and OCR recognized data, and the OCR recognized data is the data of the original data processed by an OCR service module; the OCR service module carries out system identity authentication on the original data before processing; and the application system performs system identity authentication on the processed OCR recognized data.
CN202110054652.3A 2021-01-15 2021-01-15 Safe full-text retrieval system and method based on multi-source heterogeneous system Active CN112784014B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110054652.3A CN112784014B (en) 2021-01-15 2021-01-15 Safe full-text retrieval system and method based on multi-source heterogeneous system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110054652.3A CN112784014B (en) 2021-01-15 2021-01-15 Safe full-text retrieval system and method based on multi-source heterogeneous system

Publications (2)

Publication Number Publication Date
CN112784014A CN112784014A (en) 2021-05-11
CN112784014B CN112784014B (en) 2022-03-25

Family

ID=75756078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110054652.3A Active CN112784014B (en) 2021-01-15 2021-01-15 Safe full-text retrieval system and method based on multi-source heterogeneous system

Country Status (1)

Country Link
CN (1) CN112784014B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003017055A2 (en) * 2001-08-15 2003-02-27 Visa International Service Association Method and system for delivering multiple services electronically to customers via a centralized portal architecture

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050097080A1 (en) * 2003-10-30 2005-05-05 Kethireddy Amarender R. System and method for automatically locating searched text in an image file
CN102081669B (en) * 2011-01-24 2012-11-21 哈尔滨工业大学 Hierarchical retrieval method for multi-source remote sensing resource heterogeneous databases
CN105528604B (en) * 2016-01-31 2018-12-11 华南理工大学 A kind of bill automatic identification and processing system based on OCR
CN106156272A (en) * 2016-06-21 2016-11-23 北京工业大学 A kind of information retrieval method based on multi-source semantic analysis
CN106302449B (en) * 2016-08-15 2019-10-11 中国科学院信息工程研究所 A kind of storage of ciphertext and the open cloud service method of searching ciphertext and system
CN110489395B (en) * 2019-07-27 2022-07-29 西南电子技术研究所(中国电子科技集团公司第十研究所) Method for automatically acquiring knowledge of multi-source heterogeneous data
CN110300191A (en) * 2019-07-29 2019-10-01 崔翛龙 Service system and data processing method

Also Published As

Publication number Publication date
CN112784014A (en) 2021-05-11

Similar Documents

Publication Publication Date Title
US10762241B1 (en) Third-party platform for tokenization and detokenization of network packet data
US10013574B2 (en) Method and apparatus for secure storage and retrieval of encrypted files in public cloud-computing platforms
US9576005B2 (en) Search system
US9350714B2 (en) Data encryption at the client and server level
CN110597902B (en) Block chain-based alliance type health data retrieval system and method
CN102902932B (en) The using method of the outside encrypting and deciphering system of the database based on SQL rewrite
US11924185B2 (en) Method and system for general data protection compliance via blockchain
CN111737720B (en) Data processing method and device and electronic equipment
JP7249248B2 (en) Confidential Information Processing System and Confidential Information Processing Method
KR102517582B1 (en) Personal search index with enhanced privacy
US20140019749A1 (en) Securing information exchanged via a network
CN112511599A (en) Civil air defense data sharing system and method based on block chain
CN110990877A (en) Medical image file segmentation encryption and decryption system and method based on greenplus
CN113645226A (en) Data processing method, device, equipment and storage medium based on gateway layer
US20130067595A1 (en) Data Isolation Service for Data and Information Sharing
CN113377876B (en) Data database processing method, device and platform based on Domino platform
Carminati et al. Trust and share: Trusted information sharing in online social networks
CN112784014B (en) Safe full-text retrieval system and method based on multi-source heterogeneous system
JP4594078B2 (en) Personal information management system and personal information management program
CN111614638A (en) Face recognition data distribution system and method based on big data platform
EP3388969B1 (en) Search system
CN116432193A (en) Financial database data protection transformation method and financial data protection system thereof
JP2011100334A (en) Document file retrieval system, document file registration method, document file retrieval method, program, and recording medium
Raghavendra et al. DRSIG: Domain and Range Specific Index Generation for Encrypted Cloud Data
CN118277503A (en) Text processing method, apparatus, device, medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant