CN114676231A - Target information detection method, device and medium - Google Patents

Target information detection method, device and medium

Info

Publication number
CN114676231A
Authority
CN
China
Prior art keywords
text
target information
verified
information
regular expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011553669.5A
Other languages
Chinese (zh)
Inventor
文静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd
Priority to CN202011553669.5A
Publication of CN114676231A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application disclose a target information detection method, apparatus, device, and medium. Keywords are used to perform keyword retrieval on an initial text to obtain a text to be verified that contains the keywords; when the keywords include enterprise-related information, the screened text to be verified is more targeted. After the text to be verified is obtained, it can be matched according to a preset target information regular expression to determine a detection result. The target information regular expression expresses all possible presentation forms of the various keywords as regular expressions, and can cover target information in its various forms more comprehensively and accurately. Matching the text to be verified with the target information regular expression allows the target information in that text to be verified accurately, which helps ensure the accuracy of target information detection.

Description

Target information detection method, device and medium
Technical Field
The present application relates to the field of security detection technologies, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for detecting target information.
Background
In practice, company staff upload the company's code to a code hosting platform as business requires. The uploaded code often contains information of interest to the company, for example internal sensitive information such as database connection information and server account passwords. Leakage of such sensitive information may create serious security risks for the enterprise and may even cause irreparable loss. Companies therefore often need to search the code hosting platform for the information of interest it contains, in order to determine how that information is distributed.
Currently, there are detection tools for identifying information of interest in code repositories, but these tools only query the code hosting platform with a few keywords and store the results directly. Detecting code text with keywords alone yields only coarse results that contain a large number of false positives, which greatly reduces detection accuracy. This drawback is not limited to identifying information of interest in code repositories; many other scenarios also suffer from insufficient accuracy in detecting information of interest.
Therefore, how to improve the accuracy of detecting information of interest is a problem to be solved by those skilled in the art.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method, an apparatus, a device, and a computer-readable storage medium for detecting target information, which can improve the accuracy of detecting information of interest.
In order to solve the foregoing technical problem, an embodiment of the present application provides a target information detection method, including:
carrying out keyword retrieval on the initial text by using the keywords to obtain a text to be verified containing the keywords; the keyword is associated with the target information;
and matching the text to be verified according to a preset target information regular expression to determine a detection result.
Optionally, matching the text to be verified according to the preset target information regular expression to determine the detection result includes:
establishing a character string matching model according to the preset target information regular expressions; the character string matching model is associated with the character strings corresponding to each target information regular expression;
matching the text to be verified based on the character string matching model, and determining the character strings included in the text to be verified;
matching the text to be verified according to the effective regular expression to determine the detection result; the effective regular expression is the target information regular expression corresponding to a character string included in the text to be verified.
Optionally, before the matching the text to be verified according to the preset target information regular expression, the method further includes:
filtering the text to be verified according to preset filtering information to obtain an effective text;
correspondingly, matching the text to be verified according to a preset regular expression of target information comprises:
and matching the effective text according to a preset target information regular expression.
Optionally, the following steps are performed while the text to be verified is being matched according to the preset target information regular expression:
if target information matching the target information regular expression is found, judging whether value-taking information exists in the target information;
if the value-taking information exists in the target information, judging whether the value-taking information is visual information in the text to be verified;
and if the value-taking information is not visual information in the text to be verified, filtering the value-taking information.
The embodiment of the application also provides a target information detection device, which comprises a retrieval unit and a matching unit;
the retrieval unit is used for performing keyword retrieval on the initial text by using the keywords to obtain a text to be verified containing the keywords; the keyword is associated with the target information;
and the matching unit is used for matching the text to be verified according to a preset target information regular expression so as to determine a detection result.
Optionally, the matching unit includes an establishing subunit, a character string determining subunit, and a result determining subunit;
the establishing subunit is used for establishing a character string matching model according to a preset target information regular expression; the character string matching model is associated with character strings corresponding to the regular expressions of the target information;
the character string determining subunit is configured to match a text to be verified based on the character string matching model, and determine a character string included in the text to be verified;
the result determining subunit is used for matching the text to be verified according to the effective regular expression so as to determine a detection result; and the effective regular expression is a target information regular expression corresponding to a character string included in the text to be verified.
Optionally, a filtering unit is further included;
the filtering unit is used for filtering the text to be verified according to preset filtering information before the text to be verified is matched according to a preset target information regular expression so as to obtain an effective text;
correspondingly, the matching unit is used for matching the effective text according to a preset target information regular expression.
Optionally, the apparatus further includes a first judging unit, a second judging unit and a filtering unit, which operate while the text to be verified is being matched according to the preset target information regular expression;
the first judging unit is used for judging whether value-taking information exists in the target information when target information matching the target information regular expression is found in the process of matching the text to be verified according to the preset target information regular expression;
the second judging unit is configured to judge whether the value-taking information is visual information in the text to be verified if the value-taking information exists in the target information;
the filtering unit is used for filtering the value-taking information if the value-taking information is not visual information in the text to be verified.
According to the above technical solution, keyword retrieval is performed on the initial text by using the keywords to obtain the text to be verified that contains the keywords; when the keywords include enterprise-related information, the screened text to be verified is more targeted. After the text to be verified is obtained, it can be matched according to a preset target information regular expression to determine a detection result. The target information regular expression expresses all possible presentation forms of the various keywords as regular expressions, and can cover the various forms of target information more comprehensively and accurately. Matching the text to be verified with the target information regular expression allows the target information in that text to be verified accurately, which helps ensure the accuracy of target information detection.
The embodiment of the application also provides a target information detection method, which comprises the following steps:
establishing a character string matching model according to a preset target information regular expression; the character string matching model is associated with character strings corresponding to the regular expressions of the target information;
matching a text to be verified based on the character string matching model, and determining a character string included in the text to be verified;
matching the text to be verified according to the effective regular expression, and determining a detection result; and the effective regular expression is a target information regular expression corresponding to a character string included in the text to be verified.
Optionally, the following steps are performed while the text to be verified is being matched according to the effective regular expression:
if target information matching the effective regular expression is found, judging whether value-taking information exists in the target information;
if the value-taking information exists in the target information, judging whether the value-taking information is visual information in the text to be verified;
and if the value-taking information is not visual information in the text to be verified, filtering the value-taking information.
Optionally, the detection result includes: each target information contained in the text to be verified;
correspondingly, after the detection result is determined, the method further comprises the following steps:
and according to a set text format, packaging each target information contained in the text to be verified, and displaying the packaged target information.
Optionally, before matching the text to be verified according to the effective regular expression, the method further includes:
filtering the text to be verified according to preset filtering information to obtain an effective text;
correspondingly, matching the text to be verified according to the effective regular expression comprises:
matching the effective text according to the effective regular expression.
The embodiment of the application also provides a target information detection device, which comprises an establishing unit, a first matching unit and a second matching unit;
the establishing unit is used for establishing a character string matching model according to a preset target information regular expression; the character string matching model is associated with character strings corresponding to the regular expressions of the target information;
the first matching unit is used for matching a text to be verified based on the character string matching model and determining a character string included in the text to be verified;
the second matching unit is used for matching the text to be verified according to the effective regular expression and determining a detection result; and the effective regular expression is a target information regular expression corresponding to a character string included in the text to be verified.
Optionally, the apparatus further includes a first judging unit, a second judging unit and a filtering unit, which operate while the text to be verified is being matched according to the effective regular expression;
the first judging unit is used for judging whether value-taking information exists in the target information when target information matching the effective regular expression is found in the process of matching the text to be verified according to the effective regular expression;
the second judging unit is configured to judge whether the value-taking information is visual information in the text to be verified if the value-taking information exists in the target information;
the filtering unit is used for filtering the value-taking information if the value-taking information is not visual information in the text to be verified.
Optionally, the detection result includes: each target information contained in the text to be verified;
correspondingly, the apparatus further comprises a packaging unit;
and the packaging unit is used for packaging each target information contained in the text to be verified according to a set text format and displaying the packaged target information.
Optionally, a filtering unit is further included;
the filtering unit is used for filtering the text to be verified according to preset filtering information before the text to be verified is matched according to the effective regular expression so as to obtain an effective text;
correspondingly, the second matching unit is used for matching the effective text according to the effective regular expression.
According to the above technical solution, a character string matching model is established according to the preset target information regular expressions; the character string matching model is associated with the character strings corresponding to each target information regular expression. The text to be verified is matched based on the character string matching model to determine the character strings it includes. Constructing the character string matching model thus enables fuzzy matching of the text to be verified. The target information regular expression corresponding to a character string included in the text to be verified is called an effective regular expression; the text to be verified is matched according to the effective regular expressions, and the detection result is determined. Compared with matching the text to be verified directly against all target information regular expressions, constructing the character string matching model allows the effective regular expressions for the text to be verified to be screened out quickly, so that regular expressions that do not need exact matching are filtered out. The text to be verified only needs to be matched against the effective regular expressions, which greatly improves matching efficiency.
An embodiment of the present application further provides a target information detection device, including:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the target information detection method as described in any one of the above.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the target information detection method described in any one of the above.
Drawings
In order to more clearly illustrate the embodiments of the present application, the drawings required for the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained by those skilled in the art without inventive effort.
Fig. 1 is a schematic flowchart of target information detection according to an embodiment of the present disclosure;
Fig. 2 is a flowchart of a target information detection method according to an embodiment of the present application;
Fig. 3 is a flowchart of another target information detection method provided in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a target information detection apparatus according to an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of another target information detection apparatus according to an embodiment of the present application;
Fig. 6 is a schematic diagram of a hardware structure of a target information detection device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without any creative effort belong to the protection scope of the present application.
In order that those skilled in the art will better understand the disclosure, the following detailed description is given with reference to the accompanying drawings.
Target information may be detected in various sources, for example a code hosting platform. A user can upload code files to the code hosting platform in the usual manner, and to detect target information (for example, sensitive information) in those code files, a detection tool containing keywords is often used to search the hosting platform network-wide for target information. Although a large amount of target information can be obtained in this way, the keywords used by such a tool take a single form and support only coarse retrieval, so the detection results contain a large number of false positives and the accuracy of target information detection is low.
Therefore, the embodiments of the present application provide a target information detection method, apparatus, device, and computer-readable storage medium, in which keyword retrieval is performed on an initial text by using keywords to obtain a text to be verified that contains the keywords. To improve the accuracy of the detection result, the text to be verified can undergo a second detection: it is matched according to a preset target information regular expression to determine which texts include target information, and also which target information the text to be verified contains. Fig. 1 is a schematic view of a target information detection process provided in an embodiment of the present application. The keywords include a large number of different keywords, with which a preliminary screening of multiple initial texts can be performed. The target information regular expression expresses all possible presentation forms of the various keywords as regular expressions; verifying the text to be verified again with the target information regular expression makes it possible to screen out more accurately which initial texts contain target information and to find the target information contained in the text to be verified.
Next, a method for detecting target information provided in an embodiment of the present application will be described in detail. Fig. 2 is a flowchart of a target information detection method provided in an embodiment of the present application, where the method includes:
S201: Perform keyword retrieval on the initial text by using the keywords to obtain the text to be verified that contains the keywords.
The number of keywords should be as large as possible, so that texts are not missed during preliminary screening; for example, password keywords may include password, pwd, and the like. Further, in the present application the keywords may specifically be keywords related to enterprise-related information, in which case each keyword includes enterprise-related information and a preset target information keyword; for example, the enterprise-related information may be combined with password and pwd respectively. In the embodiment of the present application, to make target information detection more targeted, for example to detect target information for a selected enterprise, the obtained enterprise-related information may be combined with the target information keywords in a preset target information keyword library to obtain the keywords used in step S201.
The following description takes target information, specifically "sensitive information related to enterprise security", as an example:
The sensitive word bank may contain multiple common pieces of sensitive information. The sensitive information may be of various types, for example password keywords (password, pwd, etc.), mailbox information (email, mail, etc.), and a database (pymysql) used to hold information such as account passwords and IP addresses.
The keywords may include information that is representative of the enterprise, such as the domain name of the enterprise, the uniform suffix of the enterprise, and the name of the enterprise.
In practical applications, the character string "company.
Taking a sensitive word bank that includes password, email and pymysql:// as an example, the enterprise-related information company.test.com is combined with the preset sensitive word bank, and the resulting keywords used in step S201 may include "company.test.com password", "company.test.com email" and "company.test.com pymysql://". The enterprise-related information and the sensitive information keyword can be separated by a space.
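By way of non-limiting illustration, the combination of enterprise-related information with a sensitive word bank, followed by the preliminary keyword retrieval of step S201, might be sketched as follows. The function names `build_keywords` and `keyword_retrieval`, the sample texts, and the rule that a text is retained when it contains every space-separated term of at least one keyword are assumptions made for this sketch; the application does not prescribe a particular retrieval rule.

```python
from itertools import product


def build_keywords(enterprise_info, word_bank):
    """Join each piece of enterprise information with each word-bank entry, separated by a space."""
    return [f"{info} {word}" for info, word in product(enterprise_info, word_bank)]


def keyword_retrieval(initial_texts, keywords):
    """Keep texts that contain every space-separated term of at least one keyword (assumed rule)."""
    selected = []
    for text in initial_texts:
        if any(all(term in text for term in kw.split(" ")) for kw in keywords):
            selected.append(text)
    return selected


# Enterprise-related information and word bank taken from the example in the description.
keywords = build_keywords(["company.test.com"], ["password", "email", "pymysql://"])

# Hypothetical initial texts, for illustration only.
initial_texts = [
    "host = db.company.test.com\npassword = hunter2",
    "print('hello world')",
    "contact: admin@company.test.com  # email of the ops team",
]
to_verify = keyword_retrieval(initial_texts, keywords)
```

In this sketch the first and third sample texts are retained as texts to be verified, while the second, which contains no keyword terms, is discarded.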
S202: Match the text to be verified according to a preset target information regular expression to determine a detection result.
The detection result may be each piece of target information contained in the text to be verified, or it may be the texts that contain target information; the present application does not limit the specific form of the detection result.
In practical application, all possible presentation forms of various types of target information can be expressed in the form of regular expressions to obtain the regular expressions of the target information.
Taking the three types of target information mentioned above (password keywords, mailbox information, and the database) as an example, the data forms corresponding to the password keyword may include three forms, namely password, pwd, and password, so the regular expression corresponding to the password keyword may be "password": "(password|pwd|password)(=).+", where "|" represents "or" and ".+" represents any character string; that is, the target information may be presented in any of the forms password=any string, pwd=any string, or password=any string. "(password|pwd|password)(=).+" is the password keyword regular expression, and the "password" in front of it can be regarded as the index information, identification information, or key of that regular expression.
The data forms corresponding to the mailbox information may include email or mail, so the regular expression corresponding to the mailbox information may be "email": "(email|mail)(=|:).+", where "(=|:)" means that the mailbox information may be presented in the form "email=" or "email:".
The data form corresponding to the database may include pymysql, and the regular expression corresponding to the database may be "mysql": "pymysql://.{5,256}@.{5,}", where ".{5,256}" indicates a character string of length 5 to 256, which may be an account and password, and ".{5,}" indicates a character string of length at least 5, which may be a domain name.
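By way of non-limiting illustration, target information regular expressions for the three types discussed above (password keyword, mailbox information, database) might be compiled and applied as sketched below. The exact patterns are assumptions: `passwd` is used as an assumed third password variant, `\S` (non-whitespace) stands in for "any character" so that a match stops at the end of a token, and optional whitespace is allowed around the separator. The names `TARGET_PATTERNS` and `detect` and the sample text are hypothetical.

```python
import re

# Index key -> target information regular expression (assumed reconstruction).
TARGET_PATTERNS = {
    "password": re.compile(r"(password|pwd|passwd)\s*=\s*\S+"),
    "email": re.compile(r"(email|mail)\s*[=:]\s*\S+"),
    "mysql": re.compile(r"pymysql://\S{5,256}@\S{5,}"),
}


def detect(text):
    """Return (index key, matched string) pairs found in the text to be verified."""
    results = []
    for key, pattern in TARGET_PATTERNS.items():
        for match in pattern.finditer(text):
            results.append((key, match.group(0)))
    return results


# Hypothetical text to be verified.
sample = (
    "pwd = s3cret\n"
    "mail: ops@company.test.com\n"
    "url = pymysql://root:12345@db.company.test.com/app"
)
results = detect(sample)
```

Each entry of `results` pairs the index key of the expression with the matched target information, which corresponds to the form of detection result in which each piece of target information in the text is reported.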
The target information regular expression can more comprehensively and accurately cover various specific expression forms of the target information. In the embodiment of the application, the text to be verified is matched with a preset regular expression of target information, secondary screening verification is performed through the regular expression, the text containing the target information is selected, or each piece of target information contained in the text to be verified is determined.
In the embodiment of the application, after the detection result is determined, the detection result can be stored in the database for subsequent query invocation. For example, if the detection result is each piece of target information included in the text to be verified, after the target information included in the text to be verified is determined, the target information may be stored in the database for facilitating subsequent query calls.
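By way of non-limiting illustration, storing detected target information for later query might be sketched with Python's standard `sqlite3` module. The table name `detection_result`, its columns, the source path, and the helper names are assumptions made for this sketch; the application does not specify a database or schema.

```python
import sqlite3


def store_results(conn, source, results):
    """Persist (info type, matched string) pairs found in one text to be verified."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS detection_result "
        "(source TEXT, info_type TEXT, matched TEXT)"
    )
    conn.executemany(
        "INSERT INTO detection_result VALUES (?, ?, ?)",
        [(source, info_type, matched) for info_type, matched in results],
    )
    conn.commit()


def query_by_type(conn, info_type):
    """Return (source, matched string) rows for one type of target information."""
    cursor = conn.execute(
        "SELECT source, matched FROM detection_result "
        "WHERE info_type = ? ORDER BY rowid",
        (info_type,),
    )
    return cursor.fetchall()


# In-memory database and hypothetical detection results, for illustration only.
conn = sqlite3.connect(":memory:")
store_results(
    conn,
    "repo/config.py",
    [("password", "pwd = s3cret"), ("email", "mail: ops@company.test.com")],
)
found = query_by_type(conn, "password")
```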
According to the above technical solution, keyword retrieval is performed on the initial text by using the keywords to obtain the text to be verified that contains the keywords; when the keywords include enterprise-related information, the screened text to be verified is more targeted. After the text to be verified is obtained, it can be matched according to a preset target information regular expression to determine a detection result. The target information regular expression expresses all possible presentation forms of the various keywords as regular expressions, and can cover the various forms of target information more comprehensively and accurately. Matching the text to be verified with the target information regular expression allows the target information in that text to be verified accurately, which helps ensure the accuracy of target information detection.
Step S202 requires regular expression matching, which is time-consuming in practice. To improve the efficiency of target information detection, in this embodiment of the application a character string matching algorithm may be used to perform fuzzy matching between the text to be verified and the preset target information regular expressions, so that regular expressions that do not need exact matching are filtered out. The text to be verified then only needs to be matched against the remaining effective regular expressions, which greatly improves the detection efficiency.
Specifically, in the embodiment of the present application, a character string matching model may be established according to the preset target information regular expressions. The character string matching model is associated with the character strings corresponding to each regular expression, and the character strings included in the text to be verified are found through the model so as to determine the effective regular expressions. In a specific implementation, an Aho-Corasick automaton (AC automaton) based on the target information regular expressions can be constructed by relying on a multi-pattern matching algorithm; this AC automaton is the character string matching model. The AC automaton is a multi-pattern matching algorithm that searches for character strings in a text. It originated at Bell Labs in 1975, and its implementation is prior art that is not described here. Of course, the AC automaton is only one multi-pattern matching algorithm, and other multi-pattern matching algorithms can also be used to construct the character string matching model.
In the embodiment of the application, the character string matching model can match the text to be verified with each character string in the character string matching model to obtain the character string included in the text to be verified, and then the regular expression corresponding to the included character string is determined as the effective regular expression matched with the text to be verified. Compared with the mode of directly matching the text to be verified with the regular expressions, the method can quickly screen out the effective regular expressions matched with the text to be verified, and effectively reduces the number of the regular expressions for executing matching operation. After the effective regular expression is screened out, the text to be verified can be matched according to the effective regular expression to obtain a detection result, and the detection result can be specifically target information contained in the text to be verified.
Fuzzy matching of the text to be verified can be achieved by constructing a character string matching model. And matching the text to be verified according to the effective regular expression to obtain target information contained in the text to be verified, so that the text to be verified is accurately matched. The character string matching model can help to filter out regular expressions which do not need to be matched, and only the filtered effective regular expressions need to be matched, so that the detection efficiency is improved.
It is considered that the text to be verified often contains some character information which does not need to be matched, such as an HTML tag, some conventional words, and the like. In the embodiment of the application, before the text to be verified is matched according to the preset target information regular expression to determine the detection result, the text to be verified is filtered according to the preset filtering information to obtain an effective text; correspondingly, the valid text is matched according to a preset regular expression of the target information, so as to determine a detection result, such as the target information contained in the valid text.
The character information needing to be filtered is contained in the filtering information, and the character information which does not need to be matched can be filtered by filtering the text to be verified, so that the data volume for matching the effective text is effectively reduced, and the processing efficiency of target information detection is further improved.
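The filtering step can be sketched as follows. The filtering information used here (a tag pattern and a small set of conventional words) is an assumption made for illustration; the embodiment does not specify its contents.

```python
import html
import re

# Hypothetical filtering information: markup and conventional words.
TAG_PATTERN = re.compile(r"<[^>]+>")
CONVENTIONAL_WORDS = {"the", "a", "an", "and", "of"}


def filter_text(text_to_verify):
    """Strip HTML tags and conventional words, returning the effective text."""
    without_tags = TAG_PATTERN.sub(" ", text_to_verify)
    unescaped = html.unescape(without_tags)
    kept = [w for w in unescaped.split() if w.lower() not in CONVENTIONAL_WORDS]
    return " ".join(kept)


valid_text = filter_text(
    "<p>The password of the <b>admin</b> account: pwd=123456</p>"
)
```

The effective text is shorter than the text to be verified, so the subsequent regular expression matching has less data to scan.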
In the embodiment of the application, the detection result can be formatted and then displayed to a user. For example, when the detection result consists of individual pieces of target information, displaying the target information alone may not let the user see which text each piece came from. Therefore, after the target information contained in the text to be verified is determined, it may be encapsulated according to a set text format, and the encapsulated target information is displayed.
The text format may include information to be encapsulated in addition to the target information, and the connection manner and arrangement order between various types of information.
For example, the encapsulated information may further include a text title, a text author, a text storage address, and the like corresponding to the target information. In practical application, the target information, the text title, the text author and the text storage address can be packaged in sequence, and various information can be connected by adopting a space character.
By encapsulating the target information, the target information can be displayed in a uniform format. And when the target information is packaged, the information related to the target information can be packaged together, so that a user can visually know the text information corresponding to the target information.
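As a minimal sketch of such encapsulation, the fields named above can be joined in a fixed order with a space character. The class name, field names and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    target_information: str
    text_title: str
    text_author: str
    text_storage_address: str


def encapsulate(finding, separator=" "):
    """Join the fields in the fixed order: information, title, author, address."""
    return separator.join([
        finding.target_information,
        finding.text_title,
        finding.text_author,
        finding.text_storage_address,
    ])


line = encapsulate(
    Finding("pwd=123456", "config.ini", "alice", "/repo/config.ini")
)
```

A different separator or field order would simply be a different set text format.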
In addition, value-taking information (information that assigns a value) may be encountered during target information detection. Some value-taking information presents the value directly in the form of digits, letters and/or special characters, for example, a password given as 123456; such information may be called intuitive value-taking information. Other value-taking information takes its value from the result of a function defined in the text to be verified, or from a function name or a variable name. In the embodiment of the present application, value-taking information whose value cannot be read directly from the text to be verified may therefore be excluded from the target information contained in the text to be verified. The specific definition and detection of intuitive value-taking information may be determined according to the application scenario; the above is only an example.
Therefore, in the embodiment of the application, during regular expression matching it can be judged whether the matched value-taking target information is intuitive (visual) information; if it is not, it is not the target information being sought and can be filtered out.
Through detecting the intuitiveness of the value-taking information, the information which is not intuitively presented in the text to be verified can be effectively filtered, and the accuracy of target information detection is further improved.
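One possible heuristic for this judgment is sketched below. It is an assumption for illustration only, since the embodiment leaves the definition of intuitive information to the application scenario: a value is treated as non-intuitive if it looks like a function call, or if it is a bare name that is defined elsewhere in the text as a function or variable.

```python
import re

ASSIGNMENT = re.compile(r"(?i)\b(?:password|passwd|pwd)\s*[=:]\s*(\S+)")
CALL = re.compile(r"^[A-Za-z_][\w.]*\(.*\)$")
IDENTIFIER = re.compile(r"^[A-Za-z_]\w*$")


def is_intuitive(value, text):
    """Heuristic: a literal is intuitive; a call or a defined name is not."""
    if CALL.match(value):
        return False
    if IDENTIFIER.match(value):
        # A bare name defined elsewhere in the text as a function or variable.
        name = re.escape(value)
        defined = re.search(rf"(?m)^\s*(?:def\s+{name}\b|{name}\s*=)", text)
        return defined is None
    return True


def match_intuitive(text):
    """Return matches whose value passes the intuitiveness heuristic."""
    return [
        m.group(0)
        for m in ASSIGNMENT.finditer(text)
        if is_intuitive(m.group(1), text)
    ]


sample = (
    "secret = load()\n"
    "password = 123456\n"
    "pwd = get_secret()\n"
    "passwd = secret\n"
)
kept = match_intuitive(sample)
```

Only the assignment with the literal value is kept; the call and the reference to the variable `secret` are filtered out.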
Fig. 3 is a flowchart of another target information detection method provided in an embodiment of the present application, where the method includes:
S301: Establish a character string matching model according to a preset target information regular expression.
In consideration of the fact that in practical application, matching of the regular expressions is time-consuming, in order to improve the efficiency of target information detection, in the embodiment of the application, a character string matching algorithm can be used for performing fuzzy matching on the text to be verified and the preset target information regular expression, so that the regular expressions which do not need to be matched accurately are filtered.
The character string matching model is associated with character strings corresponding to the regular expressions of the target information.
In a specific implementation, an AC automaton (Aho-Corasick automaton) based on the target information regular expressions may be constructed by relying on a multi-pattern matching algorithm, and this AC automaton serves as the character string matching model. The Aho-Corasick algorithm is a multi-pattern matching algorithm, developed at Bell Labs in 1975, that searches a text for a set of character strings; its specific implementation is prior art and is not described here again. Of course, the AC automaton is only one multi-pattern matching approach, and other multi-pattern matching algorithms can also be used to construct the character string matching model.
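For reference, a compact textbook-style sketch of an Aho-Corasick automaton is given below. It is a generic illustration of the prior-art algorithm, not code from the embodiment; the class name and the sample strings are assumptions.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton over a fixed set of strings."""

    def __init__(self, words):
        self.goto = [{}]        # per-state transitions: char -> next state
        self.fail = [0]         # per-state failure link
        self.output = [set()]   # per-state set of words recognized here
        for word in words:
            self._insert(word)
        self._build_failure_links()

    def _insert(self, word):
        state = 0
        for ch in word:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.output.append(set())
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.output[state].add(word)

    def _build_failure_links(self):
        # Breadth-first traversal; depth-1 states keep the root as fallback.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                fallback = self.fail[state]
                while fallback and ch not in self.goto[fallback]:
                    fallback = self.fail[fallback]
                self.fail[nxt] = self.goto[fallback].get(ch, 0)
                self.output[nxt] |= self.output[self.fail[nxt]]

    def search(self, text):
        """Return the set of words that occur anywhere in the text."""
        found, state = set(), 0
        for ch in text:
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            found |= self.output[state]
        return found


automaton = AhoCorasick(["password", "pwd", "passwd"])
found = automaton.search("user=admin; passwd=123456")
```

The text is scanned once regardless of how many strings the automaton holds, which is why it suits the coarse screening stage.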
For example, if the regular expression begins with an alternation of keyword variants such as "(password|pwd|passwd)" and continues with a pattern for the value that follows, then the character strings associated with the character string matching model may be: password, pwd and passwd. This is only an example; those skilled in the art can determine the character strings associated with the character string matching model according to the regular expressions of the application scenario.
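How the character strings are derived from a regular expression is left to the practitioner. A deliberately naive sketch is shown below; it assumes that each pattern contains one parenthesized group of plain-word alternatives, and both the helper names and the sample pattern are hypothetical.

```python
import re

# Naive assumption: the pattern contains one group of plain-word alternatives.
LITERAL_GROUP = re.compile(r"\((?:\?:)?(\w+(?:\|\w+)+)\)")


def literals_for(pattern):
    """Return the literal alternatives of the first plain alternation group."""
    match = LITERAL_GROUP.search(pattern)
    return match.group(1).split("|") if match else []


def build_association(patterns):
    """Map each literal string to the regular expressions that contain it."""
    association = {}
    for pattern in patterns:
        for literal in literals_for(pattern):
            association.setdefault(literal, []).append(pattern)
    return association


SAMPLE_PATTERN = r"(password|pwd|passwd)\s*=\s*.+"
association = build_association([SAMPLE_PATTERN])
```

Patterns that do not fit this shape yield no strings here and would need a proper regular expression parser or manually supplied strings.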
S302: Match the text to be verified based on the character string matching model, and determine the character strings included in the text to be verified.
The character string matching model can realize a search function of character strings, in the embodiment of the application, the character string matching model can match a text to be verified with each character string in the character string matching model to obtain the character strings included in the text to be verified, and then the regular expression corresponding to the included character strings is determined to be an effective regular expression matched with the text to be verified.
S303: Match the text to be verified according to the effective regular expression, and determine a detection result.
When the effective regular expression is used for matching the text to be verified, the text to be verified can be accurately matched.
For example, assume that there are 50 target information regular expressions. In practical applications, a multi-pattern matching algorithm may be adopted to construct a character string matching model based on the 50 target information regular expressions; for example, esmre may be used (an AC automaton may be realized by means of esmre). The character string matching model performs fuzzy matching on the text to be verified so as to screen out the target information regular expressions that may match it. Assuming that 10 target information regular expressions are screened out, these 10 can then be used as the effective regular expressions to perform exact matching on the text to be verified.
The regular expression can realize accurate matching of the target information, but the calculated amount is large, in the embodiment of the application, in order to realize the balance of the accuracy of target information detection and the detection efficiency in the text to be verified, a character string matching model can be constructed to filter out the regular expression which does not need to be matched, only the text to be verified and the filtered effective regular expression are needed to be matched, and the detection efficiency is improved. And matching the text to be verified according to the effective regular expression to obtain the target information contained in the text to be verified, so that the text to be verified is accurately matched.
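The two-stage flow of S301 to S303 can be sketched as follows. The patterns and associated strings are hypothetical, and plain substring tests stand in for the multi-pattern automaton so that the sketch stays short; a real model would scan the text once for all strings.

```python
import re

# Hypothetical target information regular expressions, keyed by name.
PATTERNS = {
    "credential": re.compile(r"(?:password|pwd|passwd)\s*=\s*\S+"),
    "access_key": re.compile(r"(?:access_key|accesskey)\s*=\s*\S+"),
    "phone": re.compile(r"(?:phone|mobile)\s*:\s*\d{11}"),
}

# Hypothetical character strings associated with each regular expression.
LITERALS = {
    "password": "credential", "pwd": "credential", "passwd": "credential",
    "access_key": "access_key", "accesskey": "access_key",
    "phone": "phone", "mobile": "phone",
}


def effective_patterns(text):
    """Stage 1: coarse screening by the associated character strings."""
    names = {name for literal, name in LITERALS.items() if literal in text}
    return sorted(names)


def detect(text):
    """Stage 2: run only the effective regular expressions on the text."""
    results = []
    for name in effective_patterns(text):
        results.extend(m.group(0) for m in PATTERNS[name].finditer(text))
    return results


text = "db host=10.0.0.1 passwd=s3cret mobile: 13800000000"
effective = effective_patterns(text)
detected = detect(text)
```

Here the "access_key" expression is never executed, because none of its associated strings occurs in the text.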
Optionally, in the process of performing matching on the text to be verified according to the valid regular expression, the following steps are performed:
in the process of matching the text to be verified according to the effective regular expression, if target information matching the effective regular expression is found, judging whether value-taking information exists in the target information;
if the value-taking information exists in the target information, judging whether the value-taking information is visual information in the text to be verified;
and if the value-taking information is not visual information in the text to be verified, filtering the value-taking information.
Optionally, the detection result includes: each target information contained in the text to be verified;
correspondingly, after the detection result is determined, the method further comprises the following steps:
and according to a set text format, packaging each target information contained in the text to be verified, and displaying the packaged target information.
Optionally, before matching the text to be verified according to the valid regular expression, the method further includes:
filtering the text to be verified according to preset filtering information to obtain an effective text;
correspondingly, matching the text to be verified according to the valid regular expression comprises:
and matching the effective text according to the effective regular expression.
The description of the features in the embodiment corresponding to fig. 3 may refer to the related description of the embodiment corresponding to fig. 2, and is not repeated here.
According to the technical scheme, a character string matching model is established according to a preset target information regular expression; the character string matching model is associated with character strings corresponding to the regular expressions of the target information; and matching the text to be verified based on the character string matching model, and determining the character strings included in the text to be verified. Fuzzy matching of the text to be verified can be achieved by constructing a character string matching model. The regular expression of the target information corresponding to the character strings included in the text to be verified can be called as an effective regular expression, the text to be verified is matched according to the effective regular expression, and the detection result is determined. Compared with the mode of directly matching the text to be verified with all the target information regular expressions, the method and the device can quickly select the effective regular expressions matched with the text to be verified by constructing the character string matching model, so that the regular expressions which do not need to be matched accurately are filtered, the text to be verified only needs to be matched with the effective regular expressions, and the efficiency of matching the text to be verified is greatly improved.
Fig. 4 is a schematic structural diagram of a target information detection apparatus provided in an embodiment of the present application, which includes a retrieving unit 41 and a matching unit 42;
a retrieving unit 41, configured to perform a keyword retrieval on the initial text by using the keyword to obtain a to-be-verified text containing the keyword; the keywords are associated with the target information;
and the matching unit 42 is configured to match the text to be verified according to a preset target information regular expression, so as to determine a detection result.
Optionally, the matching unit includes an establishing subunit, a character string determining subunit and a result determining subunit;
the establishing subunit is used for establishing a character string matching model according to a preset target information regular expression; the character string matching model is associated with character strings corresponding to the regular expressions of the target information;
the character string determining subunit is used for matching the text to be verified based on the character string matching model and determining the character strings included in the text to be verified;
the result determining subunit is used for matching the text to be verified according to the effective regular expression so as to determine a detection result; the effective regular expression is a target information regular expression corresponding to a character string included in the text to be verified.
Optionally, the apparatus further includes a filtering unit;
the filtering unit is used for filtering the text to be verified according to preset filtering information, before the text to be verified is matched according to the preset target information regular expression, so as to obtain an effective text;
correspondingly, the matching unit is used for matching the effective text according to a preset target information regular expression.
Optionally, the apparatus further includes a first judging unit, a second judging unit and a filtering unit;
the first judging unit is used for judging, when target information matching the target information regular expression is found in the process of matching the text to be verified according to the preset target information regular expression, whether value-taking information exists in the target information;
the second judging unit is used for judging whether the value-taking type information is visual type information in the text to be verified or not if the value-taking type information exists in the target information;
and the filtering unit is used for filtering the value-taking information if the value-taking information is not visual information in the text to be verified.
The description of the features in the embodiment corresponding to fig. 4 can refer to the related description of the embodiment corresponding to fig. 2, and is not repeated here.
According to the above technical scheme, the retrieving unit performs keyword retrieval on the initial text by using the keywords to obtain the text to be verified containing the keywords; when the keywords relate to a particular enterprise, the screened text to be verified is more targeted. After the text to be verified is obtained, the matching unit may match it according to a preset target information regular expression to determine a detection result. The target information regular expression represents the possible presentation forms of the various keywords in the form of regular expressions, and can therefore cover the various forms of target information more comprehensively and accurately. Matching the text to be verified with the target information regular expression allows the target information in the text to be verified to be checked accurately, which ensures the accuracy of target information detection.
Fig. 5 is a schematic structural diagram of another target information detection apparatus provided in the embodiment of the present application, including an establishing unit 51, a first matching unit 52, and a second matching unit 53;
the establishing unit 51 is used for establishing a character string matching model according to a preset target information regular expression; the character string matching model is associated with character strings corresponding to the regular expressions of the target information;
the first matching unit 52 is configured to match the text to be verified based on the character string matching model, and determine a character string included in the text to be verified;
the second matching unit 53 is configured to match the text to be verified according to the valid regular expression, and determine a detection result; the effective regular expression is a target information regular expression corresponding to a character string included in the text to be verified.
Optionally, the apparatus further includes a first judging unit, a second judging unit and a filtering unit;
the first judging unit is used for judging, when target information matching the effective regular expression is found in the process of matching the text to be verified according to the effective regular expression, whether value-taking information exists in the target information;
the second judging unit is used for judging whether the value-taking information is visual information in the text to be verified if the value-taking information exists in the target information;
and the filtering unit is used for filtering the value-taking information if the value-taking information is not visual information in the text to be verified.
Optionally, the detection result includes: each target information contained in the text to be verified;
correspondingly, the apparatus further comprises an encapsulation unit;
the encapsulation unit is used for encapsulating each piece of target information contained in the text to be verified according to the set text format and displaying the encapsulated target information.
Optionally, the apparatus further includes a filtering unit;
the filtering unit is used for filtering the text to be verified according to preset filtering information before the text to be verified is matched according to the effective regular expression so as to obtain an effective text;
correspondingly, the second matching unit is used for matching the effective text according to the effective regular expression.
The description of the features in the embodiment corresponding to fig. 5 may refer to the related description of the embodiment corresponding to fig. 3, and is not repeated here.
According to the technical scheme, the establishing unit establishes a character string matching model according to a preset target information regular expression; the character string matching model is associated with character strings corresponding to the regular expressions of the target information; the first matching unit matches the text to be verified based on the character string matching model, and determines character strings included in the text to be verified. Fuzzy matching of the text to be checked can be achieved by constructing a character string matching model. The target information regular expression corresponding to the character string included in the text to be verified can be called as an effective regular expression, and the second matching unit matches the text to be verified according to the effective regular expression to determine the detection result. Compared with the mode of directly matching the text to be verified with all the target information regular expressions, the method and the device can quickly select the effective regular expressions matched with the text to be verified by constructing the character string matching model, so that the regular expressions which do not need to be matched accurately are filtered, the text to be verified only needs to be matched with the effective regular expressions, and the efficiency of matching the text to be verified is greatly improved.
Fig. 6 is a schematic diagram of a hardware structure of a target information detection device 60 according to an embodiment of the present application, where the hardware structure includes:
a memory 61 for storing a computer program;
a processor 62 for executing a computer program to implement the steps of the target information detection method according to any of the above embodiments.
The embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the target information detection method described in any of the above embodiments are implemented.
The target information detection method, apparatus, device, and computer-readable storage medium provided in the embodiments of the present application are described in detail above. The embodiments in the specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts among the embodiments may be referred to each other. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method. It should be noted that those skilled in the art can make several improvements and modifications to the present application without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present application.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Claims (10)

1. A target information detection method is characterized by comprising the following steps:
carrying out keyword retrieval on the initial text by using the keywords to obtain a text to be verified containing the keywords; the keyword is associated with the target information;
and matching the text to be verified according to a preset target information regular expression to determine a detection result.
2. The method for detecting the target information according to claim 1, wherein the matching the text to be verified according to a preset target information regular expression to determine the detection result comprises:
establishing a character string matching model according to a preset target information regular expression; the character string matching model is associated with character strings corresponding to the regular expressions of the target information;
matching a text to be verified based on the character string matching model, and determining a character string included in the text to be verified;
matching the text to be verified according to the effective regular expression to determine a detection result; and the effective regular expression is a target information regular expression corresponding to a character string included in the text to be verified.
3. The method for detecting target information according to claim 1, wherein before the matching the text to be verified according to the preset target information regular expression, the method further comprises:
filtering the text to be verified according to preset filtering information to obtain an effective text;
correspondingly, matching the text to be verified according to a preset regular expression of target information comprises:
and matching the effective text according to a preset target information regular expression.
4. The method for detecting the target information according to claim 1, wherein in the process of matching the text to be verified according to the preset target information regular expression, the following steps are performed:
in the process of matching the text to be verified according to a preset target information regular expression, if target information matched with the target information regular expression is matched, judging whether value-taking information exists in the target information;
if the value-taking information exists in the target information, judging whether the value-taking information is visual information in the text to be verified;
and if the value-taking information is not visual information in the text to be verified, filtering the value-taking information.
5. A target information detection method is characterized by comprising the following steps:
establishing a character string matching model according to a preset target information regular expression; the character string matching model is associated with character strings corresponding to the regular expressions of the target information;
matching a text to be verified based on the character string matching model, and determining a character string included in the text to be verified;
matching the text to be verified according to the effective regular expression, and determining a detection result; the effective regular expression is a target information regular expression corresponding to a character string included in the text to be verified.
6. The method according to claim 5, wherein in the process of performing the matching of the text to be verified according to the valid regular expression, the following steps are performed:
in the process of matching the text to be verified according to the effective regular expression, if target information matched with the effective regular expression is matched, judging whether value-taking information exists in the target information;
if the value-taking information exists in the target information, judging whether the value-taking information is visual information in the text to be verified;
and if the value-taking type information is not visual type information in the text to be verified, filtering the value-taking type information.
7. The target information detection method according to claim 5, wherein the detection result comprises: each target information contained in the text to be verified;
correspondingly, after the detection result is determined, the method further comprises the following steps:
and according to a set text format, packaging each target information contained in the text to be verified, and displaying the packaged target information.
8. The method for detecting target information according to claim 5, further comprising, before the matching the text to be verified according to the valid regular expression:
filtering the text to be verified according to preset filtering information to obtain an effective text;
correspondingly, the matching the text to be verified according to the valid regular expression comprises:
and matching the effective text according to an effective regular expression.
9. A target information detection device, characterized by comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the target information detection method according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, carries out the steps of the target information detection method according to any one of claims 1 to 8.
CN202011553669.5A 2020-12-24 2020-12-24 Target information detection method, device and medium Pending CN114676231A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011553669.5A CN114676231A (en) 2020-12-24 2020-12-24 Target information detection method, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011553669.5A CN114676231A (en) 2020-12-24 2020-12-24 Target information detection method, device and medium

Publications (1)

Publication Number Publication Date
CN114676231A true CN114676231A (en) 2022-06-28

Family

ID=82070202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011553669.5A Pending CN114676231A (en) 2020-12-24 2020-12-24 Target information detection method, device and medium

Country Status (1)

Country Link
CN (1) CN114676231A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115544213A (en) * 2022-11-28 2022-12-30 上海朝阳永续信息技术股份有限公司 Method, device and storage medium for acquiring information in text
WO2024011933A1 (en) * 2022-07-11 2024-01-18 华为云计算技术有限公司 Combined sensitive-word detection method and apparatus, and cluster

Similar Documents

Publication Publication Date Title
CN110489345B (en) Crash aggregation method, device, medium and equipment
US20160306876A1 (en) Systems and methods of detecting information via natural language processing
US8359307B2 (en) Method and apparatus for building sales tools by mining data from websites
CN114676231A (en) Target information detection method, device and medium
CN112328936A (en) Website identification method, device and equipment and computer readable storage medium
CN108073708A (en) Information output method and device
CN113139025A (en) Evaluation method, device, equipment and storage medium of threat information
CN110532229B (en) Evidence file retrieval method, device, computer equipment and storage medium
CN112016317A (en) Sensitive word recognition method and device based on artificial intelligence and computer equipment
CN111881183A (en) Enterprise name matching method and device, storage medium and electronic equipment
CN109657462B (en) Data detection method, system, electronic device and storage medium
CN113626558B (en) Intelligent recommendation-based field standardization method and system
KR101742041B1 (en) an apparatus for protecting private information, a method of protecting private information, and a storage medium for storing a program protecting private information
CN110232071A (en) Retrieval method and device for drug data, storage medium, and electronic device
US11625366B1 (en) System, method, and computer program for automatic parser creation
CN113254577A (en) Sensitive file detection method, device, equipment and storage medium
JP2013174988A (en) Similar document retrieval support apparatus and similar document retrieval support program
CN115563288B (en) Text detection method and device, electronic equipment and storage medium
CN113191777A (en) Risk identification method and device
CN112698883A (en) Configuration data processing method, device, terminal and storage medium
JP7282715B2 (en) Evaluation device, evaluation method and evaluation program
CN118069898B (en) Log generalization method and device for multiple log sources
CN113836288B (en) Method and device for determining service detection result and electronic equipment
CN117591624B (en) Test case recommendation method based on semantic index relation
CN113810237B (en) Method for checking network equipment configuration compliance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination