CN118069824A

CN118069824A - Risk identification method and device, storage medium and electronic equipment

Info

Publication number: CN118069824A
Application number: CN202410276238.0A
Authority: CN
Inventors: 赵易淳; 周书恒; 祝慧佳
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2024-03-11
Filing date: 2024-03-11
Publication date: 2024-05-24

Abstract

The specification discloses a risk identification method and apparatus, a storage medium, and an electronic device. The method comprises: in response to an input operation of a user, determining the text input by the user as the text to be identified; determining, from a pre-constructed knowledge base and according to keywords of the text to be identified, a text that matches the text to be identified as a first text; inputting the first text and the text to be identified into a pre-trained risk identification model, so that the model determines a first feature of the text to be identified under the prompt of the first text; determining a plurality of rule features, and determining from them the rule features that match the text to be identified as second features; and inputting the first feature and the second feature into the risk identification model to determine a risk identification result for the text to be identified. Introducing the first text and the second feature, both of which help identify text risk, improves the accuracy of the risk identification result for the text to be identified.

Description

Risk identification method and device, storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a risk identification method, apparatus, storage medium, and electronic device.

Background

With the development of artificial intelligence and information technology, a growing number of service providers, in particular providers of social platforms, have emerged. At the same time, private data is receiving increasing public attention.

Currently, a user generates service data while using a service offered by a service provider. For example, on a social platform offered by the service provider, the content that the user uploads, publishes, or shares is service data, which may be text, images, audio, video, and so on. Such service data may carry risk, so the service provider needs to perform risk identification on it to determine whether it is risky and to filter out the risky data. How to perform risk identification on user-generated service data is therefore an important problem.

Based on this, a method of risk identification is provided in the present specification.

Disclosure of Invention

The present disclosure provides a risk identification method, apparatus, storage medium and electronic device, so as to partially solve the foregoing problems in the prior art.

The technical scheme adopted in the specification is as follows:

The present specification provides a method of risk identification, comprising:

Responding to input operation of a user, determining a text input by the user and taking the text as a text to be recognized;

Determining a text matched with the text to be identified from a pre-constructed knowledge base according to the keywords of the text to be identified, and taking the text as a first text;

Inputting the first text and the text to be identified into a pre-trained risk identification model, so that the risk identification model determines first characteristics of the text to be identified under the prompt of the first text;

Determining a plurality of rule features; the rule features are obtained by extracting features of rules for identifying whether the text has risks or not;

Determining rule features matched with the text to be identified from the rule features, and taking the rule features as second features;

and inputting the first characteristic and the second characteristic into the risk recognition model, and determining a risk recognition result of the text to be recognized.

Optionally, determining, according to the keywords of the text to be identified, a text matched with the text to be identified from a pre-constructed knowledge base, and taking the text as a first text, where the method specifically includes:

Determining a plurality of knowledge texts contained in a pre-constructed knowledge base;

Constructing a dictionary tree according to each knowledge text;

determining keywords of the texts to be identified, determining each text containing the keywords based on the dictionary tree, and taking the texts as each text to be selected;

And determining a first text according to each candidate text.

Optionally, determining the first text according to each candidate text specifically includes:

Performing word segmentation processing on the text to be identified by adopting a preset word segmentation device to obtain each word segment;

And, for each candidate text, when the candidate text meets a specified condition with respect to the word segments, taking the candidate text as a first text.

Optionally, determining the first text according to each candidate text specifically includes:

Determining a risk score corresponding to each text to be selected based on a pre-trained risk scoring model aiming at each text to be selected;

Screening each text to be selected according to the risk scores corresponding to the text to be selected respectively;

and taking the screened text to be selected as a first text.

Optionally, determining a rule feature matched with the text to be identified from the rule features, and taking the rule feature as a second feature, wherein the rule feature specifically comprises:

inputting the text to be identified into the risk identification model, and determining target characteristics;

for each rule feature, determining the similarity between the rule feature and the target feature;

screening the rule features according to the similarity;

And taking the screened rule features as second features matched with the text to be identified.

Optionally, determining a number of rule features specifically includes:

Determining a plurality of rules contained in a preset rule base; wherein the rule is used for identifying whether the text is at risk;

and inputting each rule into the risk identification model, and determining rule characteristics corresponding to the rule.

Optionally, the method further comprises:

In response to an upload operation by risk-control personnel, determining the rule uploaded by the risk-control personnel as a first rule;

and adding the first rule to the rule base.

Optionally, the method further comprises:

when the risk identification result is that no risk exists, displaying the text to be identified;

and when the risk identification result is that the risk exists, sending prompt information to the user.

Optionally, pre-training the risk identification model specifically includes:

determining a text input by a user historically and taking the text as a text sample;

Determining a text matched with the text sample from the knowledge base according to the keywords corresponding to the text sample, and taking the text as a first text;

Inputting the first text and the text sample into a risk recognition model to be trained, so that the risk recognition model to be trained determines a first characteristic of the text sample under the prompt of the first text;

Determining a plurality of rule features;

determining rule features matched with the text sample from the rule features, and taking the rule features as second features;

Inputting the first feature and the second feature into the risk recognition model to be trained, and determining a recognition result of the text sample;

Determining a risk label corresponding to the text sample;

And training the risk identification model to be trained according to the risk label and the identification result.

The present specification provides an apparatus for risk identification, comprising:

The first determining module is used for responding to the input operation of a user, determining the text input by the user and taking the text as the text to be identified;

The first matching module is used for determining a text matched with the text to be identified from a pre-constructed knowledge base according to the keywords of the text to be identified and taking the text as a first text;

The second determining module is used for inputting the first text and the text to be identified into a pre-trained risk identification model so that the risk identification model determines first characteristics of the text to be identified under the prompt of the first text;

A third determining module for determining a plurality of rule features; the rule features are obtained by extracting features of rules for identifying whether the text has risks or not;

The second matching module is used for determining rule features matched with the text to be identified from the rule features and taking the rule features as second features;

And the recognition module is used for inputting the first characteristic and the second characteristic into the risk recognition model and determining a risk recognition result of the text to be recognized.

Optionally, the first matching module is specifically configured to determine a plurality of knowledge texts included in a pre-constructed knowledge base; constructing a dictionary tree according to each knowledge text; determining keywords of the texts to be identified, determining each text containing the keywords based on the dictionary tree, and taking the texts as each text to be selected; and determining a first text according to each candidate text.

Optionally, the first matching module is specifically configured to perform word segmentation processing on the text to be identified by using a preset word segmentation device to obtain each word segment; and according to the word segmentation, aiming at each text to be selected, when the text to be selected meets the specified condition, taking the text to be selected as a first text.

Optionally, the first matching module is specifically configured to determine, for each text to be selected, a risk score corresponding to the text to be selected based on a pre-trained risk scoring model; screening each text to be selected according to the risk scores corresponding to the text to be selected respectively; and taking the screened text to be selected as a first text.

Optionally, the second matching module is specifically configured to input the text to be identified into the risk identification model, and determine a target feature; for each rule feature, determining the similarity between the rule feature and the target feature; screening the rule features according to the similarity; and taking the screened rule features as second features matched with the text to be identified.

Optionally, the third determining module is specifically configured to determine a plurality of rules included in a preset rule base; wherein the rule is used for identifying whether the text is at risk; and inputting each rule into the risk identification model, and determining rule characteristics corresponding to the rule.

Optionally, the apparatus further comprises:

The uploading module is used for determining, in response to an upload operation by risk-control personnel, the rule uploaded by the risk-control personnel as a first rule, and for adding the first rule to the rule base.

Optionally, the apparatus further comprises:

The prompting module is used for displaying the text to be recognized when the risk recognition result is that the risk is not found; and when the risk identification result is that the risk exists, sending prompt information to the user.

Optionally, the apparatus further comprises:

The training module is used for determining texts historically input by users as text samples; determining, from the knowledge base and according to the keywords corresponding to a text sample, a text that matches the text sample as a first text; inputting the first text and the text sample into a risk identification model to be trained, so that the model determines a first feature of the text sample under the prompt of the first text; determining a plurality of rule features; determining, from the rule features, the rule features that match the text sample as second features; inputting the first feature and the second feature into the risk identification model to be trained, and determining an identification result for the text sample; determining a risk label corresponding to the text sample; and training the risk identification model to be trained according to the risk label and the identification result.

The present description provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of risk identification described above.

The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of risk identification described above when executing the program.

At least one of the above technical solutions adopted in this specification can achieve the following beneficial effects:

In the risk identification method provided by this specification, the text input by a user is determined, in response to an input operation of the user, as the text to be identified. A text that matches the text to be identified is determined from a pre-constructed knowledge base according to the keywords of the text to be identified and is taken as a first text. The first text and the text to be identified are input into a pre-trained risk identification model, so that the model determines a first feature of the text to be identified under the prompt of the first text. Then a plurality of rule features are determined, and the rule features that match the text to be identified are selected from them as second features. Finally, the first feature and the second feature are input into the risk identification model to determine a risk identification result for the text to be identified.

As can be seen from the above method, when risk identification is performed on a text, the text input by a user is determined, in response to an input operation of the user, as the text to be identified. A text that matches the text to be identified is determined from a pre-constructed knowledge base according to the keywords of the text to be identified and is taken as a first text. The first text and the text to be identified are input into a pre-trained risk identification model, so that the model determines a first feature of the text to be identified under the prompt of the first text. Introducing external knowledge, namely the first text, assists the risk identification model in determining whether the text to be identified is risky, which improves the accuracy of the risk identification result. Then a plurality of rule features are determined, and the rule features that match the text to be identified are selected from them as second features. The first feature and the second feature are input into the risk identification model to determine a risk identification result for the text to be identified. On top of the first feature, introducing the rule features used for identifying text risk, namely the second features, makes the risk identification result output by the model more accurate.

Drawings

The accompanying drawings are included to provide a further understanding of the specification and constitute a part of it. The exemplary embodiments of the specification and their description serve to explain the specification and are not intended to limit it unduly. In the drawings:

FIG. 1 is a flow chart of a method of risk identification provided in the present specification;

FIG. 2 is a schematic diagram of a risk identification process provided in the present specification;

FIG. 3 is a schematic flow chart of a training method of a risk identification model provided in the present specification;

FIG. 4 is a schematic diagram of an apparatus for risk identification provided in the present specification;

FIG. 5 is a schematic view of the electronic device corresponding to FIG. 1 provided in the present specification.

Detailed Description

To make the objects, technical solutions, and advantages of the present specification clearer, the technical solutions of the present specification are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the present specification. All other embodiments obtained by one of ordinary skill in the art from the present disclosure without creative effort fall within the scope of protection of the present disclosure.

The embodiments of the present disclosure provide a risk identification method, apparatus, storage medium, and electronic device, and in the following, with reference to the drawings, the technical solutions provided by each embodiment of the present disclosure are described in detail.

Fig. 1 is a flow chart of a risk identification method provided in the present specification, specifically including the following steps:

S100: In response to an input operation of a user, determine the text input by the user as the text to be identified.

In this specification, a user generates service data while using a service offered by a service provider. Because that service data may carry risk, the service provider needs to perform risk identification on it to determine whether it is risky and to filter out the risky data. Accordingly, the risk identification apparatus, in response to an input operation of the user, determines the text input by the user as the text to be identified. The risk identification apparatus may be a server, or a device such as a mobile phone or a personal computer (PC) that is capable of executing the solutions of this specification. For convenience, the following description takes the server as the executing entity.

In this specification, the text input by the user differs with the service that the service provider offers. When the service is a social service, the text that the user uploads, publishes, or shares is the text input by the user, that is, the text to be identified; it may be chat messages between the user and other users or content shared by the user. When the service is an intelligent dialogue service, the text input by the user is a question posed by the user. When the service is a comment service, the text input by the user is the content of the user's comment, which may be a comment on a movie or on a topic; this specification does not limit it. This specification also does not limit which service the service provider offers or, correspondingly, which service the text input by the user is generated with. For convenience, the following description takes a comment service as the service offered by the service provider and the content of the user's comment as the text input by the user.

S102: According to the keywords of the text to be identified, determine a text that matches the text to be identified from a pre-constructed knowledge base, and take it as a first text.

In this specification, determining whether the text to be identified is risky relies not only on the text itself but also on external knowledge that assists the determination. The external knowledge may be historical events, real-time events, personal names, place names, literary works, and the like. The server can determine, from a pre-constructed knowledge base and according to the keywords of the text to be identified, a text that matches the text to be identified as the first text. The keywords may be the entire content of the text to be identified or only part of it. The knowledge base is constructed in advance and contains a number of knowledge texts. The knowledge texts may be obtained by the server from Wikipedia; that is, the server obtains texts from Wikipedia as knowledge texts and builds the knowledge base from them. The first text is a text determined from the knowledge base based on the keywords of the text to be identified, and it has a matching relationship with the text to be identified.

Specifically, the server may determine the keywords of the text to be identified and determine the knowledge texts contained in the pre-constructed knowledge base. For each knowledge text, the server judges, according to the keywords, whether the knowledge text matches the text to be identified, and if so, takes the knowledge text as a candidate text. The server then determines the first text from the candidate texts. To judge whether a knowledge text matches the text to be identified, the server may check whether the knowledge text contains the keywords: if it does, the knowledge text matches the text to be identified; if it does not, the knowledge text does not match.
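
The containment check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_candidates` is hypothetical, and the sketch assumes that a knowledge text matches when it contains at least one keyword, which the text does not state explicitly.

```python
def select_candidates(keywords, knowledge_texts):
    """Return the knowledge texts that contain at least one keyword.

    Assumption: "contains the keywords" is read as "contains any keyword";
    requiring all keywords would replace any() with all().
    """
    return [
        text
        for text in knowledge_texts
        if any(keyword in text for keyword in keywords)
    ]


# Hypothetical knowledge texts and keywords, for illustration only.
knowledge_base = ["History of the Silk Road", "Silk production", "Road cycling"]
candidates = select_candidates(["Silk"], knowledge_base)
```

This linear scan visits every knowledge text for every query, which is the cost that the dictionary-tree variant is meant to reduce.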

In addition, to determine the first text more quickly, the server may proceed as follows when determining, according to the keywords of the text to be identified, the matching text from the pre-constructed knowledge base. The server determines the knowledge texts contained in the knowledge base and constructs a dictionary tree from them. It then determines the keywords of the text to be identified and, based on the dictionary tree, determines each text containing the keywords as a candidate text. Finally, it determines the first text from the candidate texts. The nodes of the dictionary tree may include the words of the knowledge texts. When constructing the dictionary tree, the server may build it from the knowledge texts by means of an AC automaton (Aho-Corasick automaton). When determining the candidate texts based on the dictionary tree, the server may perform pattern matching on the keywords with a preset pattern matching algorithm that operates on the dictionary tree, and take each text containing the keywords as a candidate text.

In this specification, since the knowledge texts in the knowledge base are obtained from Wikipedia, a knowledge text may be an article with a title and a body. If the title of an article includes the keywords of the text to be identified, the article is related to the text to be identified and matches it. If the title does not include the keywords, the article is unrelated to the text to be identified and does not match it. Therefore, to speed up matching, when determining the knowledge texts contained in the pre-constructed knowledge base, the server may determine the articles contained in the knowledge base, determine the title of each article, and use the titles as the knowledge texts. In that case each candidate text is a title, and when determining the first text from the candidate texts, the server may directly take each candidate text as a first text.
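
As an illustration of dictionary-tree matching with an Aho-Corasick automaton, the sketch below builds the automaton from article titles and scans the text to be identified in a single pass. It is only one plausible reading: the text does not spell out the direction of the match, so treating the titles as patterns and the whole input text as the scanned string is an assumption, and the class name `AhoCorasick` and the sample titles are hypothetical.

```python
from collections import deque


class AhoCorasick:
    """Dictionary tree (trie) with failure links for multi-pattern matching."""

    def __init__(self, patterns):
        # Node 0 is the root; each node has child links, a failure link,
        # and the list of patterns that end at (or are suffixes of) the node.
        self.children = [{}]
        self.fail = [0]
        self.out = [[]]
        for pattern in patterns:
            if pattern:  # skip empty patterns
                self._insert(pattern)
        self._build_failure_links()

    def _insert(self, pattern):
        node = 0
        for ch in pattern:
            nxt = self.children[node].get(ch)
            if nxt is None:
                nxt = len(self.children)
                self.children[node][ch] = nxt
                self.children.append({})
                self.fail.append(0)
                self.out.append([])
            node = nxt
        self.out[node].append(pattern)

    def _build_failure_links(self):
        # Breadth-first traversal; depth-1 nodes keep the root as failure link.
        queue = deque(self.children[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.children[node].items():
                f = self.fail[node]
                while f and ch not in self.children[f]:
                    f = self.fail[f]
                self.fail[child] = self.children[f].get(ch, 0)
                # Inherit patterns that end at the failure target.
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def find_all(self, text):
        """Return the set of patterns that occur anywhere in text."""
        node = 0
        found = set()
        for ch in text:
            while node and ch not in self.children[node]:
                node = self.fail[node]
            node = self.children[node].get(ch, 0)
            found.update(self.out[node])
        return found


# Hypothetical article titles used as the knowledge texts.
titles = ["Silk Road", "Road", "Great Wall"]
matcher = AhoCorasick(titles)
candidate_titles = matcher.find_all("A comment about the Silk Road trade")
```

The automaton is built once from the knowledge base, so each incoming text is matched in time proportional to its length plus the number of matches, independent of the number of titles.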

In this specification, to make the determined first text more accurate and better help the risk identification model identify whether the text to be identified is risky, the server may select the risky texts from the candidate texts and take them as the first text. If the server identifies whether the text to be identified relates to sensitive information, a risky text is a text that relates to sensitive information. If the server identifies whether the text to be identified carries a risk of rule violation, a risky text is a text that carries a risk of rule violation. This specification does not limit which type of risk is identified or, correspondingly, which texts count as risky. For ease of description, this specification takes identifying whether the text to be identified relates to sensitive information as an example.

Based on this, when determining the first text from the candidate texts, the server may determine, for each candidate text, a risk score for that candidate text based on a pre-trained risk scoring model. The server then screens the candidate texts according to their risk scores and takes the screened candidate texts as the first text. The risk scoring model may be a model pre-trained by the server or any existing risk scoring model; this specification does not limit it. The risk score characterizes how risky a text is. Taking the identification of sensitive information as an example, the risk scoring model scores whether a text relates to sensitive information: the higher the risk score of a text, the more strongly the text relates to sensitive information, and the lower the risk score, the less it does.

When screening the candidate texts according to their risk scores, the server may sort the candidate texts in descending order of risk score to obtain a text sequence, and then take a preset number of candidate texts from the text sequence as the screened candidate texts. The preset number is a value set in advance. The server takes the candidate texts in sequence order, starting from the first position of the text sequence. For example, suppose the candidate texts are texts 1 to 5, with risk scores of 0.2, 0.4, 0.9, 0.8, and 0.7 respectively. The text sequence is then text 3, text 4, text 5, text 2, text 1. If the preset number is 3, the server starts from the first text of the sequence (text 3) and takes three candidate texts in order, so the screened candidate texts are text 3, text 4, and text 5.
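
The worked example above can be reproduced with a short sketch. The function name `screen_by_risk_score` is hypothetical, and the scores are the ones given in the example rather than the output of a real risk scoring model.

```python
def screen_by_risk_score(risk_scores, preset_number):
    """Sort candidate texts by descending risk score and keep the first few.

    risk_scores maps each candidate text to its risk score.
    """
    text_sequence = sorted(risk_scores, key=risk_scores.get, reverse=True)
    return text_sequence[:preset_number]


# Scores taken from the example in the text.
risk_scores = {
    "text 1": 0.2,
    "text 2": 0.4,
    "text 3": 0.9,
    "text 4": 0.8,
    "text 5": 0.7,
}
screened = screen_by_risk_score(risk_scores, 3)
```

With a preset number of 3, `screened` holds text 3, text 4, and text 5, matching the example.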

In this specification, when determining the risk score of a candidate text based on the pre-trained risk scoring model, the server may input the candidate text into the model to obtain its risk score. Because the candidate text may be the title of an article in the knowledge base, the server may, to determine the risk score more accurately, locate the corresponding article in the knowledge base and input the article into the pre-trained risk scoring model to obtain the risk score of the candidate text. Alternatively, to determine the risk score more quickly, the server may locate the corresponding article in the knowledge base and input text in the article into the pre-trained risk scoring model to obtain the risk score of the candidate text.

S104: Input the first text and the text to be identified into a pre-trained risk identification model, so that the risk identification model determines a first feature of the text to be identified under the prompt of the first text.

The server may input the first text and the text to be identified into a pre-trained risk recognition model, so that the risk recognition model determines a first feature of the text to be identified under the prompt of the first text. The first text is determined by the server from the knowledge base, and has a matching relationship with the text to be identified, and the first text can be used as prompt information to assist a risk identification model to determine whether the text to be identified has risks. The risk recognition model is a model pre-trained by a server and is used for determining whether the text to be recognized has risk or not. The risk recognition model may be a large language model, and the risk recognition model may be trained on the basis of an encoder-decoder (T5) large language model. Specifically, the server may splice the first text and the text to be identified, and then input the spliced text into a pre-trained risk identification model to determine the first feature of the text to be identified.

In the present disclosure, the first text may be a title of an article in a knowledge base, and in order to better prompt a risk recognition model to perform risk recognition on a text to be recognized, a server may determine an article corresponding to the first text from the knowledge base, and then input the article and the text to be recognized into a pre-trained risk recognition model, so that the risk recognition model determines a first feature of the text to be recognized under the prompt of the article. Specifically, the server may determine an article corresponding to the first text from the knowledge base, splice the article and the text to be identified, and input the spliced text into a pre-trained risk identification model to determine a first feature of the text to be identified.
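
The splicing step can be illustrated with a minimal sketch. The text only states that the first text (or its article) and the text to be identified are spliced before being fed to the model; the ordering (knowledge first), the newline separator, and the function name `build_model_input` are assumptions made for illustration.

```python
def build_model_input(first_texts, text_to_identify, separator="\n"):
    """Splice the prompting knowledge texts and the text to be identified.

    Assumption: knowledge texts come first and parts are joined by a
    separator; the source does not specify the order or the delimiter.
    """
    parts = list(first_texts) + [text_to_identify]
    return separator.join(parts)


# Hypothetical inputs, for illustration only.
spliced = build_model_input(["Silk Road", "Great Wall"], "A user comment")
```

The spliced string would then be tokenized and passed to the risk identification model, whose output representation serves as the first feature.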

S106: determining a plurality of rule features; the rule features are obtained by extracting features of rules for identifying whether the text is at risk or not.

In this specification, determining whether the text to be identified is risky relies not only on the text itself and on external knowledge but also on rules for identifying whether a text is risky. The rules are preset by risk-control personnel and correspond to the type of risk being identified: whatever type of risk is to be identified, the rules are rules for identifying that type of risk. If the goal is to identify whether the text to be identified relates to sensitive information, the rules are rules for identifying whether a text relates to sensitive information. If the goal is to identify whether the text to be identified carries a risk of rule violation, the rules are rules for identifying whether a text carries that risk. This specification does not limit which type of risk is identified or, correspondingly, which type of risk the rules identify. For ease of description, this specification takes identifying whether the text to be identified relates to sensitive information as an example.

Based on this, the server may determine a plurality of rule features, where each rule feature is obtained by performing feature extraction on a rule for identifying whether a text is risky. Specifically, the server may determine a plurality of rules and, for each rule, determine the rule feature corresponding to that rule. The rules are preset by risk-control personnel. Because this specification takes identifying whether the text to be identified involves sensitive information as an example, a rule here is a rule for identifying whether a text involves sensitive information. For example, a rule may state that a text involves sensitive information when the text contains a sensitive word, or that a text involves sensitive information when the text contains a target text. In addition, the server may determine the plurality of rules contained in a preset rule base and, for each rule, input the rule into the risk identification model to determine the rule feature corresponding to that rule.

S108: determining, from the rule features, a rule feature that matches the text to be identified, and taking it as a second feature.

The server may determine, from the rule features, a rule feature that matches the text to be identified and take it as the second feature. Specifically, the server may input the text to be identified into the risk identification model to determine a target feature and then, according to the target feature, determine from the rule features the rule feature that matches the text to be identified and take it as the second feature.

In addition, when determining the second feature, the server may input the text to be identified into the risk identification model to determine the target feature, determine, for each rule feature, the similarity between that rule feature and the target feature, screen the rule features according to the similarities, and take the screened rule features as the second feature matching the text to be identified. When screening, the server may sort the rule features in descending order of similarity to obtain a rule sequence and take the first specified number of rule features in the rule sequence as the screened rule features. The specified number is a preset value.
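The similarity-based screening described above can be sketched as follows. This is only an illustrative sketch: the specification does not name a similarity measure, so cosine similarity is an assumption, and the function names `cosine_similarity` and `select_second_features` as well as the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_second_features(
    target_feature: Sequence[float],
    rule_features: List[Sequence[float]],
    specified_number: int,
) -> List[Tuple[int, float]]:
    """Return (rule index, similarity) pairs for the most similar rules."""
    scored = [
        (idx, cosine_similarity(target_feature, feat))
        for idx, feat in enumerate(rule_features)
    ]
    # Sort from largest to smallest similarity to obtain the rule sequence.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Keep the first `specified_number` entries of the rule sequence.
    return scored[:specified_number]


# Toy vectors standing in for features produced by the model (assumed values).
rule_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [1.0, 0.1]
top = select_second_features(target, rule_features, specified_number=2)
selected_indices = [idx for idx, _ in top]
print(selected_indices)
```

Here the specified number is 2, so the two rule features closest to the target feature are kept and the remaining one is discarded.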

S110: inputting the first feature and the second feature into the risk identification model to determine a risk identification result of the text to be identified.

The server may input the first feature and the second feature into the risk identification model to determine the risk identification result of the text to be identified. The risk identification result is one of risky and risk-free. When the result is risky, the text to be identified involves sensitive information; the server does not display the text and may send prompt information to the user. When the result is risk-free, the text to be identified does not involve sensitive information, and the server may display it. The prompt information is used to notify the user that the input text is risky; it may be text or voice, which is not specifically limited in this specification.

Specifically, the server may splice the first feature and the second feature and input the spliced feature into the risk identification model to determine the risk identification result of the text to be identified.

As can be seen from the above method, during risk identification the server may determine, in response to an input operation by a user, the text input by the user and take it as the text to be identified. According to the keywords of the text to be identified, the server determines from a pre-constructed knowledge base the text that matches the text to be identified and takes it as the first text. The server then inputs the first text and the text to be identified into the pre-trained risk identification model, so that the model determines the first feature of the text to be identified under the prompt of the first text. Introducing external knowledge (namely, the first text) helps the risk identification model determine whether the text to be identified is risky, which improves the accuracy of the risk identification result. Next, the server determines a plurality of rule features and, from these, determines the rule feature that matches the text to be identified as the second feature. Finally, the server inputs the first feature and the second feature into the risk identification model to determine the risk identification result of the text to be identified. On top of the first feature, introducing rule features for identifying text risk (namely, the second feature) makes the risk identification result output by the model more accurate.

In this specification, the determined first text should not logically contradict the text to be identified. Such a contradiction can arise when a keyword is contiguous in a candidate text but split in the text to be identified. For example, suppose the keyword is AB. In the candidate text, A and B are adjacent and form the keyword AB; in the text to be identified, A and B both appear but are separated. Although the candidate text contains the keyword, it logically contradicts the text to be identified. Therefore, to improve the accuracy of the determined first text, when determining the first text according to the candidate texts in step S102, the server may use a preset word segmenter to perform word segmentation on the text to be identified to obtain its word segments and then, according to the word segments, take each candidate text that satisfies a specified condition as a first text. The word segmenter may be a model pre-trained by the server or any existing model or algorithm, which is not specifically limited in this specification.

Specifically, the server may take the word segments of the text to be identified as standard word segments. For each candidate text, the server uses the preset word segmenter to determine the word segments contained in the candidate text as word segments to be detected, and then judges whether the standard word segments contain the word segments to be detected of that candidate text. If so, the server determines that the candidate text satisfies the specified condition and takes it as a first text. If not, the server determines that the candidate text does not satisfy the specified condition and does not take it as a first text. The specified condition is that the word segments of the text to be identified contain the word segments of the candidate text. In addition, because a candidate text may contain multiple word segments, that is, multiple word segments to be detected, the server may instead judge whether the standard word segments contain at least one of the word segments to be detected of the candidate text.
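The check against the specified condition can be illustrated with a small sketch that mirrors the AB example. The specification leaves the word segmenter open, so `toy_segment` below is a hypothetical lookup table standing in for a real segmenter, and the function names are illustrative assumptions. The sketch uses the "at least one word segment" variant of the condition.

```python
from typing import Callable, Iterable, List

# Hypothetical toy segmenter: a lookup table standing in for a real word segmenter.
TOY_SEGMENTS = {
    "AB sells C": ["AB", "sells", "C"],
    "A likes B": ["A", "likes", "B"],
    "AB": ["AB"],
}


def toy_segment(text: str) -> List[str]:
    """Return the pre-defined word segments for a known toy text."""
    return TOY_SEGMENTS[text]


def meets_specified_condition(
    standard_segments: Iterable[str],
    candidate_text: str,
    segment: Callable[[str], List[str]],
) -> bool:
    """True if at least one word segment of the candidate is a standard segment."""
    standard = set(standard_segments)
    return any(seg in standard for seg in segment(candidate_text))


def filter_candidates(
    text_to_identify: str,
    candidates: List[str],
    segment: Callable[[str], List[str]],
) -> List[str]:
    """Keep only the candidate texts that satisfy the specified condition."""
    standard = segment(text_to_identify)
    return [c for c in candidates if meets_specified_condition(standard, c, segment)]


# "AB" is one word in the first text, but A and B are separate words in the second.
kept = filter_candidates("AB sells C", ["AB"], toy_segment)
dropped = filter_candidates("A likes B", ["AB"], toy_segment)
print(kept, dropped)
```

In the second call the candidate "AB" is rejected because neither standard word segment equals "AB", which is the contradiction the segmentation step is meant to filter out.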

In this specification, the candidate texts may be determined by the server performing pattern matching on the keywords with a preset pattern matching algorithm based on a dictionary tree (trie), which yields every text that contains a keyword. An inclusion relationship may therefore exist among the candidate texts, that is, one candidate text may be composed of at least two other candidate texts. For example, if one candidate text is "Great Wall" and two other candidate texts are the two shorter terms that together make it up, then "Great Wall" has an inclusion relationship with those two candidate texts. For candidate texts with an inclusion relationship, the server may take the candidate text with the longer text length as the first text, which improves the accuracy of the first text. Based on this, when determining the first text according to the candidate texts in step S102, the server may combine the candidate texts that have an inclusion relationship to obtain combinations. The server takes every candidate text that is not contained in any combination as a first text. Then, for each combination, the server determines the text length of each candidate text in the combination and takes the candidate text with the longest text length in the combination as a first text.

When combining the candidate texts that have an inclusion relationship, the server may judge, for each candidate text, whether the candidate text can be composed of at least two other candidate texts, and if so, determine that an inclusion relationship exists between that candidate text and those other candidate texts. The server then combines the candidate texts according to the inclusion relationships among them to obtain the combinations.
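A minimal sketch of the inclusion-relationship handling follows. It assumes that "composed of" means exact concatenation of other candidate texts, which the specification does not spell out; the function names `decompose` and `select_first_texts` and the sample strings are hypothetical.

```python
from typing import List, Optional, Set


def decompose(text: str, others: Set[str]) -> Optional[List[str]]:
    """Return one split of `text` into at least two strings from `others`, or None."""
    # best[i] holds one decomposition of text[:i], or None if none exists.
    best: List[Optional[List[str]]] = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            prefix = best[start]
            if prefix is not None and text[start:end] in others:
                best[end] = prefix + [text[start:end]]
                break
    parts = best[len(text)]
    return parts if parts is not None and len(parts) >= 2 else None


def select_first_texts(candidates: List[str]) -> List[str]:
    """Keep uncombined candidates plus the longest candidate of each combination."""
    combos: List[List[str]] = []
    in_combo: Set[str] = set()
    for cand in candidates:
        parts = decompose(cand, set(candidates) - {cand})
        if parts is not None:
            combo = [cand] + parts
            combos.append(combo)
            in_combo.update(combo)
    # Candidates outside every combination are kept as first texts.
    first = [c for c in candidates if c not in in_combo]
    # From each combination keep only the candidate with the longest text length.
    for combo in combos:
        longest = max(combo, key=len)
        if longest not in first:
            first.append(longest)
    return first


first_texts = select_first_texts(["sunflower", "sun", "flower", "rose"])
print(first_texts)
```

In this toy input, "sunflower" is composed of "sun" and "flower", so the three form one combination and only "sunflower" survives from it, while "rose" is kept because it belongs to no combination.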

In this specification, when determining the first text according to the candidate texts in step S102, the server may determine the first text according to the risk scores of the candidate texts, may take the candidate texts that satisfy the specified condition as the first text, or may take, from the candidate texts that have an inclusion relationship, the candidate text with the longest text length as a first text, together with the candidate texts that are not contained in any combination.

Based on this, when determining the first text according to the candidate texts in step S102, the server may first use the preset word segmenter to perform word segmentation on the text to be identified to obtain its word segments and, according to the word segments, take each candidate text that satisfies the specified condition as an initial text. The server may then combine the initial texts that have an inclusion relationship to obtain combinations, take every initial text that is not contained in any combination as an intermediate text, and, for each combination, determine the text length of each initial text in the combination and take the initial text with the longest text length as an intermediate text. Finally, the server may determine, for each intermediate text, the risk score of that intermediate text based on a pre-trained risk scoring model, screen the intermediate texts according to their risk scores, and take the screened intermediate texts as the first text. The specific process is similar to that in step S102 and is not described again here.
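The three-stage screening above can be sketched as a simple chain. Everything passed in is a hypothetical stand-in: `meets_condition` for the word-segment check, `merge_inclusions` for the inclusion handling, and `risk_score` for the pre-trained risk scoring model. Screening by a fixed threshold is also an assumption, since this passage does not state how the scores are used.

```python
from typing import Callable, List


def determine_first_texts(
    candidates: List[str],
    meets_condition: Callable[[str], bool],
    merge_inclusions: Callable[[List[str]], List[str]],
    risk_score: Callable[[str], float],
    threshold: float,
) -> List[str]:
    """Chain the three screening stages; every callable is a hypothetical stand-in."""
    # Stage 1: candidates that satisfy the specified condition become initial texts.
    initial = [c for c in candidates if meets_condition(c)]
    # Stage 2: resolve inclusion relationships to obtain intermediate texts.
    intermediate = merge_inclusions(initial)
    # Stage 3: screen by risk score (a threshold is assumed here).
    return [t for t in intermediate if risk_score(t) >= threshold]


demo = determine_first_texts(
    candidates=["sunflower", "sun", "moon"],
    meets_condition=lambda c: c != "moon",
    merge_inclusions=lambda texts: [max(texts, key=len)] if texts else [],
    risk_score=lambda t: len(t) / 10.0,
    threshold=0.5,
)
print(demo)
```

Passing the stages as callables keeps the order of the stages explicit while leaving each stage replaceable.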

In this specification, the knowledge texts in the knowledge base may be obtained from Wikipedia or may be internal knowledge uploaded in advance by risk-control personnel. The internal knowledge may be list-type text, for example a list of risk persons, where the names on the list are risky, that is, they are sensitive information, or a list of risk organizations, where the organization names on the list are likewise sensitive information. The internal knowledge is determined and uploaded to the knowledge base by risk-control personnel: in response to a knowledge upload operation by risk-control personnel, the server may determine the uploaded internal knowledge and add it to the knowledge base.

In this specification, to better determine whether the text to be identified is risky, risk-control personnel may add new rules to the rule base in real time. In response to an upload operation by risk-control personnel, the server may determine the uploaded rule as a first rule and add the first rule to the rule base.

In this specification, the risk identification model may include an encoding layer for extracting text features and a decoding layer for determining the risk identification result. Accordingly, in step S104 the server may input the first text and the text to be identified into the encoding layer of the pre-trained risk identification model, so that the encoding layer determines the first feature of the text to be identified under the prompt of the first text. Specifically, the server may splice the first text and the text to be identified and input the spliced text into the encoding layer to determine the first feature. In step S106, when inputting a rule into the risk identification model to determine the corresponding rule feature, the server may input the rule into the encoding layer of the risk identification model. In step S108, when inputting the text to be identified into the risk identification model to determine the target feature, the server may likewise input the text to be identified into the encoding layer of the risk identification model.

In step S110, the server may input the first feature and the second feature into the decoding layer of the risk identification model to determine the risk identification result of the text to be identified. Specifically, the server may splice the first feature and the second feature and input the spliced feature into the decoding layer of the risk identification model to determine the risk identification result.
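The division of work between the encoding layer and the decoding layer can be illustrated with a toy data-flow sketch. It is not the model of this specification: `encode` is a hashed bag-of-words stand-in for the encoding layer, `decode` is a single logistic unit standing in for the decoding layer, the weights are untrained placeholders, and a dot product is assumed as the similarity. Only the order of operations (S104 to S110) follows the text.

```python
import math
import zlib
from typing import List

DIM = 8  # hypothetical feature size


def encode(text: str) -> List[float]:
    """Toy stand-in for the encoding layer: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def decode(features: List[float], weights: List[float], bias: float = 0.0) -> str:
    """Toy stand-in for the decoding layer: a logistic unit over spliced features."""
    prob = 1.0 / (1.0 + math.exp(-(dot(features, weights) + bias)))
    return "risky" if prob >= 0.5 else "risk-free"


def identify(text: str, first_text: str, rules: List[str],
             weights: List[float], bias: float = 0.0) -> str:
    first_feature = encode(first_text + " " + text)       # S104: splice, then encode
    rule_features = [encode(rule) for rule in rules]      # S106: encode each rule
    target_feature = encode(text)                         # S108: encode the text alone
    second_feature = max(rule_features, key=lambda r: dot(r, target_feature))
    spliced = first_feature + second_feature              # S110: splice the features
    return decode(spliced, weights, bias)


weights = [0.0] * (2 * DIM)  # untrained placeholder weights
result = identify(
    "some user comment",
    "background article",
    ["a text containing a sensitive word involves sensitive information"],
    weights,
    bias=-1.0,
)
print(result)
```

With all-zero weights the output depends only on the bias, so the sketch demonstrates the data flow rather than any real prediction.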

Fig. 2 is a schematic diagram of the risk identification process provided in this specification. As shown in fig. 2, the server may determine, in response to an input operation by a user, the text input by the user and take it as the text to be identified. According to the keywords of the text to be identified, the server determines from the pre-constructed knowledge base the text that matches the text to be identified and takes it as the first text. The knowledge texts in the knowledge base are obtained from Wikipedia or are internal knowledge uploaded by risk-control personnel. The server may then splice the first text and the text to be identified and input the spliced text into the encoding layer of the pre-trained risk identification model to determine the first feature of the text to be identified. Next, the server may determine the plurality of rules contained in the preset rule base and, for each rule, input the rule into the encoding layer of the risk identification model to determine the rule feature corresponding to that rule. The server may then input the text to be identified into the encoding layer of the risk identification model to determine the target feature and, according to the target feature, determine from the rule features the rule feature that matches the text to be identified as the second feature. Finally, the server may input the first feature and the second feature into the decoding layer of the risk identification model to determine the risk identification result of the text to be identified.

In this specification, the server may pre-train the risk identification model according to the process shown in fig. 3. Fig. 3 is a flow chart of the training method of the risk identification model provided in this specification, which specifically includes the following steps:

S200: determining text historically input by a user and taking it as a text sample.

S202: determining, from the knowledge base and according to the keywords corresponding to the text sample, the text that matches the text sample, and taking it as a first text.

The server may determine text historically input by a user and take it as a text sample, and then, according to the keywords corresponding to the text sample, determine from the knowledge base the text that matches the text sample and take it as a first text. The text sample may be the content of a user comment, such as a comment on a movie or on a topic, which is not specifically limited in this specification.

Specifically, the server may first determine text historically input by a user and take it as a text sample. The server may then determine the keywords of the text sample and the plurality of knowledge texts contained in the knowledge base. For each knowledge text, the server judges according to the keywords whether the knowledge text matches the text sample and, if so, takes the knowledge text as a candidate text. The server then determines the first text according to the candidate texts. The specific process of step S202 is similar to that of step S102 and is not described again here.

S204: inputting the first text and the text sample into a risk identification model to be trained, so that the risk identification model to be trained determines a first feature of the text sample under the prompt of the first text.

The server may input the first text and the text sample into the risk identification model to be trained, so that the model determines the first feature of the text sample under the prompt of the first text. The risk identification model to be trained may be a T5 large language model. Specifically, the server may splice the first text and the text sample and input the spliced text into the risk identification model to be trained to determine the first feature of the text sample. The specific process is similar to that in step S104 and is not described again here.

In addition, the risk identification model to be trained may include an encoding layer and a decoding layer. In that case, the server may splice the first text and the text sample and input the spliced text into the encoding layer of the risk identification model to be trained to determine the first feature of the text sample.

S206: determining a plurality of rule features.

S208: determining, from the rule features, a rule feature that matches the text sample, and taking it as a second feature.

S210: inputting the first feature and the second feature into the risk identification model to be trained to determine an identification result of the text sample.

The server may determine a plurality of rule features, determine from the rule features the rule feature that matches the text sample as the second feature, and input the first feature and the second feature into the risk identification model to be trained to determine the identification result of the text sample. The identification result is one of risky and risk-free. Specifically, the server may determine a plurality of rules and, for each rule, input the rule into the risk identification model to determine the rule feature corresponding to that rule. The server may input the text sample into the risk identification model to determine the target feature and, according to the target feature, determine from the rule features the rule feature that matches the text sample as the second feature. The server may then splice the first feature and the second feature and input the spliced feature into the risk identification model to determine the identification result of the text sample. The specific process is similar to that in steps S106 to S110 and is not described again here.

In addition, when the risk identification model to be trained includes an encoding layer and a decoding layer, the server may input each rule into the encoding layer of the risk identification model to determine the rule feature corresponding to that rule, and may input the text sample into the encoding layer of the risk identification model to determine the target feature. When determining the identification result of the text sample, the server may splice the first feature and the second feature and input the spliced feature into the decoding layer of the risk identification model.

S212: determining a risk label corresponding to the text sample.

S214: training the risk identification model to be trained according to the risk label and the identification result.

The server may determine the risk label corresponding to the text sample and train the risk identification model to be trained according to the risk label and the identification result. The risk label is assigned to the text sample in advance by risk-control personnel and may be risky or risk-free. Specifically, the server may train the risk identification model to be trained with the objective of minimizing the difference between the risk label and the identification result.
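The training objective can be illustrated with a small numeric sketch. The specification only states that the difference between the risk label and the identification result is minimized; measuring that difference with binary cross-entropy and updating a single logistic unit by gradient descent are assumptions made here for illustration, and all names and values are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bce(prob: float, label: int) -> float:
    """Binary cross-entropy between a predicted probability and a 0/1 risk label."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def train_step(
    weights: List[float], features: List[float], label: int, lr: float = 0.5
) -> Tuple[List[float], float]:
    """One gradient-descent step on a logistic unit; returns new weights and loss."""
    prob = sigmoid(sum(w * f for w, f in zip(weights, features)))
    loss = bce(prob, label)
    # For a logistic unit, d(loss)/d(w_i) = (prob - label) * feature_i.
    new_weights = [w - lr * (prob - label) * f for w, f in zip(weights, features)]
    return new_weights, loss


weights = [0.0, 0.0]
features = [1.0, 0.5]  # stands in for the spliced first and second features
label = 1              # 1 = risky, 0 = risk-free (labelled by risk-control personnel)
losses = []
for _ in range(5):
    weights, loss = train_step(weights, features, label)
    losses.append(loss)
print(losses)
```

The recorded losses shrink from step to step, which is the sense in which the difference between the label and the identification result is minimized.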

The above is the risk identification method provided by one or more embodiments of this specification. Based on the same concept, this specification further provides a corresponding risk identification apparatus, as shown in fig. 4.

Fig. 4 is a schematic diagram of an apparatus for risk identification provided in the present specification, which specifically includes:

a first determining module 300, configured to determine, in response to an input operation by a user, the text input by the user, and take it as the text to be identified;

a first matching module 302, configured to determine, from a pre-constructed knowledge base and according to the keywords of the text to be identified, the text that matches the text to be identified, and take it as a first text;

a second determining module 304, configured to input the first text and the text to be identified into a pre-trained risk identification model, so that the risk identification model determines a first feature of the text to be identified under the prompt of the first text;

a third determining module 306, configured to determine a plurality of rule features, where each rule feature is obtained by performing feature extraction on a rule for identifying whether a text is risky;

a second matching module 308, configured to determine, from the rule features, a rule feature that matches the text to be identified, and take it as a second feature;

an identification module 310, configured to input the first feature and the second feature into the risk identification model to determine a risk identification result of the text to be identified.

Optionally, the first matching module 302 is specifically configured to determine a plurality of knowledge texts contained in the pre-constructed knowledge base; construct a dictionary tree according to the knowledge texts; determine the keywords of the text to be identified, determine, based on the dictionary tree, each text containing the keywords, and take these texts as candidate texts; and determine the first text according to the candidate texts.

Optionally, the first matching module 302 is specifically configured to perform word segmentation on the text to be identified by using a preset word segmenter to obtain word segments; and, according to the word segments, take each candidate text that satisfies a specified condition as a first text.

Optionally, the first matching module 302 is specifically configured to determine, for each candidate text, a risk score of the candidate text based on a pre-trained risk scoring model; screen the candidate texts according to their respective risk scores; and take the screened candidate texts as the first text.

Optionally, the second matching module 308 is specifically configured to input the text to be identified into the risk identification model to determine a target feature; determine, for each rule feature, the similarity between the rule feature and the target feature; screen the rule features according to the similarities; and take the screened rule features as the second feature matching the text to be identified.

Optionally, the third determining module 306 is specifically configured to determine a plurality of rules contained in a preset rule base, where the rules are used to identify whether a text is risky; and, for each rule, input the rule into the risk identification model to determine the rule feature corresponding to the rule.

Optionally, the apparatus further comprises:

an uploading module 312, configured to determine, in response to an upload operation by risk-control personnel, the rule uploaded by the risk-control personnel, take it as a first rule, and add the first rule to the rule base.

Optionally, the apparatus further comprises:

a prompt module 314, configured to display the text to be identified when the risk identification result is risk-free, and to send prompt information to the user when the risk identification result is risky.

Optionally, the apparatus further comprises:

a training module 316, configured to determine text historically input by a user and take it as a text sample; determine, from the knowledge base and according to the keywords corresponding to the text sample, the text that matches the text sample, and take it as a first text; input the first text and the text sample into a risk identification model to be trained, so that the risk identification model to be trained determines a first feature of the text sample under the prompt of the first text; determine a plurality of rule features; determine, from the rule features, a rule feature that matches the text sample, and take it as a second feature; input the first feature and the second feature into the risk identification model to be trained to determine an identification result of the text sample; determine a risk label corresponding to the text sample; and train the risk identification model to be trained according to the risk label and the identification result.

The present specification also provides a computer readable storage medium storing a computer program operable to perform the method of risk identification described above and shown in fig. 1.

This specification also provides a schematic diagram of the electronic device shown in fig. 5, which corresponds to fig. 1. The electronic device includes a processor, an internal bus, a network interface, a memory, and a nonvolatile memory, and may also include hardware required by other services. The processor reads the corresponding computer program from the nonvolatile memory into the memory and runs it to implement the risk identification method shown in fig. 1. Of course, this specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the processing flow is not limited to logic units and may also be hardware or a logic device.

In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). With the development of technology, however, many improvements to method flows can now be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer programs a digital system onto a single PLD without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually manufacturing integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained merely by describing the method flow in one of the above hardware description languages and programming it into an integrated circuit.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. Alternatively, the means for performing the various functions may be regarded both as software modules implementing the method and as structures within the hardware component.

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above apparatus is described as being divided by function into various units. Of course, when implementing this specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

This specification may be described in the general context of computer-executable instructions, such as program modules, executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.

In this specification, the embodiments are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding parts of the description of the method embodiments.

The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.
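
As a hedged illustration of the similarity-based screening of rule features recited in claims 5 and 14, the following minimal Python sketch scores each rule feature against a target feature and keeps the closest ones. The claims as given here do not fix the similarity measure or the screening criterion. Cosine similarity, the `top_k` and `min_similarity` parameters, and the names `cosine_similarity` and `screen_rule_features` are assumptions made only for illustration, and plain lists of floats stand in for the features that a risk identification model would produce.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as matching nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def screen_rule_features(
    target_feature: Sequence[float],
    rule_features: Sequence[Sequence[float]],
    top_k: int = 2,                # hypothetical screening parameter
    min_similarity: float = 0.5,   # hypothetical screening parameter
) -> List[Tuple[int, float]]:
    """Return (rule index, similarity) pairs for the rule features kept.

    A rule feature is kept when its similarity to the target feature is at
    least min_similarity; at most top_k of the most similar are returned.
    """
    scored = [
        (index, cosine_similarity(target_feature, feature))
        for index, feature in enumerate(rule_features)
    ]
    kept = [(index, sim) for index, sim in scored if sim >= min_similarity]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```

In this sketch the retained rule features would play the role of the second features. A threshold combined with a top-k cut-off is only one possible screening policy, and either part could be dropped.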

Claims

1. A method of risk identification, comprising:

2. The method according to claim 1, wherein determining, according to the keywords of the text to be identified, the text matching the text to be identified from a pre-constructed knowledge base as the first text specifically comprises:

constructing a dictionary tree according to the knowledge texts;

and determining the first text according to the candidate texts.

3. The method according to claim 2, wherein determining the first text according to the candidate texts specifically comprises:

4. The method according to claim 2, wherein determining the first text according to the candidate texts specifically comprises:

and taking the screened candidate texts as the first text.

5. The method according to claim 1, wherein determining, from the rule features, the rule features matching the text to be identified as the second features specifically comprises:

screening the rule features according to the similarity;

6. The method of claim 1, wherein determining a number of rule features comprises:

7. The method of claim 6, the method further comprising:

adding the first rule to the rule base.

8. The method of claim 1, the method further comprising:

9. The method of claim 1, wherein pre-training the risk identification model specifically comprises:

determining a plurality of rule features;

determining a risk label corresponding to the text sample;

10. An apparatus for risk identification, comprising:

11. The apparatus of claim 10, wherein the first matching module is specifically configured to: determine a number of knowledge texts contained in a pre-constructed knowledge base; construct a dictionary tree according to the knowledge texts; determine keywords of the text to be identified, determine, based on the dictionary tree, the texts containing the keywords, and take those texts as candidate texts; and determine the first text according to the candidate texts.

12. The apparatus of claim 11, wherein the first matching module is specifically configured to: perform word segmentation on the text to be identified using a preset word segmenter to obtain word segments; and, for each candidate text, take the candidate text as the first text when, according to the word segments, the candidate text meets a specified condition.

13. The apparatus of claim 11, wherein the first matching module is specifically configured to: for each candidate text, determine a risk score corresponding to the candidate text based on a pre-trained risk scoring model; screen the candidate texts according to their respective risk scores; and take the screened candidate texts as the first text.

14. The apparatus of claim 10, wherein the second matching module is specifically configured to: input the text to be identified into the risk identification model to determine a target feature; for each rule feature, determine the similarity between the rule feature and the target feature; screen the rule features according to the similarity; and take the screened rule features as the second features matching the text to be identified.

15. The apparatus of claim 10, wherein the third determining module is specifically configured to: determine a plurality of rules contained in a preset rule base, wherein each rule is used to identify whether a text is risky; and input each rule into the risk identification model to determine the rule feature corresponding to that rule.

16. The apparatus of claim 15, the apparatus further comprising:

17. The apparatus of claim 10, the apparatus further comprising:

18. The apparatus of claim 10, the apparatus further comprising:

19. A computer readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of the preceding claims 1 to 9.

20. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any of the preceding claims 1-9 when the program is executed.
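
For illustration only, the dictionary-tree matching recited in claims 2 and 11 can be sketched in Python as follows. The claims as given here do not state how the dictionary tree is populated or how keywords are extracted, so this sketch assumes that each lower-cased, whitespace-separated word of a knowledge text is indexed as a keyword. The names `TrieNode`, `KeywordTrie`, `build_trie`, and `candidate_texts` are hypothetical.

```python
from typing import Dict, List, Set


class TrieNode:
    """One node of the dictionary tree (trie)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Indices of knowledge texts containing a keyword that ends here.
        self.text_ids: Set[int] = set()


class KeywordTrie:
    """Character-level trie mapping keywords to knowledge-text indices."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, keyword: str, text_id: int) -> None:
        node = self.root
        for ch in keyword:
            node = node.children.setdefault(ch, TrieNode())
        node.text_ids.add(text_id)

    def lookup(self, keyword: str) -> Set[int]:
        node = self.root
        for ch in keyword:
            child = node.children.get(ch)
            if child is None:
                return set()
            node = child
        return set(node.text_ids)


def build_trie(knowledge_texts: List[str]) -> KeywordTrie:
    """Index every word of every knowledge text (assumed tokenization)."""
    trie = KeywordTrie()
    for text_id, text in enumerate(knowledge_texts):
        for word in text.lower().split():
            trie.insert(word, text_id)
    return trie


def candidate_texts(
    trie: KeywordTrie, knowledge_texts: List[str], keywords: List[str]
) -> List[str]:
    """Return knowledge texts containing at least one of the keywords."""
    ids: Set[int] = set()
    for keyword in keywords:
        ids |= trie.lookup(keyword.lower())
    return [knowledge_texts[i] for i in sorted(ids)]
```

The candidate texts returned here would still need the further screening recited in claims 3 and 4 (by word segments or by risk score) before a first text is chosen; that step is not shown.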