WO2023240878A1 - Resource recognition method and apparatus, and device and storage medium - Google Patents


Info

Publication number
WO2023240878A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2022/127332
Other languages
French (fr)
Chinese (zh)
Inventor
张琳
孙想
谢强
***
于天宝
贠挺
陈国庆
林赛群
Original Assignee
Beijing Baidu Netcom Science and Technology Co., Ltd. (北京百度网讯科技有限公司)
Application filed by Beijing Baidu Netcom Science and Technology Co., Ltd. (北京百度网讯科技有限公司)
Publication of WO2023240878A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • The present disclosure relates to the field of computer technology, and in particular to a resource identification method, apparatus, device and storage medium in the field of artificial intelligence technology.
  • At present, major media outlets and platforms mainly identify resources based on prior information, such as the text semantic features and image features of the resource to be identified, in order to determine whether the resource to be identified is a low-quality-title resource.
  • The present disclosure provides a resource identification method, apparatus, device and storage medium with higher accuracy.
  • A resource identification method is provided, including: obtaining posterior information and prior information of a resource to be identified, where the posterior information reflects users' feedback on the resource to be identified and the prior information reflects the semantic information of the resource to be identified; identifying the resource to be identified according to a first identification model and the posterior information to obtain a first identification result; identifying the resource to be identified according to a second identification model and the prior information to obtain a second identification result; and generating a third identification result based on the first identification result and the second identification result.
  • A resource identification apparatus is provided, including: a first acquisition module configured to acquire posterior information and prior information of the resource to be identified, where the posterior information reflects users' feedback on the resource to be identified and the prior information reflects the semantic information of the resource to be identified;
  • a first identification module configured to identify the resource to be identified based on the first identification model and the posterior information to obtain the first identification result;
  • a second identification module configured to identify the resource to be identified according to the second identification model and the prior information to obtain the second identification result; and
  • a generation module configured to generate a third identification result based on the first identification result and the second identification result.
  • An electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method described in the present disclosure.
  • A non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to perform the method described in the present disclosure.
  • A computer program product is provided, including a computer program that, when executed by a processor, implements the method described in the present disclosure.
  • The resource identification method, apparatus, device and storage medium provided by the present disclosure combine posterior information and prior information to identify whether the resource to be identified is a specific type of resource, thereby improving the accuracy of resource identification and the user experience.
  • Figure 1 is a schematic flowchart of a resource identification method according to the first embodiment of the present disclosure.
  • Figure 2 is a schematic flowchart of a resource identification method according to the third embodiment of the present disclosure.
  • Figure 3 is a schematic flowchart of a resource identification method according to the fourth embodiment of the present disclosure.
  • Figure 4 is a schematic flowchart of a resource identification method according to the sixth embodiment of the present disclosure.
  • Figure 5 is a schematic flowchart of a resource identification method according to the seventh embodiment of the present disclosure.
  • Figure 6 is a schematic structural diagram of a resource identification device according to the tenth embodiment of the present disclosure.
  • Figure 7 is a block diagram of an electronic device used to implement a resource identification method according to an embodiment of the present disclosure.
  • Figure 1 is a schematic flow chart of a resource identification method according to the first embodiment of the present disclosure. As shown in Figure 1, the method mainly includes:
  • Step S101: obtain posterior information and prior information of the resource to be identified.
  • The posterior information reflects users' feedback on the resource to be identified, and the prior information reflects the semantic information of the resource to be identified.
  • The resource to be identified may be an article, a video or a photo album published on a self-media platform.
  • Because the posterior information reflects users' feedback on the resource to be identified, it can indicate whether users like the resource.
  • The posterior information may include users' likes, dislikes, comments, shares and reports on the resource to be identified.
  • The prior information reflects the semantic information of the resource to be identified and may include information such as its title text, the text and images of its content, and its subtitles.
  • The posterior information and prior information of the resource to be identified may be stored in the backend database of the self-media platform to which the resource belongs, and can be retrieved from that database using the unique identifier of the resource.
  • For example, the unique identifier may be the resource number of the resource to be identified, which is used to look up and extract the resource's posterior information and prior information from the backend database.
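A minimal sketch of step S101, assuming a hypothetical SQLite layout in which a `resources` table holds the prior information (title, content, subtitles) and a `feedback` table holds the posterior information (one row per user action), both keyed by the resource number. The table names, columns and helper functions are illustrative assumptions, not the platform's actual schema.

```python
import sqlite3
from typing import List, Optional, Tuple

def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical schema and demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE resources (
            resource_id TEXT PRIMARY KEY, title TEXT, content TEXT, subtitles TEXT
        );
        CREATE TABLE feedback (
            resource_id TEXT, user_id TEXT, action TEXT, text TEXT
        );
        """
    )
    conn.execute(
        "INSERT INTO resources VALUES (?, ?, ?, ?)",
        ("r1", "You won't believe this", "An ordinary story", ""),
    )
    conn.executemany(
        "INSERT INTO feedback VALUES (?, ?, ?, ?)",
        [("r1", "u1", "like", None), ("r1", "u2", "comment", "misleading title")],
    )
    return conn

def fetch_resource_info(
    conn: sqlite3.Connection, resource_id: str
) -> Tuple[List[tuple], Optional[tuple]]:
    """Return (posterior rows, prior row) for one resource number (hypothetical schema)."""
    posterior = conn.execute(
        # Posterior information: users' feedback actions, in insertion order.
        "SELECT user_id, action, text FROM feedback WHERE resource_id = ? ORDER BY rowid",
        (resource_id,),
    ).fetchall()
    prior = conn.execute(
        # Prior information: the resource's own semantic content.
        "SELECT title, content, subtitles FROM resources WHERE resource_id = ?",
        (resource_id,),
    ).fetchone()
    return posterior, prior
```

An unknown resource number simply yields an empty feedback list and no prior row, which a caller can treat as "nothing to identify".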
  • Step S102: identify the resource to be identified based on the first identification model and the posterior information to obtain a first identification result.
  • The first identification model can be obtained by training a machine learning model, such as a neural network model, a decision tree model or a support vector machine model; an appropriate machine learning model can be selected for training according to actual needs, and the present disclosure does not limit the first identification model.
  • The first identification model processes the posterior information, and the resulting first identification result reflects, from the users' perspective, whether the title of the resource to be identified matches its content, that is, whether the title is false, exaggerated or distorted.
  • The first identification result may be "yes" or "no", indicating respectively that the resource to be identified is or is not a specific type of resource.
  • The first identification result may also include its corresponding confidence level. If the resource to be identified is a specific type of resource, the first identification result may further include the reason, such as the title not matching the content or the title being exaggerated.
  • Step S103: identify the resource to be identified based on the second identification model and the prior information to obtain a second identification result.
  • The second identification model can likewise be obtained by training a machine learning model, such as a neural network model, a decision tree model or a support vector machine model; an appropriate machine learning model can be selected for training according to actual needs, and the present disclosure does not limit the second identification model.
  • The second identification model processes the prior information, and the resulting second identification result reflects whether the title of the resource to be identified contains typos, missing words or unclear sentences, and whether the semantics of the title overlap with the semantics of the content.
  • The second identification result may be "yes" or "no", indicating respectively that the resource to be identified is or is not a specific type of resource.
  • The second identification result may also include its corresponding confidence level. If the resource to be identified is a specific type of resource, the second identification result may further include the reason, such as a typo in the title or a title that does not match the content.
  • Step S104: generate a third identification result based on the first identification result and the second identification result.
  • The first identification result and the second identification result can be combined to generate the third identification result, which reflects whether the resource to be identified is a specific type of resource, for example, whether it is a clickbait resource.
  • The first identification result reflects from the users' perspective whether the resource to be identified is a specific type of resource, and the second identification result reflects this from the semantic perspective. Therefore, as long as either of the two results indicates that the resource to be identified is a specific type of resource, the resource can be considered to be of that type; only when both results indicate that it is not a specific type of resource is the resource considered not to be of that type.
  • In this embodiment, the first identification result is obtained using the first identification model and the posterior information, and the second identification result is obtained using the second identification model and the prior information.
  • The resource identification method thus combines not only prior information related to the semantics of the resource to be identified but also posterior information related to users, so that identification results consistent with users' perception can be obtained. This compensates for the insufficient ability to identify exaggerated or false resource titles when only prior information is used, further improving the accuracy of resource identification.
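The fusion rule of step S104 is a logical OR over the two results. The sketch below assumes a hypothetical `IdentificationResult` record carrying the "yes"/"no" flag, an optional confidence and optional reasons; how confidences should be merged is not specified in the text, so the fused result leaves it unset.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class IdentificationResult:
    """Hypothetical container for one identification result."""
    is_specific_type: bool                      # True = "yes", False = "no"
    confidence: Optional[float] = None          # optional confidence level
    reasons: List[str] = field(default_factory=list)

def fuse_results(first: IdentificationResult,
                 second: IdentificationResult) -> IdentificationResult:
    """Third result: "yes" if either result is "yes"; "no" only if both are "no"."""
    is_specific = first.is_specific_type or second.is_specific_type
    # Keep the reasons of whichever results flagged the resource.
    reasons = [r for res in (first, second) if res.is_specific_type for r in res.reasons]
    # The text does not say how confidences combine, so none is set here.
    return IdentificationResult(is_specific, reasons=reasons)
```

Using OR means a resource flagged only by user feedback (posterior view) is still caught even when its title reads cleanly (prior view), which is the gap the method is meant to close.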
  • The first identification model and the second identification model are obtained as follows:
  • obtaining a first training set, which includes first sample resources whose resource identification results are already known, together with their posterior information;
  • obtaining a second training set, which includes second sample resources whose resource identification results are already known, together with their prior information;
  • inputting the first training set into a first machine learning model for training to obtain the first identification model; and
  • inputting the second training set into a second machine learning model for training to obtain the second identification model.
  • Specifically, machine learning models are trained to generate the first identification model and the second identification model.
  • First, the first training set and the second training set need to be obtained: the first training set is used to train the first identification model and includes first sample resources whose resource identification results are already known, together with their posterior information; the second training set is used to train the second identification model and includes second sample resources whose resource identification results are already known, together with their prior information.
  • For example, some resources can be manually selected and labeled as to whether they are a specific type of resource, and their posterior information or prior information can then be obtained from the existing database to form the first training set and the second training set. It should be emphasized that the first sample resources and the second sample resources may be the same or different, which the present disclosure does not limit.
  • Then, the first machine learning model and the second machine learning model can be trained on the first training set and the second training set respectively until both models converge, thereby obtaining the first identification model and the second identification model.
  • The machine learning model may be a neural network model, a decision tree model, a natural language processing model, etc., selected according to actual needs.
  • A first test set and a second test set can also be obtained to test the trained first identification model and second identification model.
  • The first test set may include first resources to be tested and their posterior information, and the second test set may include second resources to be tested and their prior information; the first resources to be tested and the second resources to be tested may be the same or different.
  • The first identification model and the second identification model are tested on the test sets, and whether their accuracy meets the requirements is determined from the identification results.
  • In this embodiment, machine learning models are trained on the first training set and the second training set respectively to obtain the first identification model and the second identification model, which can then be used to identify the resource to be identified, determine whether it is a specific type of resource, and improve the accuracy of resource identification.
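The training and testing procedure above can be sketched with a tiny logistic-regression classifier trained by batch gradient descent, used here only as a stand-in for whichever machine learning model is actually chosen. Feature vectors are assumed to be numeric encodings of the posterior information (first model) or prior information (second model); the fixed epoch count replaces an explicit convergence test, and the `required_accuracy` value is a hypothetical requirement.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, 1 = specific type, 0 = not)
Model = Tuple[List[float], float]     # (weights, bias)

def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(train: List[Sample], lr: float = 0.5, epochs: int = 500) -> Model:
    """Fit weights and bias by minimizing mean log-loss with batch gradient descent."""
    n_features = len(train[0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in train:
            err = _sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        m = len(train)
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def predict(model: Model, x: Sequence[float]) -> int:
    w, b = model
    return 1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

def train_and_test(train: List[Sample], test: List[Sample],
                   required_accuracy: float = 0.9) -> Model:
    """Train on a training set, then check test-set accuracy against a requirement."""
    model = train_logreg(train)
    acc = sum(predict(model, x) == y for x, y in test) / len(test)
    if acc < required_accuracy:
        raise ValueError(f"test accuracy {acc:.2f} below requirement {required_accuracy}")
    return model
```

The same function is called twice, once with the first training/test sets and once with the second, mirroring the separate training of the two identification models.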
  • Figure 2 is a schematic flowchart of a resource identification method according to the third embodiment of the present disclosure. As shown in Figure 2, step S102 specifically includes:
  • Step S201: obtain user information corresponding to the posterior information.
  • Since the posterior information includes users' likes, dislikes, comments, shares and reports on the resource to be identified, the user behind each like, dislike, comment, share and report can be determined, and the user information of each user can then be obtained based on that user's unique identifier.
  • That is, all user information related to the posterior information is obtained based on the posterior information of the resource to be identified.
  • The user information may include the user's unique identifier and the user's total numbers of likes, dislikes, comments, shares and reports, etc.
  • Step S202: identify abnormal user information in the user information according to a user identification model.
  • Abnormal users are users who frequently perform negative operations on resources, such as disliking them, posting negative comments or reporting them, that is, "troll users".
  • The user identification model can be obtained by training a machine learning model selected according to actual needs; the training method is similar to that of the first identification model and the second identification model and is not repeated here.
  • Step S203: delete the posterior information corresponding to the abnormal user information from the posterior information to obtain valid posterior information.
  • After the abnormal user information is obtained, the posterior information corresponding to it needs to be deleted from the posterior information, yielding valid posterior information.
  • This is because the posterior information corresponding to abnormal users cannot accurately and truly reflect whether users like the resource to be identified.
  • Step S204: identify the resource to be identified based on the first identification model and the valid posterior information to obtain the first identification result.
  • In this embodiment, abnormal user information in the user information is identified according to the user identification model, the posterior information corresponding to the abnormal user information is deleted to obtain valid posterior information, and the first identification result is finally obtained from the first identification model and the valid posterior information.
  • Because the posterior information corresponding to abnormal users is deleted and the resource to be identified is identified only based on the valid posterior information, the accuracy of resource identification can be further improved.
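Steps S201 to S203 amount to collecting the users behind the feedback, flagging abnormal ones, and dropping their feedback. In the sketch below a simple hypothetical rule (a user with at least `min_total` actions, more than `max_negative_share` of which are dislikes or reports) stands in for the trained user identification model; the `Feedback` record and the per-user totals layout are also illustrative assumptions.

```python
from typing import Dict, List, NamedTuple

class Feedback(NamedTuple):
    """One piece of posterior information (hypothetical layout)."""
    user_id: str
    action: str          # e.g. "like", "dislike", "comment", "share", "report"
    text: str = ""

NEGATIVE_ACTIONS = ("dislike", "report")

def is_abnormal_user(totals: Dict[str, int], min_total: int = 20,
                     max_negative_share: float = 0.8) -> bool:
    """Hypothetical stand-in for the user identification model (step S202)."""
    total = sum(totals.values())
    if total < min_total:            # too little history to judge
        return False
    negative = sum(totals.get(a, 0) for a in NEGATIVE_ACTIONS)
    return negative / total > max_negative_share

def valid_posterior(feedback: List[Feedback],
                    user_totals: Dict[str, Dict[str, int]]) -> List[Feedback]:
    users = {f.user_id for f in feedback}                      # S201: users behind the feedback
    abnormal = {u for u in users
                if is_abnormal_user(user_totals.get(u, {}))}   # S202: flag abnormal users
    return [f for f in feedback if f.user_id not in abnormal]  # S203: keep valid feedback
```

The `min_total` guard keeps new users with only a handful of actions from being flagged on a tiny sample; a trained model would learn such cut-offs instead.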
  • Figure 3 is a schematic flowchart of a resource identification method according to the fourth embodiment of the present disclosure.
  • In this embodiment, the posterior information includes users' click information and comment information on the resource to be identified. The click information includes the numbers of clicks corresponding to users' different operations on the resource to be identified and the display information of the resource, where the different operations include likes, dislikes, comments, reports, favorites, etc., and the display information includes the dwell time, dwell frequency, and playback or browsing completion of the resource to be identified. Step S204 specifically includes:
  • Step S301: add first labels to the valid comment information according to a first classification model to obtain a first label result.
  • The posterior information includes users' click information, comment information and report information on the resource to be identified; therefore, the valid posterior information includes users' valid click information, valid comment information and valid report information on the resource to be identified.
  • This embodiment first adds first labels to the valid comment information according to the first classification model to obtain the first label result.
  • The first classification model is obtained by training a machine learning model selected according to actual needs; its training method is similar to that of the first identification model and the second identification model and is not repeated here.
  • The comment information is thereby divided into valid negative comment information and valid general comment information.
  • Step S302: count the pieces of valid comment information corresponding to all negative-comment labels in the first label result to obtain the number of valid negative comments.
  • Step S303: input the number of valid negative comments and the valid click information into the first identification model to identify the resource to be identified and obtain the first identification result.
  • The valid click information includes the numbers of likes, dislikes, comments, reports, favorites, blocks, etc. of the resource to be identified.
  • Inputting the valid click information and the number of valid negative comments into the first identification model is therefore equivalent to jointly identifying the resource to be identified based on users' positive evaluations of it (including the number of likes, the number of favorites, the number of valid general comments, etc.) and negative evaluations of it (including the number of dislikes, the number of valid negative comments, etc.), thereby obtaining the first identification result.
  • If the ratio of the number of negative evaluations to the number of positive evaluations of the resource to be identified is greater than a preset threshold, the first identification result is "yes", that is, the resource to be identified is a specific type of resource; if the ratio is not greater than the preset threshold, the first identification result is "no", that is, the resource to be identified is not a specific type of resource.
  • The preset threshold can be set according to actual needs.
  • In this embodiment, the number of valid negative comments is counted, and the number of valid negative comments and the valid click information are input into the first identification model for identification, which can improve the accuracy of resource identification.
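The decision rule stated for this embodiment compares the ratio of negative to positive evaluations with a preset threshold. A minimal sketch, assuming the first classification model has already produced one label per valid comment and that valid click counts arrive as a dictionary; the label strings (modeled on the label examples of the seventh embodiment), the dictionary keys and the default threshold are hypothetical. In the described method this rule is realized by the trained first identification model rather than hand-coded.

```python
from typing import Dict, List, Tuple

# Hypothetical negative-comment label strings.
NEGATIVE_COMMENT_LABELS = {
    "article fabrication", "false title", "content fabrication", "title-content mismatch",
}

def first_identification(comment_labels: List[str], clicks: Dict[str, int],
                         threshold: float = 1.0) -> Tuple[bool, float]:
    """Return (is "yes", negative/positive ratio) from labeled comments and click counts."""
    # S302: number of valid negative comments; the rest count as general comments.
    negative_comments = sum(1 for lab in comment_labels if lab in NEGATIVE_COMMENT_LABELS)
    general_comments = len(comment_labels) - negative_comments
    positive = clicks.get("like", 0) + clicks.get("favorite", 0) + general_comments
    negative = clicks.get("dislike", 0) + negative_comments
    if positive == 0:
        ratio = float("inf") if negative > 0 else 0.0
    else:
        ratio = negative / positive
    # S303 (simplified): "yes" when the ratio exceeds the preset threshold.
    return ratio > threshold, ratio
```

A resource with no feedback at all gets a ratio of 0 and therefore "no", rather than being flagged by a division-by-zero edge case.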
  • In another embodiment, the posterior information includes users' click information and report information on the resource to be identified. The click information includes the numbers of clicks corresponding to users' different operations on the resource to be identified and the display information of the resource, where the different operations include likes, dislikes, comments, reports, favorites, etc., and the display information includes the dwell time, dwell frequency, and playback or browsing completion of the resource to be identified.
  • The pieces of valid report information are counted to obtain the number of valid reports; the number of valid reports and the valid click information are input into the first identification model to identify the resource to be identified and obtain the first identification result.
  • The valid click information includes the numbers of likes, dislikes, comments, reports, favorites, blocks, etc. of the resource to be identified. Inputting the valid click information and the number of valid reports into the first identification model is therefore equivalent to identifying the resource to be identified based on users' positive evaluations of it (including the numbers of likes and favorites) and negative evaluations of it (including the number of dislikes, the number of valid reports, etc.), thereby obtaining the first identification result.
  • If the ratio of the number of negative evaluations to the number of positive evaluations of the resource to be identified is greater than a preset threshold, the first identification result is "yes", that is, the resource to be identified is a specific type of resource; if the ratio is not greater than the preset threshold, the first identification result is "no", that is, the resource to be identified is not a specific type of resource.
  • The preset threshold can be set according to actual needs.
  • In this embodiment, the number of valid reports is counted, and the number of valid reports and the valid click information are input into the first identification model for identification, which can improve the accuracy of resource identification.
  • Figure 4 is a schematic flowchart of a resource identification method according to the sixth embodiment of the present disclosure.
  • In this embodiment, the posterior information includes users' click information, comment information and report information on the resource to be identified. The click information includes the numbers of clicks corresponding to users' different operations on the resource to be identified and the display information of the resource, where the different operations include likes, dislikes, comments, reports, favorites, etc., and the display information includes the dwell time, dwell frequency, and playback or browsing completion of the resource to be identified.
  • Step S204 specifically includes:
  • Step S401: add first labels to the valid comment information according to the first classification model to obtain a first label result.
  • Step S401 is similar to step S301 and is not described again here.
  • Step S402: count the pieces of valid comment information corresponding to all negative-comment labels in the first label result and the pieces of valid report information, to obtain the number of valid negative comments and the number of valid reports.
  • Step S403: input the number of valid negative comments, the number of valid reports and the valid click information into the first identification model to identify the resource to be identified and obtain the first identification result.
  • The valid click information includes the numbers of likes, dislikes, comments, reports, favorites, blocks, etc. of the resource to be identified. Inputting the valid click information, the number of valid negative comments and the number of valid reports into the first identification model is therefore equivalent to jointly identifying the resource to be identified based on users' positive evaluations of it (including the number of likes, the number of favorites, the number of valid general comments, etc.) and negative evaluations of it (including the number of dislikes, the number of valid reports, the number of valid negative comments, etc.), thereby obtaining the first identification result.
  • If the ratio of the number of negative evaluations to the number of positive evaluations of the resource to be identified is greater than a preset threshold, the first identification result is "yes", that is, the resource to be identified is a specific type of resource; if the ratio is not greater than the preset threshold, the first identification result is "no", that is, the resource to be identified is not a specific type of resource.
  • The preset threshold can be set according to actual needs.
  • In this embodiment, the number of valid negative comments and the number of valid reports are counted and input, together with the valid click information, into the first identification model for identification, which can further improve the accuracy of resource identification.
  • Figure 5 is a schematic flowchart of a resource identification method according to the seventh embodiment of the present disclosure. As shown in Figure 5, step S204 specifically includes:
  • Step S501: add first labels to the valid comment information according to the first classification model to obtain a first label result.
  • Step S502: count the pieces of valid comment information corresponding to each negative-comment label in the first label result to obtain a first statistical result.
  • When the first classification model is trained, its first labels can be set to article fabrication, false title, content fabrication, title-content mismatch, average, excellent, etc., where negative labels such as article fabrication, false title, content fabrication and title-content mismatch can be classified as negative-comment labels. After first labels are added to the valid comment information according to the first classification model, the resulting first label result divides the valid comment information into categories such as article fabrication and false title.
  • Step S503 Add a second label to the valid reporting information according to the second classification model to obtain a second label result.
  • Step S504 Count the number of valid reporting information corresponding to different second tags in the second tag result to obtain a second statistical result.
  • when training the second classification model, the second labels can be set as fabricated article, false title, low-quality title, title mismatch, etc.; after the second labels are added to the valid report information according to the second classification model, the resulting second label result divides the valid report information into multiple categories such as fabricated article, false title, low-quality title, and title mismatch, and the number of valid report information corresponding to each second label in the second label result is then counted.
  • for example, the number of valid report information corresponding to the fabricated article label, the number corresponding to the false title label, the number corresponding to the low-quality title label, and so on are counted, thereby obtaining the second statistical result.
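The per-label counting in steps S502 and S504 can be sketched as follows. This is a minimal Python sketch, assuming the two classification models have already produced one label per valid comment or valid report; the label strings and the function names `first_statistical_result` and `second_statistical_result` are illustrative assumptions, not part of the disclosure.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical negative review labels, following the examples given in the text.
NEGATIVE_REVIEW_LABELS = {"article fabrication", "false title", "content fabrication", "title mismatch"}


def first_statistical_result(first_label_result: List[str]) -> Dict[str, int]:
    """Count valid comments per negative review label.

    first_label_result holds one label per valid comment, as assigned by the
    (assumed, already trained) first classification model.
    """
    counts = Counter(label for label in first_label_result if label in NEGATIVE_REVIEW_LABELS)
    # Report every negative review label, including those with zero comments.
    return {label: counts.get(label, 0) for label in sorted(NEGATIVE_REVIEW_LABELS)}


def second_statistical_result(second_label_result: List[str]) -> Dict[str, int]:
    """Count valid reports per second label; every second label is kept."""
    return dict(Counter(second_label_result))
```

Summing the values of the first statistical result gives the single number of valid negative review information used in the variant that inputs only a total count into the first identification model.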
  • the second classification model is generated based on machine learning model training.
  • the machine learning model can be selected according to actual needs.
  • the training method of the second classification model is similar to that of the first recognition model and the second recognition model, and is not repeated here.
  • Step S505 Input the first statistical result, the second statistical result and the valid click information into the first recognition model, identify the resource to be identified, and obtain the first recognition result.
  • the valid click information includes the numbers of likes, dislikes, comments, reports, collections, blocks, etc. of the resource to be identified.
  • inputting the first statistical result, the second statistical result and the valid click information into the first recognition model for identification is equivalent to identifying the resource to be identified jointly from the users' positive evaluations of it (including the number of likes, the number of collections, the number of valid general comment information, etc.) and negative evaluations (including the number of dislikes, the number of valid report information, the number of valid negative review information, etc.), thereby obtaining the first identification result.
  • if the ratio of the number of negative evaluations to the number of positive evaluations of the resource to be identified is greater than a preset threshold, the first identification result is "yes", that is, the resource to be identified is a specific type of resource.
  • the first identification result may also include the reason why the resource to be identified is a specific type of resource, such as a title mismatch or an exaggerated title; if the ratio of the number of negative evaluations to the number of positive evaluations of the resource to be identified is not greater than the preset threshold, the first identification result is "no", that is, the resource to be identified is not a specific type of resource. The preset threshold can be set according to actual needs.
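The example decision rule above, comparing the ratio of negative to positive evaluations against a preset threshold, can be sketched as follows. The grouping of counts into positive and negative evaluations follows the text; the class and function names, and the handling of a resource with zero positive evaluations, are assumptions made for illustration. In the disclosure the first recognition model is a trained model, so this hand-written rule only illustrates the kind of output it produces.

```python
from dataclasses import dataclass


@dataclass
class Evaluations:
    likes: int
    collections: int
    general_comments: int   # valid comments not carrying a negative review label
    dislikes: int
    reports: int            # number of valid report information
    negative_comments: int  # number of valid negative review information

    @property
    def positive(self) -> int:
        return self.likes + self.collections + self.general_comments

    @property
    def negative(self) -> int:
        return self.dislikes + self.reports + self.negative_comments


def first_identification_result(ev: Evaluations, threshold: float) -> bool:
    """Return True ("yes") if negative / positive exceeds the preset threshold."""
    if ev.positive == 0:
        # Assumption: with no positive evaluations, any negative evaluation counts as exceeding.
        return ev.negative > 0
    return ev.negative / ev.positive > threshold
```

For example, 20 positive and 18 negative evaluations give a ratio of 0.9, so the result is "yes" for a threshold of 0.5 and "no" for a threshold of 1.0.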
  • the number of valid comment information corresponding to different negative review labels and the number of valid report information corresponding to different second labels are counted to obtain the first statistical result and the second statistical result, and the first statistical result, the second statistical result and the valid click information are input into the first identification model for identification to obtain the first identification result, which can further improve the accuracy of resource identification.
  • the second recognition model may be a model generated based on natural language processing model training.
  • Step S103 specifically includes:
  • the prior information is segmented and converted into a vector matrix; based on the vector matrix and the second recognition model, the resources to be identified are identified to obtain the second recognition result.
  • the prior information can be used as the semantic feature of the resource to be identified.
  • the prior information is segmented and converted into a vector matrix, that is, the prior information is converted into computer-recognizable data, and then the resource to be identified is identified based on the vector matrix and the second identification model to obtain the second identification result.
  • the natural language processing model used to train the second recognition model can be a long short-term memory model (LSTM, Long Short-Term Memory), a Transformer model, a BERT model (Bidirectional Encoder Representations from Transformers), etc.; the present disclosure does not limit the choice of natural language processing model.
  • the prior information can be manually segmented, or a word segmentation tool can be used to segment the prior information.
  • a model for generating word vectors, such as the Word2Vec model or the GloVe model, can be used to convert the segmented prior information into a vector matrix.
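A minimal sketch of converting prior information into a vector matrix is shown below. It assumes a whitespace tokenizer as a stand-in for a real word segmentation tool (Chinese titles would need a proper segmenter) and a small hypothetical embedding table in place of trained Word2Vec or GloVe vectors. Truncating or zero-padding to a fixed `max_len`, and mapping unknown words to zero vectors, are assumptions chosen so that the matrix has a fixed shape for the second recognition model.

```python
from typing import Dict, List


def segment(text: str) -> List[str]:
    """Stand-in for a word segmentation tool: lowercase and split on whitespace."""
    return text.lower().split()


def to_vector_matrix(
    tokens: List[str],
    embeddings: Dict[str, List[float]],
    dim: int,
    max_len: int,
) -> List[List[float]]:
    """Stack word vectors into a max_len x dim matrix, truncating or zero-padding."""
    zero = [0.0] * dim
    # Out-of-vocabulary words map to a zero vector (an assumption of this sketch).
    rows = [list(embeddings.get(tok, zero)) for tok in tokens[:max_len]]
    # Pad with zero rows so every title yields the same shape.
    rows.extend([list(zero) for _ in range(max_len - len(rows))])
    return rows
```

With `embeddings = {"shocking": [1.0, 0.0], "news": [0.0, 1.0]}`, the title "Shocking news today" becomes a 4 x 2 matrix whose last two rows are zeros when `max_len=4`.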
  • the eighth embodiment of the present disclosure uses natural language processing model training to generate the second recognition model, and identifies the prior information of the resource to be identified according to the second recognition model to obtain the second recognition result, so that the first recognition result and the second recognition result can subsequently be combined to determine whether the resource to be identified is a specific type of resource, thereby improving the accuracy of resource identification.
  • step S104 specifically includes:
  • if at least one of the first identification result and the second identification result indicates that the resource to be identified belongs to a specific type of resource, a third recognition result is generated to indicate that the resource to be identified belongs to a specific type of resource; otherwise, a third identification result is generated to indicate that the resource to be identified does not belong to a specific type of resource.
  • the first identification result reflects whether the resource to be identified is a specific type of resource from the user's perspective
  • the second identification result reflects whether the resource to be identified is a specific type of resource from a semantic perspective.
  • if at least one of the first identification result and the second identification result shows that the resource to be identified is a resource of a specific type, the resource to be identified can be considered to be a resource of a specific type; if both the first identification result and the second identification result show that the resource to be identified is not a resource of a specific type, the resource to be identified is considered not to be a specific type of resource.
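The combination rule for the third identification result amounts to a logical OR of the first two results. A minimal sketch follows; the `RecognitionResult` type and its optional `reason` field (reflecting that the first result may carry a reason such as a title mismatch) are illustrative assumptions rather than structures defined in the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecognitionResult:
    is_specific_type: bool
    reason: Optional[str] = None  # e.g. "title mismatch"; optional per the text


def third_identification_result(first: RecognitionResult, second: RecognitionResult) -> RecognitionResult:
    """The resource is of the specific type if at least one result says so."""
    is_specific = first.is_specific_type or second.is_specific_type
    # Keep the reasons only from results that flagged the resource.
    reasons = [r.reason for r in (first, second) if r.is_specific_type and r.reason]
    return RecognitionResult(is_specific, "; ".join(reasons) or None)
```

Using OR favors recall: a resource flagged only by user feedback (for example, a title that reads normally but misleads viewers) is still caught.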
  • the ninth embodiment of the present disclosure combines the first recognition result and the second recognition result to determine whether the resource to be identified is a specific type of resource, which can improve the accuracy of title recognition of the resource to be identified.
  • taking a video resource as an example, the posterior information is the users' feedback information on the video resource, such as the number of likes, dislikes, comments, reports, collections, blocks, etc.
  • the posterior information includes the user’s click information, comment information and report information of the resource to be identified.
  • the prior information is the semantic information of the video resource, such as the title text and subtitles; the posterior information corresponding to abnormal user information is then deleted from the posterior information to obtain valid posterior information, and the valid posterior information is input into the first recognition model for identification to obtain the first recognition result.
  • the first recognition result can reflect whether the video resource is a specific type of resource. Of course, before the valid posterior information is input into the first identification model, labels can also be added to the valid negative review information and the valid report information. The prior information is then input into the second identification model for identification to obtain the second recognition result.
  • the second recognition result can also reflect whether the video resource is a specific type of resource; if either the first recognition result or the second recognition result shows that the video resource is a specific type of resource, a third recognition result can be obtained indicating that the video resource is a specific type of resource, such as a clickbait resource.
  • Figure 6 is a schematic structural diagram of a resource identification device according to the tenth embodiment of the present disclosure. As shown in Figure 6, the device mainly includes:
  • the first acquisition module 60 is used to obtain posterior information and prior information of the resource to be identified, the posterior information is used to reflect the user's feedback information of the resource to be identified, and the prior information is used to reflect the semantic information of the resource to be identified;
  • the first identification module 61 is used to identify the resource to be identified based on the first identification model and the posterior information to obtain the first identification result;
  • the second identification module 62 is used to identify the resource to be identified based on the second identification model and the prior information to obtain the second identification result;
  • the generation module 63 is configured to generate a third recognition result based on the first recognition result and the second recognition result.
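How the four modules of the device fit together can be sketched as a small class. The module numbers follow Figure 6; the callables passed in are hypothetical stand-ins for the acquisition logic and the two trained recognition models, and all results are simplified to booleans for this sketch.

```python
from typing import Any, Callable, Tuple


class ResourceIdentificationDevice:
    """Sketch mirroring modules 60 to 63; every callable is a hypothetical stand-in."""

    def __init__(
        self,
        acquire: Callable[[str], Tuple[Any, Any]],  # first acquisition module 60
        first_model: Callable[[Any], bool],         # wraps the first identification model (module 61)
        second_model: Callable[[Any], bool],        # wraps the second identification model (module 62)
    ) -> None:
        self.acquire = acquire
        self.first_model = first_model
        self.second_model = second_model

    def identify(self, resource_id: str) -> bool:
        # Module 60: fetch posterior (user feedback) and prior (semantic) information.
        posterior, prior = self.acquire(resource_id)
        # Modules 61 and 62: run the two recognizers independently.
        first_result = self.first_model(posterior)
        second_result = self.second_model(prior)
        # Module 63: specific type if either result says so.
        return first_result or second_result
```

Because the two recognizers are injected, each can be retrained or swapped (for example, an LSTM for a BERT model) without touching the generation logic.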
  • the device further includes:
  • the second acquisition module is used to acquire the first training set and the second training set.
  • the first training set includes the resources for which the resource identification results have been obtained and their posterior information.
  • the second training set includes the resources for which the resource identification results have been obtained and their prior information.
  • the first identification module 61 mainly includes:
  • the acquisition sub-module is used to obtain the user information corresponding to the posterior information; the user identification sub-module is used to identify abnormal user information in the user information according to the user identification model; the deletion sub-module is used to delete the posterior information corresponding to the abnormal user information from the posterior information to obtain valid posterior information; the first identification sub-module is used to identify the resource to be identified based on the first identification model and the valid posterior information to obtain the first identification result.
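The acquisition, user identification, and deletion sub-modules together filter out feedback from abnormal users. A minimal sketch follows, assuming each piece of posterior information carries the ID of the user who produced it and that the user identification model is exposed as a hypothetical predicate `is_abnormal_user`; the `Feedback` type is likewise illustrative. The verdict for each user is cached so the model is queried only once per user.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Feedback:
    user_id: str
    kind: str          # "click", "comment" or "report"
    payload: str = ""  # comment or report text, if any


def valid_posterior(feedback: List[Feedback], is_abnormal_user: Callable[[str], bool]) -> List[Feedback]:
    """Delete posterior information produced by users flagged as abnormal."""
    verdicts: Dict[str, bool] = {}
    kept: List[Feedback] = []
    for item in feedback:
        if item.user_id not in verdicts:
            # Query the (hypothetical) user identification model once per user.
            verdicts[item.user_id] = is_abnormal_user(item.user_id)
        if not verdicts[item.user_id]:
            kept.append(item)
    return kept
```

The original order of the remaining feedback is preserved, so later labeling and counting steps see the same sequence they would have seen without filtering.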
  • the posterior information includes the user's click information, comment information and report information of the resource to be identified, and the click information includes the number of clicks corresponding to the user's different operations on the resource to be identified and the display information of the resource to be identified.
  • the first identification sub-module is used to: add a first label to the valid comment information according to the first classification model to obtain a first label result; count the number of valid comment information corresponding to all negative review labels in the first label result to obtain the number of valid negative review information; and input the number of valid negative review information and the valid click information into the first identification model to identify the resource to be identified and obtain the first identification result.
  • the first identification sub-module is also used to: count the valid report information to obtain the number of valid report information; and input the number of valid report information and the valid click information into the first identification model to identify the resource to be identified and obtain the first identification result.
  • the first identification sub-module is also used to: add a first label to the valid comment information according to the first classification model to obtain a first label result; count the number of valid comment information corresponding to all negative review labels in the first label result and the number of valid report information, to obtain the number of valid negative review information and the number of valid report information; and input the number of valid negative review information, the number of valid report information and the valid click information into the first identification model to identify the resource to be identified and obtain the first identification result.
  • the first identification sub-module is also used to: add a first label to the valid comment information according to the first classification model to obtain a first label result; count the number of valid comment information corresponding to different negative review labels in the first label result to obtain the first statistical result; add a second label to the valid report information according to the second classification model to obtain a second label result; count the number of valid report information corresponding to different second labels in the second label result to obtain the second statistical result; and input the first statistical result, the second statistical result and the valid click information into the first recognition model to identify the resource to be identified and obtain the first recognition result.
  • the second identification module 62 mainly includes:
  • the conversion submodule is used to perform word segmentation processing on the prior information and convert it into a vector matrix; the second identification submodule is used to identify the resources to be identified based on the vector matrix and the second identification model, and obtain the second identification result.
  • the generation module 63 mainly includes:
  • the first generation submodule is used to generate a third identification result used to represent that the resource to be identified belongs to a specific type of resource if any of the first identification result and the second identification result indicates that the resource to be identified belongs to a specific type of resource;
  • the second generation submodule is used to generate a third identification result indicating that the resource to be identified does not belong to a specific type of resource if neither the first identification result nor the second identification result indicates that it does.
  • the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
  • FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure.
  • Electronic devices are intended to refer to various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • the device 700 includes a computing unit 701 that can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 can also store various programs and data required for the operation of the device 700.
  • Computing unit 701, ROM 702 and RAM 703 are connected to each other via bus 704.
  • An input/output (I/O) interface 705 is also connected to bus 704.
  • multiple components in the device 700 are connected to the I/O interface 705, including: an input unit 706, such as a keyboard or a mouse; an output unit 707, such as various types of displays and speakers; a storage unit 708, such as a magnetic disk or an optical disk; and a communication unit 709, such as a network card, a modem, or a wireless communication transceiver.
  • the communication unit 709 allows the device 700 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks.
  • Computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc.
  • the computing unit 701 performs various methods and processes described above, such as a resource identification method.
  • a resource identification method may be implemented as a computer software program that is tangibly embodied in a machine-readable medium, such as storage unit 708.
  • part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709.
  • when the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the resource identification method described above may be performed.
  • computing unit 701 may be configured to perform the resource identification method in any other suitable manner (e.g., by means of firmware).
  • Various implementations of the systems and techniques described above may be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general purpose programmable processor and may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the systems and techniques described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
  • the systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user's computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area network (LAN), wide area network (WAN), and the Internet.
  • Computer systems may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact over a communications network.
  • the relationship of client and server is created by computer programs running on corresponding computers and having a client-server relationship with each other.
  • the server can be a cloud server, a distributed system server, or a server combined with a blockchain.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to the technical field of computers, and in particular to the technical field of artificial intelligence. Provided are a resource recognition method and apparatus, and a device and a storage medium. The specific implementation scheme involves: acquiring posterior information and prior information of a resource to be recognized, wherein the posterior information is used for reflecting feedback information of a user for said resource, and the prior information is used for reflecting semantic information of said resource; recognizing said resource according to a first recognition model and the posterior information, so as to obtain a first recognition result; recognizing said resource according to a second recognition model and the prior information, so as to obtain a second recognition result; and generating a third recognition result according to the first recognition result and the second recognition result. By means of the resource recognition method and apparatus, and the device and the storage medium provided in the present disclosure, the accuracy of resource recognition can be improved.

Description

A resource identification method, apparatus, device, and storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202210694398.8, filed on June 16, 2022, the entire content of which is incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a resource identification method, apparatus, device, and storage medium in the field of artificial intelligence technology.
Background
In today's era of information explosion on the mobile Internet, the frequency with which people obtain information has grown explosively. When the supply of information far exceeds demand, content gains traffic and success only if it captures users' attention within a short time. As a result, many creators of self-media resources have begun to use falsehood, exaggeration, distortion, and similar techniques when writing resource titles. Resources with such low-quality titles seriously exhaust users' curiosity, degrade their reading experience, and reduce their trust in the media or platforms on the resource supply side.
At present, major media outlets and platforms mainly identify resources based on prior information of the resource to be identified, such as its text semantic features and image features, in order to determine whether it is a resource with a low-quality title.
Summary
The present disclosure provides a resource identification method, apparatus, device, and storage medium with higher accuracy.
According to one aspect of the present disclosure, a resource identification method is provided, including: obtaining posterior information and prior information of a resource to be identified, where the posterior information is used to reflect the user's feedback information on the resource to be identified, and the prior information is used to reflect the semantic information of the resource to be identified; identifying the resource to be identified according to a first identification model and the posterior information to obtain a first identification result; identifying the resource to be identified according to a second identification model and the prior information to obtain a second identification result; and generating a third identification result according to the first identification result and the second identification result.
According to an aspect of the present disclosure, a resource identification apparatus is provided, including: a first acquisition module, used to acquire posterior information and prior information of a resource to be identified, where the posterior information is used to reflect the user's feedback information on the resource to be identified, and the prior information is used to reflect the semantic information of the resource to be identified; a first identification module, used to identify the resource to be identified according to a first identification model and the posterior information to obtain a first identification result; a second identification module, used to identify the resource to be identified according to a second identification model and the prior information to obtain a second identification result; and a generation module, used to generate a third identification result according to the first identification result and the second identification result.
According to an aspect of the present disclosure, an electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method described in the present disclosure.
According to an aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to perform the method described in the present disclosure.
According to an aspect of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the method described in the present disclosure.
The resource identification method, apparatus, device, and storage medium provided by the present disclosure combine posterior information and prior information to identify whether a resource to be identified is a specific type of resource, thereby improving the accuracy of resource identification and the user experience.
It should be understood that what is described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
Brief Description of the Drawings
The accompanying drawings are provided for a better understanding of the present solution and do not constitute a limitation of the present disclosure. In the drawings:
Figure 1 is a schematic flowchart of a resource identification method according to the first embodiment of the present disclosure;
Figure 2 is a schematic flowchart of a resource identification method according to the third embodiment of the present disclosure;
Figure 3 is a schematic flowchart of a resource identification method according to the fourth embodiment of the present disclosure;
Figure 4 is a schematic flowchart of a resource identification method according to the sixth embodiment of the present disclosure;
Figure 5 is a schematic flowchart of a resource identification method according to the seventh embodiment of the present disclosure;
Figure 6 is a schematic structural diagram of a resource identification apparatus according to the tenth embodiment of the present disclosure; and
Figure 7 is a block diagram of an electronic device used to implement a resource identification method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. The description includes various details of the embodiments to facilitate understanding, and these details should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
Figure 1 is a schematic flowchart of a resource identification method according to the first embodiment of the present disclosure. As shown in Figure 1, the method mainly includes:
Step S101: obtaining posterior information and prior information of a resource to be identified. The posterior information reflects users' feedback on the resource to be identified, and the prior information reflects the semantic information of the resource to be identified.
In this embodiment, the posterior information and the prior information of the resource to be identified are obtained first. The resource to be identified may be an article, a video, a photo album or the like published on a self-media platform. The posterior information reflects users' feedback on the resource and can indicate whether users like it. Specifically, it may include users' likes, dislikes, comments, shares and reports on the resource. The prior information reflects the semantic information of the resource. Specifically, it may include the title text, the text and images of the content, the subtitles and other information of the resource.
In one implementation, the posterior information and the prior information of the resource to be identified may be stored in a backend database of the self-media platform to which the resource belongs. They may be retrieved from the backend database according to the unique identifier of the resource. Specifically, the unique identifier may be the resource number of the resource, which is used to look up and extract the posterior information and the prior information of the resource from the backend database.
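The sketch below illustrates the lookup described above. It uses an in-memory SQLite database as a stand-in for the platform's backend database. The table names (`posterior`, `prior`), their columns, and the functions `create_demo_db` and `fetch_info` are hypothetical illustrations and are not part of the disclosure.

```python
import sqlite3
from typing import Optional, Tuple


def create_demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for the self-media platform's backend database."""
    conn = sqlite3.connect(":memory:")
    # Posterior information: users' feedback counts on each resource (hypothetical schema).
    conn.execute(
        "CREATE TABLE posterior (resource_id TEXT PRIMARY KEY, likes INTEGER, "
        "dislikes INTEGER, comments INTEGER, shares INTEGER, reports INTEGER)"
    )
    # Prior information: semantic content of each resource (hypothetical schema).
    conn.execute(
        "CREATE TABLE prior (resource_id TEXT PRIMARY KEY, title TEXT, "
        "content TEXT, subtitles TEXT)"
    )
    return conn


def fetch_info(conn: sqlite3.Connection, resource_id: str) -> Optional[Tuple[dict, dict]]:
    """Look up (posterior, prior) information by resource number; None if either is missing."""
    cur = conn.cursor()
    cur.row_factory = sqlite3.Row  # lets each row be converted with dict(row)
    post = cur.execute(
        "SELECT * FROM posterior WHERE resource_id = ?", (resource_id,)
    ).fetchone()
    prior = cur.execute(
        "SELECT * FROM prior WHERE resource_id = ?", (resource_id,)
    ).fetchone()
    if post is None or prior is None:
        return None
    return dict(post), dict(prior)
```

The parameterized `?` placeholders keep the resource number out of the SQL string. In a production system the same lookup would go through whatever storage the platform actually uses.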
Step S102: identifying the resource to be identified according to a first recognition model and the posterior information to obtain a first recognition result.
In this embodiment, after the posterior information and the prior information of the resource to be identified are obtained, the resource is identified according to the first recognition model and the posterior information to obtain the first recognition result.
In one implementation, the first recognition model may be obtained by training a machine learning model, such as a neural network model, a decision tree model or a support vector machine model. A suitable machine learning model may be selected for training the first recognition model according to actual needs. The present disclosure does not limit the first recognition model.
In one implementation, the posterior information reflects users' feedback on the resource to be identified. The first recognition result obtained by applying the first recognition model to the posterior information can therefore reflect, from the users' perspective, whether the title of the resource matches its content, that is, whether the title is false, exaggerated, distorted or the like. Specifically, the first recognition result may be "yes" or "no", indicating respectively that the resource to be identified is or is not a resource of a specific type. The first recognition result may also include a confidence level corresponding to it. If the resource is a resource of the specific type, the first recognition result may further include the reason, such as the title not matching the content or the title being exaggerated.
Step S103: identifying the resource to be identified according to a second recognition model and the prior information to obtain a second recognition result.
In this embodiment, after the posterior information and the prior information of the resource to be identified are obtained, the resource is also identified according to the second recognition model and the prior information to obtain the second recognition result.
In one implementation, the second recognition model may likewise be obtained by training a machine learning model, such as a neural network model, a decision tree model or a support vector machine model. A suitable machine learning model may be selected for training the second recognition model according to actual needs. The present disclosure does not limit the second recognition model.
In one implementation, the prior information reflects the semantic information of the resource to be identified. The second recognition result obtained by applying the second recognition model to the prior information can therefore reflect whether the title of the resource contains typos, missing words or ungrammatical sentences. It can also reflect whether the semantics of the title overlap with the semantics of the content. Specifically, the second recognition result may be "yes" or "no", indicating respectively that the resource is or is not a resource of the specific type. The second recognition result may also include a confidence level corresponding to it. If the resource is a resource of the specific type, the second recognition result may further include the reason, such as a typo in the title or the title not matching the content.
Step S104: generating a third recognition result according to the first recognition result and the second recognition result.
In this embodiment, after the first recognition result and the second recognition result are obtained, they may be combined to generate a third recognition result. The third recognition result indicates whether the resource to be identified is a resource of a specific type, for example, whether it is a clickbait resource.
In one implementation, the first recognition result indicates from the users' perspective whether the resource to be identified is a resource of the specific type, and the second recognition result indicates this from the semantic perspective. As long as either the first or the second recognition result indicates that the resource is a resource of the specific type, the resource may be regarded as a resource of the specific type. If both results indicate that it is not, the resource is regarded as not being a resource of the specific type.
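The combination rule of step S104 amounts to a logical OR over the two results. The sketch below assumes a hypothetical `RecognitionResult` record that carries the yes/no result, an optional confidence and an optional reason. The disclosure does not specify how confidences and reasons are merged, so the merging in `fuse_results` is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecognitionResult:
    is_specific_type: bool        # "yes" (True) or "no" (False)
    confidence: float = 1.0       # optional confidence attached to the result
    reason: Optional[str] = None  # e.g. "title does not match content"


def fuse_results(first: RecognitionResult, second: RecognitionResult) -> RecognitionResult:
    """Step S104 as a logical OR: flag the resource if either result flags it."""
    flagged = [r for r in (first, second) if r.is_specific_type]
    if not flagged:
        # Both results say "no". Reporting the lower confidence is an assumption.
        return RecognitionResult(False, min(first.confidence, second.confidence))
    # At least one "yes": keep the strongest confidence and collect the reasons given.
    reasons = "; ".join(r.reason for r in flagged if r.reason)
    return RecognitionResult(True, max(r.confidence for r in flagged), reasons or None)
```

Because the rule is an OR, a resource flagged only by user feedback, such as an exaggerated title whose wording is otherwise clean, is still reported as a specific-type resource.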
In the first embodiment of the present disclosure, the first recognition result is obtained using the first recognition model and the posterior information, and the second recognition result is obtained using the second recognition model and the prior information. Whether the resource to be identified is a resource of the specific type is then determined according to the first recognition result and the second recognition result. The method combines prior information related to the semantics of the resource with posterior information related to users, so it can produce recognition results consistent with users' perception. This compensates for the weak recognition of exaggerated or false titles when only prior information is used, which further improves the accuracy of resource identification.
In the second embodiment of the present disclosure, the first recognition model and the second recognition model are obtained as follows:
obtaining a first training set and a second training set, where the first training set includes first sample resources with known resource identification results and their posterior information, and the second training set includes second sample resources with known resource identification results and their prior information;
inputting the first training set into a first machine learning model for training to obtain the first recognition model, and inputting the second training set into a second machine learning model for training to obtain the second recognition model.
In this embodiment, machine learning models are trained to generate the first recognition model and the second recognition model. First, a first training set and a second training set are obtained. The first training set is used to train the first recognition model, and the second training set is used to train the second recognition model. The first training set includes first sample resources with known resource identification results and their posterior information. The second training set includes second sample resources with known resource identification results and their prior information. The first training set is then input into the first machine learning model for training to obtain the first recognition model, and the second training set is input into the second machine learning model for training to obtain the second recognition model.
In one implementation, some resources may be manually selected and labeled as to whether they are resources of the specific type. The posterior information or prior information of these resources is then obtained from an existing database to form the first training set and the second training set. It should be emphasized that the first sample resources and the second sample resources may be the same or different, which is not limited by the present disclosure.
In one implementation, after the first training set and the second training set are obtained, the first and second machine learning models may be trained on them respectively until both models converge, yielding the first recognition model and the second recognition model. Specifically, the machine learning model may be a neural network model, a decision tree model, a natural language processing model or the like, selected according to actual needs.
In one implementation, a first test set and a second test set may also be obtained to test the trained first and second recognition models. The first test set may include first resources to be tested and their posterior information, and the second test set may include second resources to be tested and their prior information. The first and second resources to be tested may be the same or different. The first recognition model and the second recognition model are tested on the first test set and the second test set respectively. Whether the accuracy of each model meets the requirements is then determined from the recognition results.
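The disclosure leaves the model type open. As an illustrative stand-in only, the sketch below trains a decision stump, which is a depth-1 decision tree. The stump uses a single hypothetical feature: the negative-to-positive feedback ratio of each sample resource. Its accuracy is then checked on a held-out test set. The function names and the 0.9 accuracy requirement are assumptions and do not come from the disclosure.

```python
from typing import List, Sequence


def accuracy(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Fraction of predictions that agree with the labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def train_stump(features: Sequence[float], labels: Sequence[bool]) -> float:
    """Fit a one-feature decision stump that predicts "yes" when feature > threshold.

    Candidate thresholds are minus infinity plus every observed value; the first
    candidate (in ascending order) with the highest training accuracy is kept.
    """
    candidates: List[float] = [float("-inf")] + sorted(set(features))
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        acc = accuracy([x > t for x in features], labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def meets_requirement(threshold: float, test_features: Sequence[float],
                      test_labels: Sequence[bool], required: float = 0.9) -> bool:
    """Test the trained stump on a held-out test set against a required accuracy."""
    predictions = [x > threshold for x in test_features]
    return accuracy(predictions, test_labels) >= required
```

Here the stump plays the role of the first recognition model. The second recognition model could be trained and tested the same way on features derived from prior information, although a real text model would be far richer.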
In the second embodiment of the present disclosure, machine learning models are trained on the first training set and the second training set respectively to obtain the first recognition model and the second recognition model. These models can be used to identify the resource to be identified and determine whether it is a resource of the specific type, which improves the accuracy of resource identification.
Figure 2 is a schematic flowchart of a resource identification method according to the third embodiment of the present disclosure. As shown in Figure 2, step S102 specifically includes:
Step S201: obtaining user information corresponding to the posterior information.
In this embodiment, the user information corresponding to the posterior information is obtained first. The posterior information includes users' likes, dislikes, comments, shares and reports on the resource to be identified. The user behind each like, dislike, comment, share or report can therefore be determined, and each user's information can then be obtained according to that user's unique identifier.
In one implementation, all user information related to the posterior information is obtained according to the posterior information of the resource to be identified. The user information may include the user's unique identifier and the user's total numbers of likes, dislikes, comments, shares, reports and so on.
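The aggregation in step S201 can be sketched as below. The sketch assumes that posterior information is available as hypothetical `(user_id, action)` event pairs, both for the resource to be identified and for the platform as a whole. The action names in `ACTIONS` are illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

ACTIONS = ("like", "dislike", "comment", "share", "report")


def collect_user_info(resource_events: Iterable[Tuple[str, str]],
                      platform_events: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Return {user_id: {action: total}} for every user who acted on the resource.

    resource_events: (user_id, action) pairs on the resource to be identified.
    platform_events: (user_id, action) pairs across the whole platform.
    """
    users = {uid for uid, _ in resource_events}  # users behind the posterior information
    totals: Dict[str, Counter] = defaultdict(Counter)
    for uid, action in platform_events:
        if uid in users:
            totals[uid][action] += 1
    # Missing actions default to 0 because Counter returns 0 for absent keys.
    return {uid: {a: totals[uid][a] for a in ACTIONS} for uid in users}
```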
Step S202: identifying abnormal user information in the user information according to a user recognition model.
In this embodiment, after the user information is obtained, abnormal user information is identified in it according to the user recognition model. Abnormal users are users who frequently perform negative operations on resources, such as dislikes, negative comments or reports, i.e., "troll users".
In one implementation, the user recognition model may be obtained by training a machine learning model, which may be selected according to actual needs. The training method is similar to that of the first and second recognition models and is not repeated here. The user information related to the posterior information is input into the user recognition model to obtain the abnormal user information.
Step S203: deleting, from the posterior information, the posterior information corresponding to the abnormal user information to obtain valid posterior information.
In this embodiment, after the abnormal user information is obtained, the posterior information corresponding to it is deleted to obtain valid posterior information. Posterior information from abnormal users cannot accurately reflect whether users like the resource to be identified, which is why it is deleted.
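Steps S202 and S203 can be sketched as follows. The trained user recognition model is replaced here by a hypothetical rule: a user with at least `min_actions` recorded actions, more than `max_negative_share` of them negative, is treated as abnormal. The rule, its thresholds and the action names are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

NEGATIVE_ACTIONS = ("dislike", "negative_comment", "report")


def is_abnormal(profile: Dict[str, int], max_negative_share: float = 0.8,
                min_actions: int = 10) -> bool:
    """Hypothetical stand-in for the trained user recognition model (step S202)."""
    total = sum(profile.values())
    if total == 0 or total < min_actions:
        return False  # too little history to judge
    negative = sum(profile.get(a, 0) for a in NEGATIVE_ACTIONS)
    return negative / total > max_negative_share


def valid_posterior(resource_events: List[Tuple[str, str]],
                    user_profiles: Dict[str, Dict[str, int]]) -> List[Tuple[str, str]]:
    """Step S203: drop every event whose user was flagged as abnormal."""
    abnormal = {uid for uid, p in user_profiles.items() if is_abnormal(p)}
    return [(uid, action) for uid, action in resource_events if uid not in abnormal]
```

The filtered event list is the "valid posterior information" that step S204 passes to the first recognition model.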
Step S204: identifying the resource to be identified according to the first recognition model and the valid posterior information to obtain the first recognition result.
In this embodiment, the resource to be identified may be identified according to the first recognition model and the valid posterior information to obtain the first recognition result.
In the third embodiment of the present disclosure, abnormal user information is identified in the user information according to the user recognition model. The posterior information corresponding to the abnormal user information is then deleted to obtain valid posterior information. Finally, the first recognition result is obtained according to the first recognition model and the valid posterior information. Because posterior information from abnormal users is removed and only valid posterior information is used, the accuracy of resource identification can be further improved.
Figure 3 is a schematic flowchart of a resource identification method according to the fourth embodiment of the present disclosure. As shown in Figure 3, the posterior information includes users' click-and-display information and comment information on the resource to be identified. The click-and-display information includes the click counts corresponding to users' different operations on the resource and the display information of the resource. The different operations include liking, disliking, commenting, reporting, favoriting and so on. The display information includes the dwell time, dwell frequency and playback or browsing completion of the resource. Step S204 specifically includes:
Step S301: adding first labels to the valid comment information according to a first classification model to obtain a first labeling result.
In this embodiment, the posterior information includes users' click-and-display information, comment information and report information on the resource to be identified. Accordingly, the valid posterior information includes valid click-and-display information, valid comment information and valid report information. First labels are first added to the valid comment information according to the first classification model to obtain the first labeling result.
In one implementation, the first classification model is obtained by training a machine learning model, which may be selected according to actual needs. Its training method is similar to that of the first and second recognition models and is not repeated here. When the first classification model is trained, its first labels may be limited to a negative-comment label and a general-comment label. The first labeling result then divides the valid comment information into valid negative comments and valid general comments.
Step S302: counting the valid comment information corresponding to all negative-comment labels in the first labeling result to obtain the number of valid negative comments.
Step S303: inputting the number of valid negative comments and the valid click-and-display information into the first recognition model to identify the resource to be identified and obtain the first recognition result.
In this embodiment, after the first labeling result is obtained, the valid comments carrying a negative-comment label are counted to obtain the number of valid negative comments. The number of valid negative comments and the valid click-and-display information are then input into the first recognition model to identify the resource to be identified and obtain the first recognition result.
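Steps S301 and S302 reduce to labeling each valid comment and counting the negative labels. In this sketch a hypothetical keyword matcher (`NEGATIVE_CUES`) stands in for the trained first classification model. In practice the trained classifier would take its place, and the cue list is purely illustrative.

```python
from typing import Iterable

NEGATIVE, GENERAL = "negative", "general"

# Hypothetical cues standing in for the trained first classification model.
NEGATIVE_CUES = ("clickbait", "misleading", "fake", "title does not match")


def label_comment(text: str) -> str:
    """Assign a first label (negative-comment or general-comment) to one comment."""
    lowered = text.lower()
    return NEGATIVE if any(cue in lowered for cue in NEGATIVE_CUES) else GENERAL


def count_negative(valid_comments: Iterable[str]) -> int:
    """S301: build the first labeling result; S302: count the negative labels."""
    labels = [label_comment(c) for c in valid_comments]
    return sum(label == NEGATIVE for label in labels)
```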
In one implementation, the click counts in the valid click-and-display information include the numbers of likes, dislikes, comments, reports, favorites, blocks and so on of the resource to be identified. The valid click-and-display information and the number of valid negative comments are input into the first recognition model. This amounts to identifying the resource jointly from users' positive evaluations of it (including the number of likes, the number of favorites, the number of valid general comments and so on) and their negative evaluations (including the number of dislikes, the number of valid negative comments and so on), which yields the first recognition result. Specifically, if the ratio of the number of negative evaluations to the number of positive evaluations of the resource is greater than a preset threshold, the first recognition result is "yes", that is, the resource is a resource of the specific type. If the ratio is not greater than the preset threshold, the first recognition result is "no", that is, the resource is not a resource of the specific type. The preset threshold may be set according to actual needs.
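The ratio rule described above can be written as a small function. The grouping of counts into positive and negative evaluations follows the paragraph. The default threshold of 1.0 and the handling of a zero positive count are assumptions, since the disclosure leaves the preset threshold to actual needs.

```python
def first_recognition(likes: int, favorites: int, general_comments: int,
                      dislikes: int, negative_comments: int,
                      threshold: float = 1.0) -> bool:
    """Return True ("yes", specific-type resource) if negative/positive > threshold."""
    positive = likes + favorites + general_comments
    negative = dislikes + negative_comments
    if positive == 0:
        # Assumption: with no positive feedback, any negative feedback flags the resource.
        return negative > 0
    return negative / positive > threshold
```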
In the fourth embodiment of the present disclosure, the number of valid negative comments is counted and input, together with the valid click-and-display information, into the first recognition model for identification, which can improve the accuracy of resource identification.
In the fifth embodiment of the present disclosure, the posterior information includes users' click-and-display information and report information on the resource to be identified. The click-and-display information includes the click counts corresponding to users' different operations on the resource and the display information of the resource. The different operations include liking, disliking, commenting, reporting, favoriting and so on. The display information includes the dwell time, dwell frequency and playback or browsing completion of the resource. Step S204 specifically includes:
counting the valid report information to obtain the number of valid reports; and inputting the number of valid reports and the valid click-and-display information into the first recognition model to identify the resource to be identified and obtain the first recognition result.
In this embodiment, the valid report information is counted first. The number of valid reports and the valid click-and-display information are then input into the first recognition model to identify the resource to be identified and obtain the first recognition result.
In one implementation, the click counts in the valid click-and-display information include the numbers of likes, dislikes, comments, reports, favorites, blocks and so on of the resource to be identified. The valid click-and-display information and the number of valid reports are input into the first recognition model. This amounts to identifying the resource jointly from users' positive evaluations of it (including the number of likes and the number of favorites) and their negative evaluations (including the number of dislikes, the number of valid reports and so on), which yields the first recognition result. Specifically, if the ratio of the number of negative evaluations to the number of positive evaluations of the resource is greater than a preset threshold, the first recognition result is "yes", that is, the resource is a resource of the specific type. If the ratio is not greater than the preset threshold, the first recognition result is "no", that is, the resource is not a resource of the specific type. The preset threshold may be set according to actual needs.
In the fifth embodiment of the present disclosure, the number of valid reports is counted and input, together with the valid click-and-display information, into the first recognition model for identification, which can improve the accuracy of resource identification.
图4是根据本公开第六实施例的一种资源识别方法的流程示意图,如图4所示,后验信息包括用户对待识别资源的点展信息、评论信息和举报信息,点展信息包括用户对待识别资源的不同操作所对应的点击信息数和待识别资源的展示信息,不同操作包括点赞、点踩、评论、举报、收藏等,展示信息包括待识别资源的停留时长、停留频次和播放或浏览完成度等,步骤S204具体包括:Figure 4 is a schematic flow chart of a resource identification method according to the sixth embodiment of the present disclosure. As shown in Figure 4, the posterior information includes the user's click information, comment information and report information of the resource to be identified, and the click information includes the user's click information. The number of clicks corresponding to different operations on the resource to be identified and the display information of the resource to be identified. The different operations include likes, dislikes, comments, reports, collections, etc. The display information includes the length of stay, stay frequency and playback of the resource to be identified. Or browsing completion, etc. Step S204 specifically includes:
步骤S401,根据第一分类模型,对有效评论信息添加第一标签,得到第一标签结果。Step S401: Add a first label to the valid comment information according to the first classification model to obtain a first label result.
Step S401 is similar to step S301 and is not repeated here.
Step S402: count the number of valid comments corresponding to all negative-review labels in the first labeling result, as well as the number of valid reports, to obtain the number of valid negative reviews and the number of valid reports.
Step S403: input the number of valid negative reviews, the number of valid reports and the valid click-and-display information into the first recognition model to identify the resource and obtain the first recognition result.
In this embodiment, after the first labeling result is obtained, the number of valid comments corresponding to all negative-review labels in the first labeling result and the number of valid reports are counted to obtain the number of valid negative reviews and the number of valid reports. These two counts and the valid click-and-display information are then input into the first recognition model, which identifies the resource and produces the first recognition result.
In one possible implementation, the click counts in the valid click-and-display information include the numbers of likes, dislikes, comments, reports, collections, blocks, etc. of the resource to be identified. Inputting the valid click-and-display information, the number of valid negative reviews, the number of valid reports and so on into the first recognition model amounts to identifying the resource from both the users' positive evaluations of it (such as likes, collections and the number of valid general comments) and their negative evaluations (such as dislikes, the number of valid reports and the number of valid negative reviews), which yields the first recognition result. Specifically, if the ratio of the number of negative evaluations to the number of positive evaluations of the resource is greater than a preset threshold, the first recognition result is "yes", i.e., the resource to be identified is a resource of the specific type; if the ratio is not greater than the preset threshold, the first recognition result is "no", i.e., the resource is not a resource of the specific type. The preset threshold can be set according to actual needs.
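For the sixth embodiment, the only change to the rule is that the first labeling result contributes to both sides of the ratio. The sketch below uses hypothetical label strings and assumes that every valid comment not labeled negative counts as a "general" comment; the text does not say exactly which labels count as general.

```python
# Hypothetical label names modeled on the negative-review labels listed in the text.
NEGATIVE_REVIEW_LABELS = {
    "fabricated_article", "flawed_title", "fabricated_content", "title_content_mismatch",
}


def first_result_from_labels(first_labels, likes, collections, dislikes, valid_reports, threshold):
    """Count negative reviews in a first labeling result, then apply the ratio rule."""
    labels = list(first_labels)
    negative_reviews = sum(1 for label in labels if label in NEGATIVE_REVIEW_LABELS)
    # Assumption: every valid comment not labeled negative counts as a general comment.
    general_comments = len(labels) - negative_reviews
    positive = likes + collections + general_comments
    negative = dislikes + valid_reports + negative_reviews
    if positive == 0:
        return negative > 0
    return negative / positive > threshold


labels = ["flawed_title", "title_content_mismatch", "average", "excellent", "flawed_title"]
# 10 negative vs 8 positive evaluations: ratio 1.25
print(first_result_from_labels(labels, likes=5, collections=1, dislikes=4, valid_reports=3, threshold=1.0))
```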
In the sixth embodiment of the present disclosure, the number of valid negative reviews and the number of valid reports are counted, and these counts together with the valid click-and-display information are input into the first recognition model for identification, which can further improve the accuracy of resource identification.
Figure 5 is a schematic flowchart of a resource identification method according to the seventh embodiment of the present disclosure. As shown in Figure 5, step S204 specifically includes:
Step S501: add first labels to the valid comment information according to the first classification model to obtain a first labeling result.
Step S502: count the number of valid comments corresponding to each negative-review label in the first labeling result to obtain a first statistical result.
In this embodiment, when the first classification model is trained, its first labels can be set to, for example, fabricated article, flawed title, fabricated content, title-content mismatch, average and excellent, among which the negative labels (fabricated article, flawed title, fabricated content, title-content mismatch, etc.) are classed as negative-review labels. After first labels are added to the valid comments according to the first classification model, the resulting first labeling result divides the valid comments into categories such as fabricated article, flawed title, fabricated content, title-content mismatch, average and excellent. The number of valid comments for each negative-review label in the first labeling result is then counted, for example, the number of valid comments labeled fabricated article, the number labeled fabricated content and the number labeled title-content mismatch, giving the first statistical result.
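Step S502 is a per-label count over the first labeling result. A minimal sketch, assuming the classifier's output is a list of label strings; the label identifiers are hypothetical English renderings of the examples above. Zero counts are kept so that the first statistical result always has the same keys, which is convenient if it is later fed to a model as a fixed-length feature.

```python
from collections import Counter

# Hypothetical identifiers for the negative-review labels named in the text.
NEGATIVE_REVIEW_LABELS = ("fabricated_article", "flawed_title", "fabricated_content", "title_content_mismatch")


def first_statistical_result(first_labels):
    """Count valid comments per negative-review label; non-negative labels are ignored."""
    counts = Counter(label for label in first_labels if label in NEGATIVE_REVIEW_LABELS)
    # Report every negative label, including zeros, so the result has a fixed shape.
    return {label: counts[label] for label in NEGATIVE_REVIEW_LABELS}


print(first_statistical_result(["fabricated_article", "average", "flawed_title", "fabricated_article", "excellent"]))
# {'fabricated_article': 2, 'flawed_title': 1, 'fabricated_content': 0, 'title_content_mismatch': 0}
```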
Step S503: add second labels to the valid report information according to the second classification model to obtain a second labeling result.
Step S504: count the number of valid reports corresponding to each second label in the second labeling result to obtain a second statistical result.
In this embodiment, when the second classification model is trained, its second labels can be set to, for example, fabricated article, flawed title, low-quality title and title-content mismatch. After second labels are added to the valid reports according to the second classification model, the resulting second labeling result divides the valid reports into categories such as fabricated article, flawed title, low-quality title and title-content mismatch. The number of valid reports for each second label in the second labeling result is then counted, for example, the number of valid reports labeled fabricated article, the number labeled flawed title and the number labeled low-quality title, giving the second statistical result.
In one possible implementation, the second classification model is obtained by training a machine learning model, which can be selected according to actual needs. The training method of the second classification model is similar to that of the first and second recognition models and is not repeated here.
Step S505: input the first statistical result, the second statistical result and the valid click-and-display information into the first recognition model to identify the resource and obtain the first recognition result.
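One plausible way to input the two statistical results and the click counts into the first recognition model is to concatenate them into a fixed-order feature vector. The text does not specify the input format, so the label vocabularies, the field names in `CLICK_FIELDS` and the omission of display information (dwell time, completion rate) are all assumptions in this sketch. It also covers steps S503 and S504, since the second statistical result is computed the same way as the first.

```python
from collections import Counter

# Hypothetical label vocabularies based on the examples given for the two classifiers.
FIRST_NEGATIVE_LABELS = ("fabricated_article", "flawed_title", "fabricated_content", "title_content_mismatch")
SECOND_LABELS = ("fabricated_article", "flawed_title", "low_quality_title", "title_content_mismatch")
CLICK_FIELDS = ("likes", "dislikes", "comments", "reports", "collections", "blocks")


def build_first_model_input(comment_labels, report_labels, click_info):
    """Concatenate the first statistical result, the second statistical result and click counts."""
    comment_counts = Counter(comment_labels)  # first labeling result -> per-label counts
    report_counts = Counter(report_labels)    # second labeling result -> per-label counts
    features = [comment_counts[label] for label in FIRST_NEGATIVE_LABELS]  # first statistical result
    features += [report_counts[label] for label in SECOND_LABELS]          # second statistical result
    features += [click_info.get(field, 0) for field in CLICK_FIELDS]       # valid click counts
    return features


print(build_first_model_input(["flawed_title", "average", "flawed_title"], ["low_quality_title"], {"likes": 10, "dislikes": 3}))
```

Labels outside the vocabularies (such as "average") simply do not appear in the vector, and missing click fields default to 0.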
In this embodiment, the click counts in the valid click-and-display information include the numbers of likes, dislikes, comments, reports, collections, blocks, etc. of the resource to be identified. Inputting the first statistical result, the second statistical result, the valid click-and-display information and so on into the first recognition model amounts to identifying the resource from both the users' positive evaluations of it (such as likes, collections and the number of valid general comments) and their negative evaluations (such as dislikes, the number of valid reports and the number of valid negative reviews), which yields the first recognition result. Specifically, if the ratio of the number of negative evaluations to the number of positive evaluations of the resource is greater than a preset threshold, the first recognition result is "yes", i.e., the resource to be identified is a resource of the specific type; in this case the first recognition result may also include the reason why the resource is of the specific type, such as a title-content mismatch or an exaggerated title. If the ratio is not greater than the preset threshold, the first recognition result is "no", i.e., the resource is not a resource of the specific type. The preset threshold can be set according to actual needs.
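The decision rule of this embodiment, including the optional reason, can be sketched as follows. It assumes the statistics are dictionaries from label to count, counts all labeled reports as negative evaluations, and picks as the reason the label with the highest combined comment and report count; that last choice is a hypothetical heuristic, since the text only says that a reason such as a title-content mismatch may be included.

```python
from collections import Counter


def first_result_with_reason(first_stats, second_stats, likes, collections, general_comments,
                             dislikes, threshold):
    """Ratio rule plus a reason label; returns (is_specific_type, reason or None)."""
    negative = dislikes + sum(first_stats.values()) + sum(second_stats.values())
    positive = likes + collections + general_comments
    is_specific = negative > 0 if positive == 0 else negative / positive > threshold
    if not is_specific:
        return False, None
    combined = Counter(first_stats)
    combined.update(second_stats)  # add report counts to comment counts, label by label
    # Assumption: the reason is the negative label with the highest combined count.
    top = [label for label, count in combined.most_common(1) if count > 0]
    return True, (top[0] if top else None)


first_stats = {"title_content_mismatch": 4, "flawed_title": 1}
second_stats = {"title_content_mismatch": 2, "low_quality_title": 1}
print(first_result_with_reason(first_stats, second_stats, likes=5, collections=1,
                               general_comments=2, dislikes=2, threshold=1.0))
# (True, 'title_content_mismatch')
```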
In the seventh embodiment of the present disclosure, the number of valid comments for each negative-review label and the number of valid reports for each second label are counted to obtain the first and second statistical results, which are input together with the valid click-and-display information into the first recognition model to obtain the first recognition result. This can further improve the accuracy of resource identification.
In the eighth embodiment of the present disclosure, the second recognition model may be a model obtained by training a natural language processing model, and step S103 specifically includes:
performing word segmentation on the prior information and converting the result into a vector matrix; and identifying the resource to be identified according to the vector matrix and the second recognition model to obtain the second recognition result.
In this embodiment, the prior information serves as the semantic features of the resource to be identified. The prior information is first segmented into words and converted into a vector matrix, i.e., into data that a computer can process; the resource is then identified according to the vector matrix and the second recognition model to obtain the second recognition result.
In one possible implementation, the natural language processing model used to train the second recognition model may be a long short-term memory (LSTM) model, a Transformer model, a BERT (Bidirectional Encoder Representations from Transformers) model or the like; the present disclosure does not limit the choice of natural language processing model.
In one possible implementation, the prior information can be segmented manually or with a word segmentation tool, and the segmented prior information can be converted into a vector matrix using a word-vector model such as Word2Vec or GloVe.
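The segmentation and vectorization step can be illustrated without any external library. In this sketch `segment` is a placeholder whitespace tokenizer (real Chinese prior information would need a proper word segmentation tool), and `toy_embeddings` stands in for vectors taken from a trained Word2Vec or GloVe model; the fixed `max_len` with zero-padding is an assumption, chosen so that every title yields a matrix of the same shape for the second recognition model.

```python
def segment(text):
    """Placeholder segmenter: lowercase whitespace split (assumption; Chinese text needs a real segmenter)."""
    return text.lower().split()


def to_vector_matrix(tokens, embeddings, dim, max_len):
    """Stack one word vector per token into a max_len x dim matrix (list of rows).

    Unknown tokens map to zero vectors; short inputs are zero-padded, long ones truncated.
    """
    rows = [list(embeddings.get(token, [0.0] * dim)) for token in tokens[:max_len]]
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows


# Toy 2-d vectors standing in for a trained Word2Vec or GloVe lookup table.
toy_embeddings = {"shocking": [1.0, 0.0], "truth": [0.0, 1.0]}
matrix = to_vector_matrix(segment("Shocking truth revealed"), toy_embeddings, dim=2, max_len=4)
print(matrix)  # [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
```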
The eighth embodiment of the present disclosure trains a natural language processing model to obtain the second recognition model and uses it to identify the resource from its prior information, yielding the second recognition result. The first and second recognition results can then be combined to determine whether the resource is a resource of the specific type, improving the accuracy of resource identification.
In the ninth embodiment of the present disclosure, step S104 specifically includes:
if either the first recognition result or the second recognition result indicates that the resource to be identified is a resource of the specific type, generating a third recognition result indicating that the resource is of the specific type; otherwise, generating a third recognition result indicating that the resource is not of the specific type.
In this embodiment, the first recognition result reflects, from the users' perspective, whether the resource is of the specific type, and the second recognition result reflects this from a semantic perspective. Thus, as long as either result shows the resource to be of the specific type, the resource can be considered a resource of the specific type; only if both results show that it is not is the resource considered not to be of the specific type.
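The fusion rule of step S104 is a logical OR. A minimal sketch follows (the function name is hypothetical). One design consequence worth noting: a resource is flagged whenever either the feedback-based or the semantics-based model flags it, which favors catching more specific-type resources at the cost of passing through the false positives of either model.

```python
def third_recognition_result(first_is_specific: bool, second_is_specific: bool) -> bool:
    """Step S104 as described: the resource is of the specific type if either result says so."""
    return first_is_specific or second_is_specific


for first in (False, True):
    for second in (False, True):
        print(first, second, "->", third_recognition_result(first, second))
```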
The ninth embodiment of the present disclosure combines the first and second recognition results to determine whether the resource to be identified is a resource of the specific type, which can improve the accuracy of recognizing the titles of such resources.
The following takes the Douyin short-video platform as an example to further illustrate the resource identification method provided by the embodiments of the present disclosure.
For a video resource on Douyin, the posterior information and prior information of the video are first obtained. The posterior information is the users' feedback on the video, such as the numbers of likes, dislikes, comments, reports, collections and blocks, and includes the users' click-and-display information, comment information and report information for the video; the prior information is the semantic information of the video, such as its title text and subtitles. Next, the posterior information corresponding to abnormal user information is deleted to obtain valid posterior information, which is input into the first recognition model to obtain the first recognition result; the first recognition result indicates whether the video is a resource of the specific type. Optionally, before the valid posterior information is input into the first recognition model, labels can be added to the valid negative-review information and the valid report information. The prior information is then input into the second recognition model to obtain the second recognition result, which likewise indicates whether the video is a resource of the specific type. If either the first or the second recognition result shows that the video is a resource of the specific type, the third recognition result is that the video is a resource of the specific type, such as a clickbait resource.
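The deletion of posterior information from abnormal users can be sketched as a simple filter. The `FeedbackEvent` structure is hypothetical, and `is_abnormal_user` is a stand-in predicate for the user identification model, whose internals the text does not describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackEvent:
    """One piece of posterior information (hypothetical structure)."""
    user_id: str
    kind: str  # e.g. "like", "dislike", "comment", "report"
    text: str = ""


def valid_posterior(events, is_abnormal_user):
    """Drop every event whose author the user identification model flags as abnormal."""
    abnormal = {uid for uid in {e.user_id for e in events} if is_abnormal_user(uid)}
    return [e for e in events if e.user_id not in abnormal]


events = [
    FeedbackEvent("u1", "like"),
    FeedbackEvent("bot7", "report", "fake title"),
    FeedbackEvent("u2", "comment", "title does not match the video"),
    FeedbackEvent("bot7", "dislike"),
]
# Hypothetical stand-in for the user identification model.
kept = valid_posterior(events, lambda uid: uid.startswith("bot"))
print([e.kind for e in kept])  # ['like', 'comment']
```

Each user is checked once, so an expensive user model is not called repeatedly for users who leave many pieces of feedback.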
It should be emphasized that the resource identification method provided by the embodiments of the present disclosure can be applied not only to short-video platforms but also to various other self-media platforms.
Figure 6 is a schematic structural diagram of a resource identification apparatus according to the tenth embodiment of the present disclosure. As shown in Figure 6, the apparatus mainly includes:
a first acquisition module 60, configured to obtain posterior information and prior information of a resource to be identified, where the posterior information reflects users' feedback on the resource and the prior information reflects the semantic information of the resource; a first recognition module 61, configured to identify the resource according to a first recognition model and the posterior information to obtain a first recognition result; a second recognition module 62, configured to identify the resource according to a second recognition model and the prior information to obtain a second recognition result; and a generation module 63, configured to generate a third recognition result according to the first recognition result and the second recognition result.
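The four modules can be pictured as one object whose models are injected. This is an illustrative sketch only: the class name, the `fetch` callable and the toy models in the usage example are hypothetical, and the generation module applies the OR rule of the ninth embodiment.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class ResourceRecognitionApparatus:
    """Wiring of modules 60 to 63; the callables are hypothetical stand-ins for real components."""
    fetch: Callable[[str], Tuple[Any, Any]]  # returns (posterior, prior) for a resource id
    first_model: Callable[[Any], bool]       # first recognition model (posterior -> bool)
    second_model: Callable[[Any], bool]      # second recognition model (prior -> bool)

    def acquire(self, resource_id):          # first acquisition module 60
        return self.fetch(resource_id)

    def recognize_first(self, posterior):    # first recognition module 61
        return self.first_model(posterior)

    def recognize_second(self, prior):       # second recognition module 62
        return self.second_model(prior)

    def generate(self, first, second):       # generation module 63 (OR rule)
        return first or second

    def run(self, resource_id):
        posterior, prior = self.acquire(resource_id)
        return self.generate(self.recognize_first(posterior), self.recognize_second(prior))


apparatus = ResourceRecognitionApparatus(
    fetch=lambda rid: ({"likes": 2, "dislikes": 9}, "You won't believe what happened"),
    first_model=lambda p: p["dislikes"] > p["likes"],       # toy feedback model
    second_model=lambda title: "believe" in title.lower(),  # toy semantic model
)
print(apparatus.run("video-1"))
```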
In one possible implementation, the apparatus further includes:
a second acquisition module, configured to obtain a first training set and a second training set, where the first training set includes resources with known identification results and their posterior information, and the second training set includes resources with known identification results and their prior information; and a training module, configured to train machine learning models on the first training set and the second training set respectively to obtain the first recognition model and the second recognition model.
In one possible implementation, the first recognition module 61 mainly includes:
an acquisition submodule, configured to obtain the user information corresponding to the posterior information; a user identification submodule, configured to identify abnormal user information in the user information according to a user identification model; a deletion submodule, configured to delete the posterior information corresponding to the abnormal user information to obtain valid posterior information; and a first recognition submodule, configured to identify the resource according to the first recognition model and the valid posterior information to obtain the first recognition result.
In one possible implementation, the posterior information includes the users' click-and-display information, comment information and report information for the resource to be identified, and the click-and-display information includes the numbers of clicks corresponding to different user operations on the resource and the display information of the resource. The first recognition submodule is configured to: add first labels to the valid comment information according to a first classification model to obtain a first labeling result; count the valid comments corresponding to all negative-review labels in the first labeling result to obtain the number of valid negative reviews; and input the number of valid negative reviews and the valid click-and-display information into the first recognition model to identify the resource and obtain the first recognition result.
In one possible implementation, the first recognition submodule is further configured to: count the valid report information to obtain the number of valid reports; and input the number of valid reports and the valid click-and-display information into the first recognition model to identify the resource and obtain the first recognition result.
In one possible implementation, the first recognition submodule is further configured to: add first labels to the valid comment information according to the first classification model to obtain a first labeling result; count the valid comments corresponding to all negative-review labels in the first labeling result, as well as the valid reports, to obtain the number of valid negative reviews and the number of valid reports; and input the number of valid negative reviews, the number of valid reports and the valid click-and-display information into the first recognition model to identify the resource and obtain the first recognition result.
In one possible implementation, the first recognition submodule is further configured to: add first labels to the valid comment information according to the first classification model to obtain a first labeling result; count the valid comments corresponding to each negative-review label in the first labeling result to obtain a first statistical result; add second labels to the valid report information according to a second classification model to obtain a second labeling result; count the valid reports corresponding to each second label in the second labeling result to obtain a second statistical result; and input the first statistical result, the second statistical result and the valid click-and-display information into the first recognition model to identify the resource and obtain the first recognition result.
In one possible implementation, the second recognition module 62 mainly includes:
a conversion submodule, configured to perform word segmentation on the prior information and convert the result into a vector matrix; and a second recognition submodule, configured to identify the resource according to the vector matrix and the second recognition model to obtain the second recognition result.
In one possible implementation, the generation module 63 mainly includes:
a first generation submodule, configured to generate, if either the first recognition result or the second recognition result indicates that the resource to be identified belongs to the specific type, a third recognition result indicating that the resource belongs to the specific type; and a second generation submodule, configured to generate, otherwise, a third recognition result indicating that the resource does not belong to the specific type.
In the technical solution of the present disclosure, the acquisition, storage and application of users' personal information comply with relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium and a computer program product.
Figure 7 shows a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in Figure 7, the device 700 includes a computing unit 701, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 can also store various programs and data required for the operation of the device 700. The computing unit 701, the ROM 702 and the RAM 703 are connected to one another via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Multiple components of the device 700 are connected to the I/O interface 705, including: an input unit 706, such as a keyboard or a mouse; an output unit 707, such as various types of displays and speakers; a storage unit 708, such as a magnetic disk or an optical disc; and a communication unit 709, such as a network card, a modem or a wireless communication transceiver. The communication unit 709 allows the device 700 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
The computing unit 701 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc. The computing unit 701 performs the methods and processes described above, such as the resource identification method. For example, in some embodiments, the resource identification method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the resource identification method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the resource identification method in any other suitable manner (for example, by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, partly on the machine as a stand-alone software package and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual, auditory, or tactile feedback), and input from the user may be received in any form (including acoustic, voice, or tactile input).
The systems and techniques described herein may be implemented in a computing system that includes a back-end component (e.g., a data server), a computing system that includes a middleware component (e.g., an application server), a computing system that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system may include a client and a server. The client and the server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that steps may be reordered, added, or deleted using the various forms of process flows shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order; as long as the desired results of the technical solutions disclosed herein can be achieved, no limitation is imposed here.
The specific implementations described above do not limit the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made according to design requirements and other factors. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present disclosure shall fall within the protection scope of the present disclosure.

Claims (13)

  1. A resource identification method, comprising:
    obtaining posterior information and prior information of a resource to be identified, wherein the posterior information reflects users' feedback on the resource to be identified, and the prior information reflects semantic information of the resource to be identified;
    identifying the resource to be identified according to a first identification model and the posterior information, to obtain a first identification result;
    identifying the resource to be identified according to a second identification model and the prior information, to obtain a second identification result; and
    generating a third identification result according to the first identification result and the second identification result.
  2. The method according to claim 1, wherein the first identification model and the second identification model are obtained by:
    obtaining a first training set and a second training set, wherein the first training set includes first sample resources whose resource identification results are known, together with their posterior information, and the second training set includes second sample resources whose resource identification results are known, together with their prior information; and
    inputting the first training set into a first machine learning model for training to obtain the first identification model, and inputting the second training set into a second machine learning model for training to obtain the second identification model.
  3. The method according to claim 1, wherein identifying the resource to be identified according to the first identification model and the posterior information to obtain the first identification result comprises:
    obtaining user information corresponding to the posterior information;
    identifying abnormal user information in the user information according to a user identification model;
    deleting, from the posterior information, the posterior information corresponding to the abnormal user information, to obtain valid posterior information; and
    identifying the resource to be identified according to the first identification model and the valid posterior information, to obtain the first identification result.
  4. The method according to claim 3, wherein the posterior information includes users' click-and-display information and comment information on the resource to be identified, and the click-and-display information includes the numbers of clicks corresponding to different user operations on the resource to be identified and display information of the resource to be identified;
    wherein identifying the resource to be identified according to the first identification model and the valid posterior information to obtain the first identification result comprises:
    adding first labels to valid comment information according to a first classification model, to obtain a first labeling result;
    counting the pieces of valid comment information corresponding to all negative-review labels in the first labeling result, to obtain a number of valid negative reviews; and
    inputting the number of valid negative reviews and valid click-and-display information into the first identification model to identify the resource to be identified, to obtain the first identification result.
  5. The method according to claim 3, wherein the posterior information includes users' click-and-display information and report information on the resource to be identified, and the click-and-display information includes the numbers of clicks corresponding to different user operations on the resource to be identified and display information of the resource to be identified;
    wherein identifying the resource to be identified according to the first identification model and the valid posterior information to obtain the first identification result comprises:
    counting the pieces of valid report information, to obtain a number of valid reports; and
    inputting the number of valid reports and valid click-and-display information into the first identification model to identify the resource to be identified, to obtain the first identification result.
  6. The method according to claim 3, wherein the posterior information includes users' click-and-display information, comment information, and report information on the resource to be identified, and the click-and-display information includes the numbers of clicks corresponding to different user operations on the resource to be identified and display information of the resource to be identified;
    wherein identifying the resource to be identified according to the first identification model and the valid posterior information to obtain the first identification result comprises:
    adding first labels to valid comment information according to a first classification model, to obtain a first labeling result;
    counting the pieces of valid comment information corresponding to all negative-review labels in the first labeling result, and the pieces of valid report information, to obtain a number of valid negative reviews and a number of valid reports; and
    inputting the number of valid negative reviews, the number of valid reports, and valid click-and-display information into the first identification model to identify the resource to be identified, to obtain the first identification result.
  7. The method according to claim 3, wherein the posterior information includes users' click-and-display information, comment information, and report information on the resource to be identified, and the click-and-display information includes the numbers of clicks corresponding to different user operations on the resource to be identified and display information of the resource to be identified;
    wherein identifying the resource to be identified according to the first identification model and the valid posterior information to obtain the first identification result comprises:
    adding first labels to valid comment information according to a first classification model, to obtain a first labeling result;
    counting the pieces of valid comment information corresponding to each distinct negative-review label in the first labeling result, to obtain a first statistical result;
    adding second labels to valid report information according to a second classification model, to obtain a second labeling result;
    counting the pieces of valid report information corresponding to each distinct second label in the second labeling result, to obtain a second statistical result; and
    inputting the first statistical result, the second statistical result, and valid click-and-display information into the first identification model to identify the resource to be identified, to obtain the first identification result.
  8. The method according to claim 1, wherein the second identification model is a model generated by training based on a natural language processing model, and identifying the resource to be identified according to the second identification model and the prior information to obtain the second identification result comprises:
    performing word segmentation on the prior information and converting the result into a vector matrix; and
    identifying the resource to be identified according to the vector matrix and the second identification model, to obtain the second identification result.
  9. The method according to any one of claims 1 to 8, wherein generating the third identification result according to the first identification result and the second identification result comprises:
    if either of the first identification result and the second identification result indicates that the resource to be identified belongs to a specific type of resource, generating a third identification result indicating that the resource to be identified belongs to the specific type of resource; and
    otherwise, generating a third identification result indicating that the resource to be identified does not belong to the specific type of resource.
  10. A resource identification apparatus, comprising:
    a first acquisition module, configured to obtain posterior information and prior information of a resource to be identified, wherein the posterior information reflects users' feedback on the resource to be identified, and the prior information reflects semantic information of the resource to be identified;
    a first identification module, configured to identify the resource to be identified according to a first identification model and the posterior information, to obtain a first identification result;
    a second identification module, configured to identify the resource to be identified according to a second identification model and the prior information, to obtain a second identification result; and
    a generation module, configured to generate a third identification result according to the first identification result and the second identification result.
  11. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the method according to any one of claims 1 to 9.
  12. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1 to 9.
  13. A computer program product, comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1 to 9.
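Claims 1 and 9 together describe a two-branch decision: a first model judges the resource from user feedback (posterior information), a second model judges it from its content (prior information), and the third result flags the resource as the specific type (for example, a low-quality title resource, as in the background section) whenever either branch does. The following Python sketch illustrates only that control flow. The function names, the feature keys, and both toy models are hypothetical stand-ins; the patent does not specify the models' architectures or thresholds.

```python
from typing import Callable, Dict, Mapping

Features = Mapping[str, float]


def identify_resource(
    posterior_features: Features,
    prior_text: str,
    first_model: Callable[[Features], bool],
    second_model: Callable[[str], bool],
) -> Dict[str, bool]:
    """Run both branches and combine them with the OR rule of claim 9."""
    first = first_model(posterior_features)  # judged from user feedback
    second = second_model(prior_text)        # judged from semantic content
    # Third result: specific type if either branch says so, otherwise not.
    third = first or second
    return {"first": first, "second": second, "third": third}


# Hypothetical stand-in for the first identification model: flags the
# resource when negative reviews plus reports exceed 1% of displays.
def toy_first_model(features: Features) -> bool:
    displays = features.get("displays", 0.0)
    negative = features.get("negative_reviews", 0.0) + features.get("reports", 0.0)
    return displays > 0 and negative / displays > 0.01


# Hypothetical stand-in for the second identification model: a phrase match.
def toy_second_model(title: str) -> bool:
    return "you won't believe" in title.lower()


result = identify_resource(
    {"negative_reviews": 2.0, "reports": 1.0, "displays": 1000.0},
    "You Won't Believe What Happened Next",
    toy_first_model,
    toy_second_model,
)
# result == {"first": False, "second": True, "third": True}
```

Here the feedback branch stays under its toy threshold (3 negative signals in 1,000 displays) while the content branch fires, so the OR rule still flags the resource. Because the rule is a logical OR, the combined result is at least as sensitive as the more sensitive branch, at the cost of inheriting both branches' false positives.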
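Claims 3 to 6 describe how the posterior branch turns raw user feedback into model inputs: feedback from users flagged as abnormal is discarded, the remaining comments are labeled by a classifier, and the valid negative reviews and valid reports are counted and combined with click-and-display information. A minimal Python sketch of that feature-building step follows. The `Feedback` record layout, the `is_abnormal_user` predicate (standing in for the user identification model), `classify_comment` (standing in for the first classification model), and the `NEGATIVE_LABELS` set are all hypothetical; the claims do not say which labels count as negative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical label set: the claims only speak of "negative-review labels".
NEGATIVE_LABELS: Set[str] = {"clickbait", "misleading", "off_topic"}


@dataclass
class Feedback:
    user_id: str
    kind: str       # "comment" or "report" (hypothetical encoding)
    text: str = ""


def build_posterior_features(
    feedback: List[Feedback],
    clicks: Dict[str, int],
    displays: int,
    is_abnormal_user: Callable[[str], bool],
    classify_comment: Callable[[str], str],
) -> Dict[str, float]:
    """Turn raw feedback into inputs for the first identification model."""
    # Claim 3: drop feedback from users flagged as abnormal.
    valid = [f for f in feedback if not is_abnormal_user(f.user_id)]

    # Claims 4 and 6: label the valid comments and count the negative ones.
    labels = [classify_comment(f.text) for f in valid if f.kind == "comment"]
    negative_reviews = sum(1 for label in labels if label in NEGATIVE_LABELS)

    # Claims 5 and 6: count the valid reports.
    reports = sum(1 for f in valid if f.kind == "report")

    # Click and display counts are assumed to have been filtered already.
    features: Dict[str, float] = {
        "negative_reviews": float(negative_reviews),
        "reports": float(reports),
        "displays": float(displays),
    }
    for operation, count in clicks.items():
        features[f"clicks_{operation}"] = float(count)
    return features


abnormal_users = {"bot_1"}
items = [
    Feedback("u1", "comment", "The title has nothing to do with the story"),
    Feedback("u2", "comment", "Great read"),
    Feedback("bot_1", "comment", "misleading!!!"),
    Feedback("u3", "report"),
]
feats = build_posterior_features(
    items,
    clicks={"open": 120, "share": 3},
    displays=5000,
    is_abnormal_user=lambda uid: uid in abnormal_users,
    classify_comment=lambda text: "off_topic" if "nothing to do" in text else "positive",
)
# feats == {"negative_reviews": 1.0, "reports": 1.0, "displays": 5000.0,
#           "clicks_open": 120.0, "clicks_share": 3.0}
```

Claim 7 differs only in keeping one count per label (for comments and for reports) instead of a single negative total; a `collections.Counter` over the labels would be a natural way to sketch that variant.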
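Claim 8 segments the prior information into words and converts it into a vector matrix before passing it to the NLP-based second model. The sketch below shows one plausible form of that preprocessing, assuming a regex word tokenizer in place of a real segmenter and a toy, randomly initialized embedding table. The patent does not name a tokenizer, an embedding method, or a sequence length, so `tokenize`, `toy_embeddings`, `dim`, and `max_len` are all hypothetical.

```python
import random
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Hypothetical stand-in for word segmentation: lowercase runs of word
    # characters. Unsegmented Chinese text would come out as a single token.
    return re.findall(r"\w+", text.lower())


def toy_embeddings(vocab: List[str], dim: int, seed: int = 0) -> Dict[str, List[float]]:
    # Hypothetical embedding table: one seeded random vector per known word.
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}


def to_vector_matrix(
    tokens: List[str],
    embeddings: Dict[str, List[float]],
    dim: int,
    max_len: int,
) -> List[List[float]]:
    """Look up one row per token (zeros for unknown words), then truncate or
    zero-pad to exactly max_len rows, giving a max_len x dim matrix."""
    unknown = [0.0] * dim
    rows = [list(embeddings.get(tok, unknown)) for tok in tokens[:max_len]]
    rows.extend([0.0] * dim for _ in range(max_len - len(rows)))
    return rows


tokens = tokenize("Shocking secret REVEALED!")
emb = toy_embeddings(["shocking", "secret"], dim=4)
matrix = to_vector_matrix(tokens, emb, dim=4, max_len=5)
# tokens == ["shocking", "secret", "revealed"]; matrix has 5 rows of 4 values:
# two embeddings, one zero row for the unknown word, and two padding rows.
```

Fixing `max_len` gives every resource a matrix of the same shape, which batch-based NLP models usually expect. Chinese titles contain no spaces, so a real system would replace `tokenize` with a dedicated word segmenter.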
PCT/CN2022/127332 2022-06-16 2022-10-25 Resource recognition method and apparatus, and device and storage medium WO2023240878A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210694398.8A CN115099239B (en) 2022-06-16 2022-06-16 Resource identification method, device, equipment and storage medium
CN202210694398.8 2022-06-16

Publications (1)

Publication Number Publication Date
WO2023240878A1 true WO2023240878A1 (en) 2023-12-21

Family

ID=83290393

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/127332 WO2023240878A1 (en) 2022-06-16 2022-10-25 Resource recognition method and apparatus, and device and storage medium

Country Status (2)

Country Link
CN (1) CN115099239B (en)
WO (1) WO2023240878A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115099239B (en) * 2022-06-16 2023-10-31 北京百度网讯科技有限公司 Resource identification method, device, equipment and storage medium
CN117172245A (en) * 2023-05-26 2023-12-05 国家计算机网络与信息安全管理中心 Control method and control system

Citations (5)

Publication number Priority date Publication date Assignee Title
US20170295189A1 (en) * 2016-04-11 2017-10-12 International Business Machines Corporation Identifying security breaches from clustering properties
US20180365574A1 (en) * 2017-06-20 2018-12-20 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for recognizing a low-quality article based on artificial intelligence, device and medium
CN110705257A (en) * 2019-09-16 2020-01-17 腾讯科技(深圳)有限公司 Media resource identification method and device, storage medium and electronic device
WO2022068600A1 (en) * 2020-09-30 2022-04-07 百果园技术(新加坡)有限公司 Abnormal user detection model training method and apparatus, and abnormal user auditing method and apparatus
CN115099239A (en) * 2022-06-16 2022-09-23 北京百度网讯科技有限公司 Resource identification method, device, equipment and storage medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN103106412B (en) * 2013-01-11 2016-04-20 广州广电运通金融电子股份有限公司 Flaky medium recognition methods and recognition device
CN109684513B (en) * 2018-12-14 2021-08-24 北京奇艺世纪科技有限公司 Low-quality video identification method and device
CN113590968A (en) * 2021-08-10 2021-11-02 平安普惠企业管理有限公司 Resource recommendation method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN115099239A (en) 2022-09-23
CN115099239B (en) 2023-10-31

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22946548

Country of ref document: EP

Kind code of ref document: A1