CN116910628B - Creator expertise portrait assessment method and system - Google Patents

Info

Publication number
CN116910628B
Authority
CN
China
Prior art keywords
creator
evaluated
creation
score
creators
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311168195.6A
Other languages
Chinese (zh)
Other versions
CN116910628A (en)
Inventor
朱淑媛
曹珣
马宝军
姜昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unicom Online Information Technology Co Ltd
Original Assignee
China Unicom Online Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unicom Online Information Technology Co Ltd
Priority to CN202311168195.6A
Publication of CN116910628A
Application granted
Publication of CN116910628B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/27 Regression, e.g. linear or logistic regression
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Software Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Operations Research (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a creator expertise portrait assessment method and system in the technical field of portrait assessment. The method comprises: obtaining, through a classification model, the grading label and grading prediction probability of each creator to be evaluated, and from them a first expertise score of each creator in the creation domain to which it belongs; obtaining, within that creation domain, the similarity between each creator to be evaluated and the other creators adjacent to it in a link graph, and obtaining a second expertise score of each creator in its creation domain based on the similarity and the first expertise score; and fitting the first and second expertise scores of each creator to obtain its total expertise score in its creation domain. The method and system combine the respective advantages of supervised and unsupervised models and yield a comprehensive creator expertise assessment portrait with higher accuracy and better robustness.

Description

Creator expertise portrait assessment method and system
Technical Field
The invention relates to the technical field of portrait assessment, in particular to a method and a system for assessing professional portraits of creators.
Background
At present there are many methods for constructing user portraits. However, most prior-art methods take user consumption data, such as behavioral habits and interests, as input, and the resulting interest portraits do not capture a creator's expertise from the production perspective. In addition, most prior-art methods score users globally and cannot break scores down by domain.
Meanwhile, the prior art offers various ways of measuring creator expertise: for example, estimating user influence from follow relations between users based on PageRank, measuring user influence from user-content interaction relations with matrix factorization, and using tensor decomposition to combine several kinds of user follow relations. All of these, however, are unsupervised modeling approaches, which have the following problems: the results are hard to interpret; they cannot adapt to new data; their accuracy is insufficient; and because a single model is used for scoring, accuracy and robustness are low.
In addition, with the rapid development of the internet, more and more people take part in community creation, and creation is becoming increasingly grassroots. Because the number of creators is huge, their levels vary widely: some casually record their daily lives while others create professionally, so it is difficult to tell how professional a creator is. In search and recommendation scenarios in particular, a score describing a creator's expertise in a domain is needed, so that users can quickly locate specialized content and acquire knowledge, a pool of high-quality content can be screened out quickly, and accurate, high-quality content distribution can be achieved.
Disclosure of Invention
The invention aims to provide a creator expertise portrait assessment method and system that overcome the above shortcomings of the prior art. The technical problem is solved by the following technical solution.
The invention provides a creator expertise portrait assessment method, comprising the following steps:
collecting creator data for each of a plurality of creators, determining the expertise grade of each creator from its creator data, and using that grade as the creator's grading label, to form a creator data set annotated with grading labels;
dividing the creator data set into a training set and a validation set, and training and validating a classification model on them respectively, to obtain a classification model with optimized parameters;
obtaining the creator data of each creator to be evaluated, extracting features including the creation domain, and inputting the extracted features into the parameter-optimized classification model to obtain the grading label and grading prediction probability of each creator to be evaluated, and from them a first expertise score of each creator to be evaluated in its creation domain;
determining the creation domain of each creator to be evaluated by calculating domain verticality; constructing a link graph among the creators to be evaluated from the association relations among them; obtaining, within the creation domain, the similarity between each creator to be evaluated and the other creators adjacent to it in the link graph; and obtaining a second expertise score of each creator to be evaluated in its creation domain based on the similarity and the first expertise score;
and obtaining the total expertise score of each creator to be evaluated in its creation domain by fitting its first and second expertise scores in that domain.
In the above solution, the expertise grade takes one of three values: 0, 1 and 2.
In the above solution, obtaining the creator data of each creator to be evaluated, extracting features including the creation domain, inputting the extracted features into the parameter-optimized classification model to obtain the grading label and grading prediction probability, and obtaining the first expertise score of each creator to be evaluated in its creation domain comprises the following steps:
obtaining the creator data of each creator to be evaluated and extracting basic features from it, the basic features comprising basic creator information and basic information about the creator's content;
passing the extracted basic features through a trained feature extractor to obtain high-order features including the creation domain, and inputting the high-order features into the parameter-optimized classification model to obtain the grading label and grading prediction probability of the creator to be evaluated.
In the above solution, determining the creation domain of each creator to be evaluated by calculating domain verticality comprises the following steps:
extracting features from the creator data of each creator to be evaluated to obtain the works it published within a period of time and the creation content label of each work;
comparing the number of works of each creator to be evaluated within that period with a threshold number of works;
when the number of works within the period is less than or equal to the threshold, computing the domain verticality of each content label with the first domain verticality formula: (number of works with that content label / number of works in the period) × sqrt(number of works in the period / 30);
when the number of works within the period is greater than the threshold, computing the domain verticality of each content label with the second domain verticality formula: number of works with that content label / 30;
and sorting the domain verticalities in descending order and taking the content label with the highest domain verticality as the creation domain of the creator to be evaluated.
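The two domain verticality formulas can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes each work carries exactly one creation content label, and the threshold of 30 works is an assumed value because the text does not state the threshold. The function names are hypothetical.

```python
import math
from collections import Counter


def domain_verticality(work_labels, threshold=30):
    """Return {content_label: domain verticality} for one creator.

    work_labels: one creation content label per work published in the
    observation period (assumption: one label per work).
    threshold: threshold number of works; 30 is an assumed value.
    """
    total = len(work_labels)
    if total == 0:
        return {}
    counts = Counter(work_labels)
    if total <= threshold:
        # First formula: (k / total) * sqrt(total / 30).
        return {label: (k / total) * math.sqrt(total / 30)
                for label, k in counts.items()}
    # Second formula: k / 30.
    return {label: k / 30 for label, k in counts.items()}


def creation_domain(work_labels, threshold=30):
    """Label with the highest domain verticality (None if there are no works)."""
    scores = domain_verticality(work_labels, threshold)
    if not scores:
        return None
    return max(scores, key=scores.get)


print(creation_domain(["food"] * 9 + ["travel"] * 3))  # food
```

For example, a creator with 9 food works and 3 travel works in the period (12 works, below the threshold) gets a verticality of about 0.474 for food and about 0.158 for travel, so the creation domain is food. With a threshold of 30, the two formulas agree at exactly 30 works, because (k/30) × sqrt(30/30) = k/30.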
In the above solution, constructing the link graph among the creators to be evaluated from the association relations among them comprises the following steps:
removing invalid follow relations based on behavior signals, and establishing direct connections among the creators to be evaluated from the follow relations among them, to obtain link information among the creators to be evaluated;
and, based on that link information, constructing the link graph among the creators to be evaluated, with each creator to be evaluated as a node and each link between two creators to be evaluated as a directed edge.
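Below is a minimal sketch of the graph construction. It assumes that a follow relation counts as valid only when the follower has sent at least one behavior signal (like, comment or forward) to the followee. The text does not define the validity rule, so `min_signals`, the signal set and `build_link_graph` are all hypothetical.

```python
from collections import defaultdict

# Interaction types treated as behavior signals (assumed set).
BEHAVIOR_SIGNALS = {"like", "comment", "forward"}


def build_link_graph(follows, interactions, min_signals=1):
    """Build a directed link graph {creator: set of creators it links to}.

    follows: iterable of (follower, followee) pairs.
    interactions: iterable of (source, target, kind) behavior events.
    min_signals: hypothetical cut-off; a follow edge is kept only if the
    follower sent at least this many behavior signals to the followee.
    """
    signal_count = defaultdict(int)
    for src, dst, kind in interactions:
        if kind in BEHAVIOR_SIGNALS:
            signal_count[(src, dst)] += 1

    graph = {}
    for src, dst in follows:
        # Every creator seen in a follow relation becomes a node.
        graph.setdefault(src, set())
        graph.setdefault(dst, set())
        # Keep the directed edge src -> dst only for valid follows.
        if src != dst and signal_count[(src, dst)] >= min_signals:
            graph[src].add(dst)
    return graph


graph = build_link_graph(
    follows=[("a", "b"), ("a", "c"), ("b", "a")],
    interactions=[("a", "b", "like"), ("b", "a", "comment")],
)
print(graph)  # a -> b and b -> a survive; a -> c has no signal and is dropped
```

The out-degree `len(graph[i])` is one reading of follow(i), the number of directed edges of creator i in the link graph.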
In the above solution, the similarity within the creation domain between each creator to be evaluated and the other creators adjacent to it in the link graph is obtained through the following similarity formula:
$\mathrm{sim}_t(i,j) = \mathrm{simA}_t(i,j) \times \mathrm{simB}_t(i,j)$, where $\mathrm{sim}_t(i,j)$ is the similarity between creators to be evaluated $i$ and $j$ in creation domain $t$, $\mathrm{simA}_t(i,j)$ is their content similarity in domain $t$, and $\mathrm{simB}_t(i,j)$ is their behavior similarity in domain $t$;
where $\mathrm{simA}_t(i,j) = 1 - |DT'_{it} - DT'_{jt}|$, $DT'_{it}$ being the work topic relevance of creator $i$ in creation domain $t$ and $DT'_{jt}$ that of creator $j$;
$\mathrm{simB}_t(i,j) = 1 - \cos(\mathrm{emb}_i, \mathrm{emb}_j) + 1$, where $\mathrm{emb}_i$ and $\mathrm{emb}_j$ are the embedding vectors of creators $i$ and $j$, and $\cos(\mathrm{emb}_i, \mathrm{emb}_j)$ is the cosine similarity between them.
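The similarity formulas translate directly into Python. This sketch assumes that the topic relevance values $DT'$ lie in [0, 1] and that embeddings are plain lists of floats. The behavior term follows the extracted text literally (1 - cos + 1). Read that way, it ranges over [1, 3] and grows as the embeddings diverge, so the original grouping may have been lost in extraction. For that reason the behavior term is kept in its own function so it can be replaced easily.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def content_similarity(dt_i, dt_j):
    """simA_t(i, j) = 1 - |DT'_it - DT'_jt|."""
    return 1 - abs(dt_i - dt_j)


def behavior_similarity(emb_i, emb_j):
    """simB_t(i, j), transcribed literally from the text: 1 - cos + 1.

    The grouping of this expression may have been lost in extraction;
    replace this function if the intended form differs.
    """
    return 1 - cosine(emb_i, emb_j) + 1


def similarity(dt_i, dt_j, emb_i, emb_j):
    """sim_t(i, j) = simA_t(i, j) * simB_t(i, j)."""
    return content_similarity(dt_i, dt_j) * behavior_similarity(emb_i, emb_j)
```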
In the above solution, obtaining the second expertise score of each creator to be evaluated in its creation domain based on the similarity and the first expertise score comprises the following steps:
obtaining the transfer matrix of the creators to be evaluated in each creation domain based on the similarity and the first expertise score;
and calculating the second expertise score of each creator to be evaluated in its creation domain based on the transfer matrix.
In the above solution, the second expertise score of each creator to be evaluated in its creation domain is obtained from the transfer matrix through a second expertise score formula, in which:
the score matrix of all creators to be evaluated in creation domain $t$ in the $n$-th round of iterative scoring is computed from the score matrix of the $(n-1)$-th round, with $n \in [1, N]$ and $N$ the set maximum number of rounds; the iteration terminates when the difference between the score matrices of two successive rounds is less than $\varepsilon$ or when $n > N$, $\varepsilon$ being a set termination parameter; $P_t$ is the transfer matrix of all creators to be evaluated in creation domain $t$; the formula also uses a damping coefficient; and $E_t$ is the expertise matrix of the creation content labels corresponding to creation domain $t$ of each creator to be evaluated;
each element of the transfer matrix $P_t$ is expressed in terms of $\mathrm{sim}_t(i,j)$, the similarity between creators to be evaluated $i$ and $j$ in creation domain $t$; $\mathrm{author\_score}(i)$, the first expertise score of creator $i$ in creation domain $t$; and $\mathrm{follow}(i)$, the total number of directed edges of creator $i$ in the link graph.
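The original formula for the second expertise score and the expression for a transfer matrix element are not reproduced in the extracted text. The sketch below therefore assumes the common personalized-PageRank form $S^{(n)} = d\,P_t S^{(n-1)} + (1-d)\,E_t$, with $P_t[j][i] = \mathrm{sim}_t(i,j) \cdot \mathrm{author\_score}(i) / \mathrm{follow}(i)$ for each edge $i \to j$, a damping coefficient $d = 0.85$, and starting scores $S^{(0)} = E_t$. The symbols $S$ and $d$, and all of these choices, are assumptions. Only the termination rule (change below $\varepsilon$ or more than $N$ rounds) comes from the text.

```python
def creator_rank(graph, sim, first_score, expertise,
                 d=0.85, eps=1e-6, max_iter=100):
    """PageRank-style second expertise scores for one creation domain t.

    graph: {creator: set of creators it links to}; every linked creator
        must also appear as a key.
    sim: {(i, j): sim_t(i, j)} for every edge i -> j.
    first_score: {creator: first expertise score in domain t}.
    expertise: {creator: entry of E_t for that creator}.
    Assumed update:  S_n[j] = d * sum_i P[j][i] * S_{n-1}[i] + (1 - d) * E[j]
    with P[j][i] = sim_t(i, j) * first_score[i] / follow(i), where
    follow(i) is taken as the out-degree of i.
    """
    nodes = list(graph)
    if not nodes:
        return {}
    # incoming[j] lists (i, P[j][i]) for every edge i -> j.
    incoming = {n: [] for n in nodes}
    for i in nodes:
        out = graph[i]
        for j in out:
            incoming[j].append((i, sim[(i, j)] * first_score[i] / len(out)))

    score = {n: expertise[n] for n in nodes}  # assumed start: S_0 = E_t
    for _ in range(max_iter):  # at most N rounds
        new = {
            j: d * sum(w * score[i] for i, w in incoming[j])
            + (1 - d) * expertise[j]
            for j in nodes
        }
        # Largest absolute change between two successive rounds.
        delta = max(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < eps:
            break
    return score
```

If first expertise scores and similarities are normalized into [0, 1], every column of $P_t$ sums to at most 1. With $d < 1$ the update is then a contraction and the iteration converges. With unnormalized inputs, the loop may only stop at `max_iter`.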
In the above solution, obtaining the total expertise score of each creator to be evaluated in its creation domain by fitting its first and second expertise scores comprises the following steps:
annotating the creator data set (already labeled with grading labels) with content labels, randomly drawing, for each content label, creator data with different grading labels from the data set to form pairwise data, and annotating the pairwise data;
and training a linear logistic regression model on the annotated pairwise data, then using the trained model to fit the first and second expertise scores of each creator to be evaluated in its creation domain, obtaining its total expertise score in that domain.
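Below is a sketch of the pairwise fitting step. It assumes that each pair holds two creators with the same content label but different grading labels, and that the model is a logistic regression on the difference of their (first, second) expertise scores, without an intercept, trained by plain batch gradient descent. The learning rate and epoch count are arbitrary illustrative values. The total score is taken as the learned linear combination $w_1 \cdot \text{first} + w_2 \cdot \text{second}$, and the function names are hypothetical.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_pairwise_weights(pairs, lr=0.5, epochs=500):
    """Learn (w1, w2) so that total = w1 * first + w2 * second.

    pairs: list of ((first_a, second_a), (first_b, second_b), y), where the
    two creators share a content label and y = 1 if creator a has the
    higher grading label, else 0.
    Model: P(y = 1) = sigmoid(w . (x_a - x_b)), no intercept, fitted by
    batch gradient descent on the log loss.
    """
    w = [0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for x_a, x_b, y in pairs:
            diff = [x_a[k] - x_b[k] for k in range(2)]
            p = _sigmoid(w[0] * diff[0] + w[1] * diff[1])
            for k in range(2):
                grad[k] += (p - y) * diff[k]
        for k in range(2):
            w[k] -= lr * grad[k] / len(pairs)
    return w


def total_score(w, first, second):
    """Total expertise score as the fitted linear combination."""
    return w[0] * first + w[1] * second


pairs = [
    ((1.6, 0.8), (0.4, 0.2), 1),
    ((0.4, 0.2), (1.6, 0.8), 0),
    ((1.2, 0.9), (0.8, 0.3), 1),
    ((0.8, 0.3), (1.2, 0.9), 0),
]
w = fit_pairwise_weights(pairs)
```

Fitting on differences means that only the relative ordering of creators is learned, which matches how the pairwise data are labeled. The resulting weights can then be applied to any single creator.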
The creator expertise portrait assessment system provided by the invention assesses creator expertise portraits using the above method and comprises:
a data set acquisition module, configured to collect creator data for each of a plurality of creators, determine the expertise grade of each creator from its creator data, and use that grade as the creator's grading label, to form a creator data set annotated with grading labels;
a classification model training module, configured to divide the creator data set into a training set and a validation set and to train and validate the classification model on them respectively, to obtain a classification model with optimized parameters;
a first expertise score acquisition module, configured to obtain the creator data of each creator to be evaluated, extract features including the creation domain, input the extracted features into the parameter-optimized classification model to obtain the grading label and grading prediction probability of each creator to be evaluated, and obtain the first expertise score of each creator to be evaluated in its creation domain;
a second expertise score acquisition module, configured to determine the creation domain of each creator to be evaluated, construct the link graph among the creators to be evaluated, obtain within the creation domain the similarity between each creator to be evaluated and the other creators adjacent to it in the link graph, and obtain the second expertise score of each creator to be evaluated in its creation domain based on the similarity and the first expertise score;
and a total expertise score acquisition module, configured to obtain the total expertise score of each creator to be evaluated in its creation domain by fitting its first and second expertise scores in that domain.
The embodiment of the invention has the following advantages:
In the creator expertise portrait assessment method and system, the classification model provides the grading label and grading prediction probability of each creator to be evaluated, from which the first expertise score in its creation domain is obtained. The similarity within the creation domain to adjacent creators in the link graph is then combined with the first expertise score to obtain the second expertise score. Finally, a linear logistic regression model fits the first and second expertise scores to obtain the total expertise score of each creator in its creation domain. This makes full use of the advantages of both supervised and unsupervised models, can fuse multiple kinds of user interaction relations, yields a comprehensive creator expertise assessment portrait with higher accuracy and better robustness, and distinguishes the expertise scores of different creators.
Drawings
FIG. 1 is a flowchart of the creator expertise portrait assessment method of the present invention;
FIG. 2 is a flowchart of obtaining the first expertise score of each creator to be evaluated in its creation domain according to the present invention;
FIG. 3 is a schematic diagram of the composition of the creator expertise portrait assessment system of the present invention.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
As shown in FIG. 1, the invention provides a creator expertise portrait assessment method, comprising the following steps:
Step S1: collecting creator data for each of a plurality of creators, determining the expertise grade of each creator from its creator data, and using that grade as the creator's grading label, to form a creator data set annotated with grading labels.
Specifically, expertise grading standards are formulated along the dimensions used to evaluate creator expertise, and a number of expertise grades are defined according to these standards. Each creator's data is compared against the grading standards to obtain its expertise grade, which takes one of three values: 0, 1 and 2. The evaluation dimensions of creator expertise comprise content credibility, production refinement, domain concentration of creation, and the number of works created. The relation between the expertise grading standards and the expertise grades is shown in Table 1:
Table 1: Expertise grading standards and corresponding expertise grades
Step S2: dividing the creator data set into a training set and a validation set, and training and validating the classification model on them respectively, to obtain a classification model with optimized parameters.
Specifically, the creator data set is divided into a training set and a validation set. The training set is input into the classification model for training, and the validation set is input into the trained model for cross-validation. When the prediction accuracy and recall of the classification model reach the expected values, the parameter-optimized classification model is obtained.
Specifically, the parameter-optimized classification model may also be obtained by plotting an ROC curve. For each of a range of thresholds, the true positive rate and false positive rate are calculated, and the resulting (false positive rate, true positive rate) points are plotted in a coordinate system to form the ROC curve. The curve starts at the lower-left corner (0, 0) and moves toward the upper-right corner (1, 1) as the threshold decreases. The closer the curve is to the upper-left corner, the better the classification model performs, because there the true positive rate is high and the false positive rate is low. The area under the curve (AUC) is also calculated: the larger the AUC, the better the model, with a maximum value of 1. The model is optimized by adjusting its parameters, by feature engineering and by similar means, and the parameter-optimized classification model is then selected by comparing the ROC curves and AUC values of the different models. The true positive rate is TPR = TP/(TP + FN) and the false positive rate is FPR = FP/(FP + TN), where TP is the number of true positives, FN false negatives, FP false positives and TN true negatives.
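The ROC and AUC computation described above can be sketched as follows for a binary problem. For the three-grade model this would be applied to one grade versus the rest, which is an assumption because the text does not say how the three grades are handled. Each distinct score serves as a threshold, swept from high to low, and the AUC is computed with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """(FPR, TPR) points of a binary ROC curve, threshold swept high to low.

    labels: 1 = positive, 0 = negative; both classes must be present.
    scores: predicted probability of the positive class.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every score: nothing positive
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples tied at this threshold switch to "positive" together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))  # FPR = FP/(FP+TN), TPR = TP/(TP+FN)
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6])
print(auc(pts))  # 0.75
```

For labels [1, 0, 1, 0] with scores [0.9, 0.8, 0.7, 0.6], the curve passes through (0, 0.5), (0.5, 0.5) and (0.5, 1), and the AUC is 0.75: 3 of the 4 positive-negative pairs are ranked correctly.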
Step S3: obtaining the creator data of each creator to be evaluated, extracting features including the creation domain, inputting the extracted features into the parameter-optimized classification model to obtain the grading label and grading prediction probability of each creator to be evaluated, and obtaining the first expertise score of each creator to be evaluated in its creation domain.
Specifically, in step S3 the parameter-optimized classification model serves as the supervised model that produces the first expertise score of each creator to be evaluated in its creation domain.
As shown in FIG. 2, step S3 comprises:
Step S31: obtaining the creator data of each creator to be evaluated and extracting basic features from it, the basic features comprising basic creator information and basic information about the creator's content. The basic creator information includes the creator's follow count, the number of published works, whether the creator has "Big V" verification, and so on; the basic content information includes the numbers of likes, shares and favorites of the works.
Step S32: passing the extracted basic features through a trained feature extractor to obtain high-order features such as the creation domain, work quality features and the creator's production activity, the high-order features being features that discriminate between grades; inputting the high-order features into the parameter-optimized classification model to obtain the grading label and grading prediction probability of each creator to be evaluated; and multiplying the expertise grade given by the grading label by the grading prediction probability to obtain the first expertise score of the creator in its creation domain.
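The first expertise score in step S32 is the predicted grade multiplied by its probability. Below is a minimal sketch, assuming that the classifier returns one probability per grade (0, 1, 2) and that the grading label is the most probable grade. The function name is hypothetical.

```python
def first_expertise_score(grade_probs):
    """Predicted grade times its predicted probability.

    grade_probs: classifier probabilities for grades 0, 1, 2 (assumed
    format, e.g. a softmax output). The grading label is taken as the
    most probable grade.
    """
    grade = max(range(len(grade_probs)), key=lambda g: grade_probs[g])
    return grade * grade_probs[grade]


print(first_expertise_score([0.1, 0.2, 0.7]))  # 1.4
```

Under this rule, a creator predicted as grade 0 always scores 0, and all scores lie in [0, 2].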
Step S4: determining the creation domain of each creator to be evaluated by calculating domain verticality, constructing a link graph among the creators to be evaluated from the association relations among them, obtaining within the creation domain the similarity between each creator to be evaluated and the other creators adjacent to it in the link graph, and obtaining the second expertise score of each creator to be evaluated in its creation domain from the similarity and the first expertise score using the CreatorRank formula.
Specifically, in step S4 the second expertise score of each creator to be evaluated in its creation domain is obtained from the transfer matrix through the second expertise score formula. This formula is the CreatorRank formula, which serves as the unsupervised model that produces the second expertise score of each creator to be evaluated in its creation domain.
Specifically, determining the creation domain of each creator to be evaluated by calculating domain verticality comprises:
extracting features from the creator data of each creator to be evaluated to obtain the works it published within a period of time and the creation content label of each work;
comparing the number of works of each creator to be evaluated within that period with a threshold number of works;
when the number of works within the period is less than or equal to the threshold, computing the domain verticality of each content label with the first domain verticality formula: (number of works with that content label / number of works in the period) × sqrt(number of works in the period / 30);
when the number of works within the period is greater than the threshold, computing the domain verticality of each content label with the second domain verticality formula: number of works with that content label / 30;
and sorting the domain verticalities in descending order and taking the content label with the highest domain verticality as the creation domain of the creator to be evaluated.
Specifically, constructing the link graph among the creators to be evaluated from the association relationships between them includes:
removing invalid follow relationships based on behavior signals, and establishing direct connections between creators to be evaluated through the follow relationships among them, so as to obtain the link information between the creators to be evaluated. The behavior signals are the various interactions between a creator to be evaluated and other creators to be evaluated, including but not limited to following, liking, commenting and forwarding; follow actions may originate, for example, from address-book friend relationships or recommended follows. Removing invalid follow relationships based on behavior signals prunes the long-tail relationship network, which speeds up network topology computation and improves accuracy;
taking each creator to be evaluated as a node and each link between two creators to be evaluated as a directed edge, and constructing the link graph among the creators to be evaluated from the link information.
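A minimal sketch of the pruning and graph construction follows. The source does not state the exact validity criterion, so the sketch assumes a hypothetical rule: a follow edge is kept only if the follower has at least `min_interactions` other interactions (likes, comments, forwards) with the followee. The graph is stored as an adjacency list of directed edges.

```python
from collections import defaultdict


def build_link_graph(follows, interactions, min_interactions=1):
    """Build a directed follow graph after pruning invalid follows.

    follows: iterable of (follower, followee) pairs among creators to be evaluated.
    interactions: dict mapping (src, dst) to a count of likes/comments/forwards.
    min_interactions: hypothetical cut-off for a follow to count as valid.
    """
    graph = defaultdict(set)
    for src, dst in follows:
        if src == dst:
            continue  # ignore self-loops
        if interactions.get((src, dst), 0) >= min_interactions:
            graph[src].add(dst)  # directed edge src -> dst
    return {node: sorted(nbrs) for node, nbrs in graph.items()}


follows = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
interactions = {("a", "b"): 3, ("b", "c"): 1}
print(build_link_graph(follows, interactions))  # {'a': ['b'], 'b': ['c']}
```

Follows with no supporting interaction (here a → c and c → a) are dropped, which is the long-tail pruning the text describes.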
Specifically, the similarity, within the authoring domain, between each creator to be evaluated and each creator adjacent to it in the link graph is obtained through the following similarity formula:
$$\mathrm{sim}_t(i,j) = \mathrm{simA}_t(i,j) \times \mathrm{simB}_t(i,j)$$
where $\mathrm{sim}_t(i,j)$ is the similarity between creators to be evaluated $i$ and $j$ in authoring domain $t$, $\mathrm{simA}_t(i,j)$ is their content similarity in domain $t$, and $\mathrm{simB}_t(i,j)$ is their behavior similarity in domain $t$;
$$\mathrm{simA}_t(i,j) = 1 - \left| DT'_{it} - DT'_{jt} \right|$$
where $DT'_{it}$ and $DT'_{jt}$ are the work-topic relevance of creators $i$ and $j$, respectively, in authoring domain $t$;
$$\mathrm{simB}_t(i,j) = 1 - \cos(\mathrm{emb}_i, \mathrm{emb}_j) + 1$$
where $\mathrm{emb}_i$ and $\mathrm{emb}_j$ are the embedding vectors of creators $i$ and $j$, and $\cos(\mathrm{emb}_i, \mathrm{emb}_j)$ is the cosine similarity between the two vectors.
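The similarity formulas translate directly into code. The sketch below transcribes simB_t literally as printed (1 - cos + 1), which places it in the range [1, 3]; the embedding vectors and topic-relevance values are hypothetical inputs, since how they are produced is not specified in this section.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def content_similarity(dt_i, dt_j):
    """simA_t(i, j) = 1 - |DT'_it - DT'_jt|."""
    return 1.0 - abs(dt_i - dt_j)


def behavior_similarity(emb_i, emb_j):
    """simB_t(i, j) = 1 - cos(emb_i, emb_j) + 1, transcribed as printed."""
    return 1.0 - cosine(emb_i, emb_j) + 1.0


def similarity(dt_i, dt_j, emb_i, emb_j):
    """sim_t(i, j) = simA_t(i, j) * simB_t(i, j)."""
    return content_similarity(dt_i, dt_j) * behavior_similarity(emb_i, emb_j)


print(similarity(0.8, 0.6, [1.0, 0.0], [0.0, 1.0]))  # approx. 1.6
```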
Specifically, obtaining the second specialty score of each creator to be evaluated in its authoring domain based on the similarity and the first specialty score includes:
obtaining the transfer matrix of the creators to be evaluated in the authoring domain based on the similarity and the first specialty score;
calculating the second specialty score of each creator to be evaluated in its authoring domain based on the transfer matrix.
Specifically, the second specialty score of each creator to be evaluated in its authoring domain is obtained from the transfer matrix through the second specialty score formula. In the invention, obtaining this score involves multiple rounds of iterative scoring until an iteration termination condition is reached, and the score obtained at that point is the final score. The second specialty score formula is as follows:
where the score matrix of all creators to be evaluated in authoring domain $t$ in the $n$-th round of iterative scoring is computed from their score matrix in the $(n-1)$-th round; $n \in [0, N]$ is the number of iteration rounds performed in the iterative scoring process, and $N$ is the preset maximum number of rounds. During iterative scoring, iteration terminates when the change between the score matrices of successive rounds meets the convergence condition set by the termination parameter $\varepsilon$, or when $n > N$. $P_t$ is the transfer matrix of all creators to be evaluated in authoring domain $t$; the damping coefficient is set to 0.85 in this embodiment; and $E_t$ is the specialty matrix of the authored content labels corresponding to authoring domain $t$ of each creator to be evaluated. For example, because lifestyle content is less specialized than news content, the specialty value of the lifestyle label is lower than that of the news label;
wherein the expression for each element of the transfer matrix $P_t$ is:
where $\mathrm{sim}_t(i,j)$ is the similarity between creators to be evaluated $i$ and $j$ in authoring domain $t$, $\mathrm{author\_score}(i)$ is the first specialty score of creator $i$ in authoring domain $t$, and $\mathrm{follow}(i)$ is the total number of directed edges of creator $i$ in the link graph. Because the output of the supervised model serves as an input to the unsupervised model, the accuracy of the unsupervised iterative prediction improves, and follows made by high-quality creators receive higher weight.
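Because neither the iteration formula nor the element expression of $P_t$ survives in the text above, the following is only a sketch under stated assumptions. It assumes a personalized-PageRank-style update, S(n) = d · P_tᵀ · S(n-1) + (1 - d) · E_t, and an edge weight P_t(i, j) = sim_t(i, j) · author_score(i) / follow(i), with follow(i) taken as the out-degree of i. The L1 change between rounds stands in for the ε test. The function name `creator_rank` and its inputs are hypothetical.

```python
def creator_rank(graph, sim, author_score, e, d=0.85, eps=1e-8, max_iter=100):
    """Iterative second-specialty scoring for one authoring domain t (assumed form).

    graph: adjacency list {i: [j, ...]} of directed follow edges i -> j.
    sim: {(i, j): sim_t(i, j)} for every edge.
    author_score: {i: first specialty score of i in domain t}.
    e: {i: entry of E_t for creator i}, the label specialty prior.
    d: damping coefficient; eps: termination parameter; max_iter: N.
    """
    nodes = set(e) | set(graph) | {j for nbrs in graph.values() for j in nbrs}
    # Assumed element form: P_t(i, j) = sim_t(i, j) * author_score(i) / follow(i).
    P = {}
    for i, nbrs in graph.items():
        for j in nbrs:
            P[(i, j)] = sim[(i, j)] * author_score[i] / len(nbrs)
    scores = {v: e.get(v, 0.0) for v in nodes}
    for _ in range(max_iter):
        # Prior term (1 - d) * E_t plus score flowing along follow edges i -> j.
        new = {v: (1 - d) * e.get(v, 0.0) for v in nodes}
        for (i, j), w in P.items():
            new[j] += d * w * scores[i]
        change = sum(abs(new[v] - scores[v]) for v in nodes)
        scores = new
        if change < eps:
            break
    return scores


graph = {"a": ["b", "c"], "b": ["c"]}
sim = {("a", "b"): 0.5, ("a", "c"): 0.5, ("b", "c"): 0.5}
author_score = {"a": 0.8, "b": 0.6, "c": 0.4}
e = {"a": 0.5, "b": 0.5, "c": 0.5}
scores = creator_rank(graph, sim, author_score, e)
print(scores)  # c ranks highest: it is followed by both a and b
```

Convergence is not guaranteed for arbitrary inputs, because the weighted out-flow of a node can exceed 1; the `max_iter` cap plays the role of the "or n > N" termination condition.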
Step S5: obtain the total specialty score of each creator to be evaluated in its authoring domain by fitting that creator's first and second specialty scores in the domain.
Specifically, step S5 includes:
annotating content labels on the creator dataset annotated with grading labels; randomly sampling, for each content label, creator data with different grading labels from the creator dataset to obtain pairwise data; and manually annotating the pairwise data. The pairwise data are a set of data pairs formed from the randomly sampled creator data, each pair containing the creator data of two creators who share the same content label and have the same or different grading labels;
training a linear logistic regression model on the annotated pairwise data, and fitting the first and second specialty scores of each creator to be evaluated in its authoring domain with the trained model to obtain the total specialty score of each creator to be evaluated in its authoring domain.
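The pair-construction step can be sketched as below. The record fields (`id`, `content_label`, `grade`) and the number of pairs per label are hypothetical; pairs are drawn uniformly within each label group, so a pair may have the same or different grading labels. The manual annotation that follows (judging which creator in each pair is more professional) happens outside the code.

```python
import random
from collections import defaultdict


def sample_pairs(creators, pairs_per_label, seed=0):
    """Randomly pair creators that share a content label.

    creators: list of dicts with hypothetical keys "id", "content_label", "grade".
    Returns (id_a, id_b, content_label) tuples awaiting manual annotation.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    by_label = defaultdict(list)
    for creator in creators:
        by_label[creator["content_label"]].append(creator)
    pairs = []
    for label, group in sorted(by_label.items()):
        if len(group) < 2:
            continue  # cannot form a pair within this label
        for _ in range(pairs_per_label):
            a, b = rng.sample(group, 2)  # two distinct creators, same label
            pairs.append((a["id"], b["id"], label))
    return pairs


creators = [
    {"id": 1, "content_label": "tech", "grade": 2},
    {"id": 2, "content_label": "tech", "grade": 0},
    {"id": 3, "content_label": "tech", "grade": 1},
    {"id": 4, "content_label": "news", "grade": 1},
]
pairs = sample_pairs(creators, pairs_per_label=2)
print(pairs)
```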
Specifically, during training of the linear logistic regression model, an evaluation method based on positive-order and reverse-order pairs (pairs the model ranks correctly and incorrectly) is introduced to assess scoring accuracy and guide iteration.
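One way to realize the fitting and the pair-based evaluation is a pairwise logistic regression on score differences, sketched below. The source does not specify the feature construction or the optimizer. This version assumes each annotated pair is ((s1_a, s2_a), (s1_b, s2_b), y), with y = 1 when creator a was judged more professional, trains by plain batch gradient descent without an intercept, and counts positive-order (correctly ranked) and reverse-order pairs, whose ratio is the positive-negative ratio.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_pairwise_logreg(pairs, lr=0.5, epochs=500):
    """Fit w for total = w[0]*s1 + w[1]*s2 from annotated pairs.

    pairs: list of ((s1_a, s2_a), (s1_b, s2_b), y), y = 1 if a ranks above b.
    """
    w = [0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for a, b, y in pairs:
            x = (a[0] - b[0], a[1] - b[1])  # feature = score difference
            p = sigmoid(w[0] * x[0] + w[1] * x[1])
            grad[0] += (p - y) * x[0]
            grad[1] += (p - y) * x[1]
        w[0] -= lr * grad[0] / len(pairs)
        w[1] -= lr * grad[1] / len(pairs)
    return w


def total_score(w, s1, s2):
    """Total specialty score as a linear fit of the first and second scores."""
    return w[0] * s1 + w[1] * s2


def order_pair_counts(w, pairs):
    """Count positive-order (correct) and reverse-order (incorrect) pairs; ties skipped."""
    positive = reverse = 0
    for a, b, y in pairs:
        diff = total_score(w, *a) - total_score(w, *b)
        if diff == 0:
            continue
        if (diff > 0) == (y == 1):
            positive += 1
        else:
            reverse += 1
    return positive, reverse


pairs = [
    ((0.9, 0.8), (0.2, 0.3), 1),
    ((0.1, 0.2), (0.7, 0.9), 0),
    ((0.6, 0.7), (0.4, 0.1), 1),
    ((0.3, 0.2), (0.8, 0.6), 0),
]
w = fit_pairwise_logreg(pairs)
print(w, order_pair_counts(w, pairs))
```

Because the model has no intercept and is linear in the score difference, the learned weights apply directly to individual creators: the pairwise model prefers a over b exactly when a's total specialty score is higher.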
As shown in Fig. 3, the present invention provides a creator expertise portrait assessment system that performs creator expertise portrait assessment using the creator expertise portrait assessment method described above, comprising:
a dataset acquisition module, configured to collect creator data corresponding to each of a plurality of creators, obtain the professional grade of each creator from the creator data, and use each creator's professional grade as its grading label, forming a creator dataset annotated with grading labels;
a classification model training module, configured to divide the creator dataset into a training set and a validation set, and to train and validate the classification model on the training set and the validation set, respectively, obtaining a classification model with optimized parameters;
a first specialty score acquisition module, configured to acquire the creator data of each creator to be evaluated, extract the corresponding features including the authoring domain, input the extracted features into the parameter-optimized classification model to obtain the grading label and grading prediction probability of each creator to be evaluated, and obtain the first specialty score of each creator to be evaluated in its authoring domain;
a second specialty score acquisition module, configured to obtain the authoring domain of each creator to be evaluated, construct the link graph among the creators to be evaluated, obtain the similarity within the authoring domain between each creator to be evaluated and the creators adjacent to it in the link graph, and obtain the second specialty score of each creator to be evaluated in its authoring domain based on the similarity and the first specialty score;
a total specialty score acquisition module, configured to obtain the total specialty score of each creator to be evaluated in its authoring domain by fitting that creator's first and second specialty scores in the domain.
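How the first specialty score is derived from the grading label and grading prediction probability is not spelled out in this section. As a hedged illustration only, the sketch below assumes the three grades 0, 1 and 2 listed in claim 2 and takes the probability-weighted expected grade, scaled to [0, 1], as the first specialty score. The function names and the input format are hypothetical.

```python
GRADES = (0, 1, 2)  # professional grades as listed in claim 2


def grading_label(probs):
    """Predicted grading label: the grade with the highest predicted probability."""
    k = max(range(len(GRADES)), key=lambda idx: probs[idx])
    return GRADES[k]


def first_specialty_score(probs):
    """Hypothetical first specialty score: expected grade scaled to [0, 1]."""
    if len(probs) != len(GRADES):
        raise ValueError("expected one probability per grade")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    expected = sum(g * p for g, p in zip(GRADES, probs)) / total
    return expected / max(GRADES)


probs = [0.1, 0.3, 0.6]
print(grading_label(probs), first_specialty_score(probs))  # 2 and approx. 0.75
```

Using the full probability vector rather than only the top label lets two creators with the same predicted grade still receive different first specialty scores.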
It should be noted that the foregoing detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the example embodiments according to the present application. As used herein, the singular is intended to include the plural unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or groups thereof.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or otherwise described herein.
Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Spatially relative terms, such as "above", "over", "on the upper surface of" and "upper", may be used herein for ease of description to describe the spatial position of one device or feature relative to another device or feature as illustrated in the figures. It will be understood that spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements described as "above" or "over" other devices or structures would then be oriented "below" or "beneath" those devices or structures. Thus, the exemplary term "above" may encompass both the "above" and "below" orientations. The device may also be oriented in other ways, such as rotated 90 degrees or at other orientations, and the spatially relative descriptors used herein should be interpreted accordingly.
In the above detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, like numerals typically identify like components unless context indicates otherwise. The illustrated embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented herein.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; those skilled in the art may make various modifications and variations to the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (9)

1. A creator expertise portrait assessment method, characterized by comprising the following steps:
collecting creator data corresponding to each of a plurality of creators, obtaining the professional grade of each creator from the creator data, and using each creator's professional grade as its grading label to form a creator dataset annotated with grading labels;
dividing the creator dataset into a training set and a validation set, and training and validating the classification model on the training set and the validation set, respectively, to obtain a classification model with optimized parameters;
obtaining the creator data of each creator to be evaluated, extracting the corresponding features including the authoring domain, inputting the extracted features into the parameter-optimized classification model to obtain the grading label and grading prediction probability of the creator to be evaluated, and obtaining the first specialty score of each creator to be evaluated in its authoring domain;
obtaining the authoring domain of each creator to be evaluated by calculating domain verticality; constructing the link graph among the creators to be evaluated from the association relationships between them; obtaining the similarity within the authoring domain between each creator to be evaluated and the creators adjacent to it in the link graph; and obtaining the second specialty score of each creator to be evaluated in its authoring domain based on the similarity and the first specialty score;
obtaining the total specialty score of each creator to be evaluated in its authoring domain by fitting that creator's first and second specialty scores in the domain;
wherein, when the number of works of a creator to be evaluated within a period of time is less than or equal to the threshold number of works, the domain verticality is obtained through the first domain verticality formula: (number of works with the same authored content label / number of works within the period) × sqrt(number of works within the period / 30);
when the number of works of a creator to be evaluated within the period is greater than the threshold number of works, the domain verticality is obtained through the second domain verticality formula: (number of works with the same authored content label) / 30;
the second specialty score formula is:
where the score matrix of all creators to be evaluated in authoring domain $t$ in the $n$-th round of iterative scoring is computed from their score matrix in the $(n-1)$-th round; $n \in [1, N]$ is the number of iteration rounds performed in the iterative scoring process, and $N$ is the preset maximum number of rounds; during iterative scoring, iteration terminates when the change between the score matrices of successive rounds meets the convergence condition set by the termination parameter $\varepsilon$, or when $n > N$; $P_t$ is the transfer matrix of all creators to be evaluated in authoring domain $t$; a damping coefficient is applied; and $E_t$ is the specialty matrix of the authored content labels corresponding to authoring domain $t$ of each creator to be evaluated;
wherein the expression for each element of the transfer matrix $P_t$ is:
where $\mathrm{sim}_t(i,j)$ is the similarity between creators to be evaluated $i$ and $j$ in authoring domain $t$, $\mathrm{author\_score}(i)$ is the first specialty score of creator $i$ in authoring domain $t$, and $\mathrm{follow}(i)$ is the total number of directed edges of creator $i$ in the link graph.
2. The creator expertise portrait assessment method according to claim 1, wherein the professional grade has three levels: 0, 1 and 2.
3. The creator expertise portrait assessment method according to claim 1, wherein obtaining the creator data of each creator to be evaluated, extracting the corresponding features including the authoring domain, inputting the extracted features into the parameter-optimized classification model to obtain the grading label and grading prediction probability of the creator to be evaluated, and obtaining the first specialty score of each creator to be evaluated in its authoring domain comprises the following steps:
obtaining the creator data of each creator to be evaluated and extracting basic features from it, the basic features including basic creator information and basic information on the creator's content;
further extracting the basic features with the trained feature extractor to obtain high-order features including the authoring domain, and inputting the high-order features into the parameter-optimized classification model to obtain the grading label and grading prediction probability of the creator to be evaluated.
4. The creator expertise portrait assessment method according to claim 1, wherein obtaining the authoring domain of each creator to be evaluated by calculating domain verticality comprises the following steps:
extracting features from the creator data of each creator to be evaluated to obtain the works of that creator within a period of time and the authored content labels of those works;
comparing the number of works of each creator to be evaluated within the period with the threshold number of works;
when the number of works within the period is less than or equal to the threshold number of works, obtaining the domain verticality by substituting the number of works with the same authored content label into the first domain verticality formula;
when the number of works within the period is greater than the threshold number of works, obtaining the domain verticality by substituting the number of works with the same authored content label into the second domain verticality formula;
sorting the calculated domain verticalities in descending order and taking the authored content label ranked first as the authoring domain of the creator to be evaluated.
5. The creator expertise portrait assessment method according to claim 1, wherein constructing the link graph among the creators to be evaluated from the association relationships between them comprises the following steps:
removing invalid follow relationships based on behavior signals, and establishing direct connections between the creators to be evaluated through the follow relationships among them, to obtain link information between the creators to be evaluated;
taking each creator to be evaluated as a node and each link between two creators to be evaluated as a directed edge, and constructing the link graph among the creators to be evaluated from the link information.
6. The creator expertise portrait assessment method according to claim 1, wherein the similarity, within the authoring domain, between each creator to be evaluated and each creator adjacent to it in the link graph is obtained through the following similarity formula:
$$\mathrm{sim}_t(i,j) = \mathrm{simA}_t(i,j) \times \mathrm{simB}_t(i,j)$$
where $\mathrm{sim}_t(i,j)$ is the similarity between creators to be evaluated $i$ and $j$ in authoring domain $t$, $\mathrm{simA}_t(i,j)$ is their content similarity in domain $t$, and $\mathrm{simB}_t(i,j)$ is their behavior similarity in domain $t$;
$$\mathrm{simA}_t(i,j) = 1 - \left| DT'_{it} - DT'_{jt} \right|$$
where $DT'_{it}$ and $DT'_{jt}$ are the work-topic relevance of creators $i$ and $j$, respectively, in authoring domain $t$;
$$\mathrm{simB}_t(i,j) = 1 - \cos(\mathrm{emb}_i, \mathrm{emb}_j) + 1$$
where $\mathrm{emb}_i$ and $\mathrm{emb}_j$ are the embedding vectors of creators $i$ and $j$, and $\cos(\mathrm{emb}_i, \mathrm{emb}_j)$ is the cosine similarity between the two vectors.
7. The creator expertise portrait assessment method according to claim 1, wherein obtaining the second specialty score of each creator to be evaluated in its authoring domain based on the similarity and the first specialty score comprises the following steps:
obtaining the transfer matrix of the creators to be evaluated in the authoring domain based on the similarity and the first specialty score;
calculating the second specialty score of each creator to be evaluated in its authoring domain based on the transfer matrix.
8. The creator expertise portrait assessment method according to claim 1, wherein obtaining the total specialty score of each creator to be evaluated in its authoring domain by fitting that creator's first and second specialty scores comprises the following steps:
annotating content labels on the creator dataset annotated with grading labels, randomly sampling, for each content label, creator data with different grading labels from the creator dataset to obtain pairwise data, and annotating the pairwise data;
training a linear logistic regression model on the annotated pairwise data, and fitting the first and second specialty scores of each creator to be evaluated in its authoring domain with the trained model to obtain the total specialty score of each creator to be evaluated in its authoring domain.
9. A creator expertise portrait assessment system employing the creator expertise portrait assessment method of any one of claims 1-8, the system comprising:
a dataset acquisition module, configured to collect creator data corresponding to each of a plurality of creators, obtain the professional grade of each creator from the creator data, and use each creator's professional grade as its grading label, forming a creator dataset annotated with grading labels;
a classification model training module, configured to divide the creator dataset into a training set and a validation set, and to train and validate the classification model on the training set and the validation set, respectively, obtaining a classification model with optimized parameters;
a first specialty score acquisition module, configured to acquire the creator data of each creator to be evaluated, extract the corresponding features including the authoring domain, input the extracted features into the parameter-optimized classification model to obtain the grading label and grading prediction probability of each creator to be evaluated, and obtain the first specialty score of each creator to be evaluated in its authoring domain;
a second specialty score acquisition module, configured to obtain the authoring domain of each creator to be evaluated, construct the link graph among the creators to be evaluated, obtain the similarity within the authoring domain between each creator to be evaluated and the creators adjacent to it in the link graph, and obtain the second specialty score of each creator to be evaluated in its authoring domain based on the similarity and the first specialty score;
a total specialty score acquisition module, configured to obtain the total specialty score of each creator to be evaluated in its authoring domain by fitting that creator's first and second specialty scores in the domain.
CN202311168195.6A 2023-09-12 2023-09-12 Creator expertise portrait assessment method and system Active CN116910628B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311168195.6A CN116910628B (en) 2023-09-12 2023-09-12 Creator expertise portrait assessment method and system

Publications (2)

Publication Number Publication Date
CN116910628A CN116910628A (en) 2023-10-20
CN116910628B true CN116910628B (en) 2024-02-06

Family

ID=88368106

Country Status (1)

Country Link
CN (1) CN116910628B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant