CN111324641A - Personnel estimation method and device, computer-readable storage medium and terminal equipment - Google Patents

Personnel estimation method and device, computer-readable storage medium and terminal equipment

Info

Publication number
CN111324641A
Authority
CN
China
Prior art keywords
specific
feature
person
features
personnel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010101233.6A
Other languages
Chinese (zh)
Other versions
CN111324641B (en
Inventor
刘志煌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010101233.6A
Publication of CN111324641A
Application granted
Publication of CN111324641B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the invention discloses a person estimation method, a person estimation device, a computer readable storage medium and terminal equipment, which are applied to the technical field of information processing based on artificial intelligence. The personnel estimation device determines at least one frequent feature sequence according to the occurrence frequency of each feature in the features of a plurality of specific personnel, each frequent feature sequence comprises at least one feature, then classifies the specific personnel according to the frequent feature sequences contained in the features of the specific personnel, respectively calculates the classification probability corresponding to each class of specific personnel, then determines the features and the classification probabilities of the specific personnel as first training samples, and then trains the personnel estimation model. Therefore, all the characteristic combinations which have large influence on the specific personnel are determined, the characteristic correlation among a plurality of specific personnel can be represented, and the characteristic correlation among the specific personnel at different periods can also be represented, so that the finally trained personnel estimation model can accurately estimate the personnel to be estimated.

Description

Personnel estimation method and device, computer-readable storage medium and terminal equipment
Technical Field
The present invention relates to the field of artificial intelligence based information processing technologies, and in particular, to a method and an apparatus for estimating a person, a computer-readable storage medium, and a terminal device.
Background
In existing applications, it is often necessary to estimate whether a person to be estimated is a specific person, for example a sub-healthy person or a loss-prone person (a person likely to leave). Estimation of loss-prone persons is widely applied in fields such as human resource system construction and target user hunting.
Current methods for estimating loss-prone persons mainly process the feature information of the person to be estimated with a logistic regression prediction model or a deep neural network model to determine that person's loss probability. These models are trained in advance on training samples. When the training samples are determined, the feature information related to lost persons is often selected by manual screening and the features are treated as mutually independent, so the trained models are not very accurate when estimating persons.
Disclosure of Invention
The embodiment of the invention provides a person estimation method, a person estimation device, a computer readable storage medium and terminal equipment, which realize the classification of specific persons according to frequent feature sequences and the training of a person estimation model.
One aspect of the embodiments of the present invention provides a method for estimating a person, including:
acquiring characteristics corresponding to a plurality of specific persons respectively;
determining at least one frequent feature sequence according to the occurrence frequency of each feature in the features of the specific persons, wherein each frequent feature sequence comprises at least one feature which simultaneously appears in the features of part or all of the specific persons;
classifying the specific personnel according to frequent feature sequences contained in the features of each specific personnel in the specific personnel, and respectively calculating the classification probability corresponding to each type of specific personnel;
taking the characteristics of the specific persons as the characteristics of the persons in the positive sample, and taking the classification probabilities corresponding to the specific persons as the classification probabilities of the persons in the positive sample;
training a person estimation model according to a first training sample, wherein the person estimation model is used for determining the classification probability that a person to be estimated is a specific person, and the first training sample comprises the characteristics of the person in the positive sample and the classification probability thereof.
Another aspect of an embodiment of the present invention provides a person estimation apparatus, including:
a feature acquisition unit, configured to acquire features corresponding to a plurality of specific persons respectively;
a frequent feature unit, configured to determine at least one frequent feature sequence according to an occurrence frequency of each feature in the features of the multiple specific people, where each frequent feature sequence includes at least one feature that occurs simultaneously in the features of some or all of the multiple specific people;
a probability unit, configured to classify the plurality of specific persons according to the frequent feature sequences contained in the features of each specific person, and to calculate the classification probability corresponding to each class of specific persons;
a sample unit, configured to take the features of the plurality of specific persons as the features of the persons in the positive sample, and to take the classification probabilities corresponding to the specific persons as the classification probabilities of the persons in the positive sample;
a training unit, configured to train a person estimation model according to a first training sample, where the person estimation model is used for determining the classification probability that a person to be estimated is a specific person, and the first training sample comprises the features of the persons in the positive sample and their classification probabilities.
In another aspect, an embodiment of the present invention further provides a computer-readable storage medium, which stores a plurality of computer programs, and the computer programs are suitable for being loaded by a processor and executing the people estimation method according to the embodiment of the present invention.
In another aspect, an embodiment of the present invention further provides a terminal device, including a processor and a memory;
the memory is configured to store a plurality of computer programs, which are loaded by the processor to execute the person estimation method according to the embodiment of the invention; the processor is configured to execute each of the plurality of computer programs.
It can be seen that, in the method of this embodiment, the person estimation apparatus determines at least one frequent feature sequence according to the occurrence frequency of each feature in the features of a plurality of specific persons, and each frequent feature sequence includes at least one feature, so that one or more groups of features that affect the person to be estimated to become a specific person can be determined; and classifying the specific personnel according to the frequent feature sequences contained in the features of the specific personnel, respectively calculating the classification probability corresponding to each class of specific personnel, determining the features and the classification probabilities of the specific personnel as first training samples, and further training the personnel estimation model. Therefore, in the process of training the personnel estimation model, the characteristics of the specific personnel are analyzed to determine all characteristic combinations which have larger influence on the specific personnel, and the characteristic combinations can represent the characteristic correlation among a plurality of specific personnel and can also represent the characteristic correlation among the specific personnel at different periods, so that the finally trained personnel estimation model can accurately estimate the personnel to be estimated.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of a method for estimating a person according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for estimating a person provided by an embodiment of the invention;
FIG. 3 is a flow diagram of a method for determining a respective frequent feature sequence in accordance with an embodiment of the present invention;
FIG. 4 is a flow diagram of a method for determining a second frequent feature sequence, in accordance with an embodiment of the present invention;
FIG. 5 is a flow diagram of a method for training a person estimation model in accordance with an embodiment of the present invention;
FIG. 6 is a flow chart of a method for estimating a person according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a person estimation initial model determined in an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a person estimation device according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention provides a personnel estimation method, which is mainly characterized in that a personnel estimation device firstly determines a training sample to train a personnel estimation model, and the personnel estimation model determines the probability of a person to be estimated belonging to a specific person, as shown in figure 1, the personnel estimation method specifically comprises the following steps:
acquiring characteristics corresponding to a plurality of specific persons respectively; determining at least one frequent feature sequence (illustrated by m frequent feature sequences in fig. 1) according to the occurrence frequency of each feature in the features of the specific persons, wherein each frequent feature sequence comprises at least one feature; classifying the specific personnel according to frequent feature sequences contained in the features of each specific personnel in the specific personnel, and respectively calculating the classification probability corresponding to each type of specific personnel; taking the characteristics of the specific persons as the characteristics of the persons in the positive sample, and taking the classification probabilities corresponding to the specific persons as the classification probabilities of the persons in the positive sample; training a person estimation model according to a first training sample, wherein the person estimation model is used for determining the classification probability that a person to be estimated is a specific person, and the first training sample comprises the characteristics of the person in the positive sample and the classification probability thereof.
The above-mentioned person estimation model is a machine learning model based on Artificial Intelligence (AI). AI is the theory, methods, techniques and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best result. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Machine Learning (ML) is an interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specifically studies how a computer can simulate or realize human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
Therefore, in the process of training the personnel estimation model, the characteristics of the specific personnel are analyzed to determine all characteristic combinations which have larger influence on the specific personnel, and the characteristic combinations can represent the characteristic correlation among a plurality of specific personnel and can also represent the characteristic correlation among the specific personnel at different periods, so that the finally trained personnel estimation model can accurately estimate the personnel to be estimated.
An embodiment of the present invention provides a method for estimating a person, which is a method executed by a person estimation apparatus, and a flowchart is shown in fig. 2, where the method includes:
Step 101, acquiring features corresponding to a plurality of specific persons respectively.
It can be understood that a user may operate the person estimation apparatus so that it initiates the process of this embodiment. A specific person is a person belonging to a specific type, such as a person who has left a company or organization (a lost person) or a sub-healthy person. The features of a specific person may include, but are not limited to, the following attributes: gender, position, age, address, information on purchased goods, education, business trips, length of service, job satisfaction, health status, marital status, etc.
Further, after step 101, the features of the plurality of specific persons may be encoded to obtain encoded features, and step 102 below is then performed on the encoded features. Alternatively, the features may first be preprocessed and then encoded, and step 102 is performed on the resulting encoded features. The preprocessing may include, but is not limited to, removing features with abnormal values.
Specifically, when the features of a specific person are encoded: if a feature is discrete, type coding is applied, with each type of the feature corresponding to one code; if a feature is continuous, range coding is applied, with each range interval of the feature value corresponding to one code. Discrete features may include a specific person's position and gender and the names of purchased goods; continuous features may include age and length of service.
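The two coding schemes can be sketched as follows. This is only an illustrative sketch: the helper names, the age boundaries in `AGE_BIN_EDGES` and the sample person are assumptions, since the description does not fix any concrete codes or range intervals.

```python
from bisect import bisect_right

# Assumed range boundaries for age: <25, 25-34, 35-44, >=45 (not from the patent).
AGE_BIN_EDGES = [25, 35, 45]


def encode_discrete(value, code_table):
    """Type coding: each distinct type of a discrete feature gets one code."""
    if value not in code_table:
        code_table[value] = len(code_table)
    return code_table[value]


def encode_continuous(value, bin_edges):
    """Range coding: each range interval of a continuous feature gets one code."""
    return bisect_right(bin_edges, value)


# Hypothetical person with one discrete and one continuous feature.
position_codes = {}
person = {"position": "engineer", "age": 29}
encoded = (
    ("position", encode_discrete(person["position"], position_codes)),
    ("age", encode_continuous(person["age"], AGE_BIN_EDGES)),
)
print(encoded)
```

Keeping the code table outside the function lets the same table be reused for every specific person, so equal feature values always map to the same code.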
Step 102, determining at least one frequent feature sequence according to the occurrence frequency of each feature among the features of the plurality of specific persons, where each frequent feature sequence may include at least one feature that appears simultaneously in the features of some or all of the specific persons, so that a frequent feature sequence represents features that frequently appear among the plurality of specific persons.
Specifically, the person estimation apparatus may mine at least one frequent feature sequence using a sequential pattern mining algorithm, for example the PrefixSpan (prefix-projected sequential pattern mining) algorithm, so as to obtain the features common to the plurality of specific persons acquired in step 101.
When determining at least one frequent feature sequence based on a sequence pattern algorithm, the method may specifically be implemented by the following steps, and a flowchart is shown in fig. 3, including:
A. the features of a plurality of specific persons are determined as a plurality of feature sequences respectively.
B. Determining a plurality of feature sequence prefixes of length 1 corresponding to the plurality of feature sequences, where a feature sequence prefix here is a subsequence of length 1 of a feature sequence.
C. Removing the feature sequence prefixes whose occurrence frequency is not greater than the preset minimum support rate, and taking the remaining feature sequence prefixes as first frequent feature sequences.
Here, the preset minimum support rate may change dynamically or be fixed. The occurrence frequency of a feature sequence prefix is the number of feature sequences in which the prefix occurs divided by the total number of feature sequences. For example, if the prefix occurs in x feature sequences and there are y feature sequences in total, its occurrence frequency is x/y.
Because step B yields a plurality of feature sequence prefixes, step C may remove some of them; the remaining feature sequence prefixes are the first frequent feature sequences, each of which contains one feature. For example, if the feature sequence prefixes are <feature 1>, <feature 2> and <feature 3>, and <feature 3> is removed, the first frequent feature sequences are <feature 1> and <feature 2>.
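Steps B and C can be illustrated with a short sketch. It assumes each feature sequence is a flat list of encoded features and that a feature is counted once per sequence; the function name and the sample data are hypothetical.

```python
from collections import Counter


def first_frequent_features(sequences, min_support):
    """Return {feature: x / y} for length-1 prefixes whose frequency exceeds min_support.

    x is the number of feature sequences containing the feature,
    y is the total number of feature sequences.
    """
    total = len(sequences)
    counts = Counter()
    for seq in sequences:
        counts.update(set(seq))  # count each feature once per sequence
    return {f: c / total for f, c in counts.items() if c / total > min_support}


# Hypothetical data: two feature sequences, minimum support rate 0.5.
sequences = [[1, 2, 3, 4, 5], [3, 4]]
first_frequent = first_frequent_features(sequences, min_support=0.5)
print(first_frequent)
```

The strict comparison mirrors step C, which removes every prefix whose frequency is not greater than the minimum support rate.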
D. Determining the projection suffix features of each feature sequence with respect to the remaining feature sequence prefixes, where the projection suffix features of a feature sequence are the features that follow the feature sequence prefix in that feature sequence.
For example, if feature sequence 1 is <feature 1, 2, 3, 4, 5> and feature sequence 2 is <feature 3, 4>, there are 5 feature sequence prefixes: <feature 1>, <feature 2>, <feature 3>, <feature 4> and <feature 5>. When feature 3 is used as the feature sequence prefix, the projection suffix features of feature sequence 1 are <feature 4, 5> and those of feature sequence 2 are <feature 4>; when feature 1 is used as the feature sequence prefix, the projection suffix features of feature sequence 1 are <feature 2, 3, 4, 5> and those of feature sequence 2 are empty.
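The projection in step D can be reproduced with a minimal sketch, assuming a feature sequence is a list and the prefix is a single feature; `project_suffix` is a hypothetical helper name.

```python
def project_suffix(sequence, prefix_feature):
    """Return the features that follow the first occurrence of prefix_feature.

    The result is empty when the prefix does not occur or is the last feature.
    """
    if prefix_feature not in sequence:
        return []
    return sequence[sequence.index(prefix_feature) + 1:]


seq1 = [1, 2, 3, 4, 5]
seq2 = [3, 4]
print(project_suffix(seq1, 3), project_suffix(seq2, 3))
print(project_suffix(seq1, 1), project_suffix(seq2, 1))
```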
E. Determining at least one second frequent feature sequence according to the remaining feature sequence prefixes and the projection suffix features, where each second frequent feature sequence consists of a remaining feature sequence prefix and at least one feature from the projection suffix features; the determined frequent feature sequences include the first frequent feature sequences and the second frequent feature sequences.
Specifically, as shown in fig. 4, the person estimation apparatus may determine the second frequent feature sequences by executing the following steps in a loop:
E1. When the projection suffix features of the feature sequences are non-empty, determining the single features whose occurrence frequency among the projection suffix features is greater than the preset minimum support rate. Here, the occurrence frequency of a single feature is the number of groups of projection suffix features in which the single feature occurs divided by the number of groups of all projection suffix features. For example, if the single feature occurs in a groups of projection suffix features and there are b groups in total, its occurrence frequency is a/b.
E2. Merging the feature sequence prefix with the single feature to form a new feature sequence prefix.
For example, if feature sequence 1 is <feature 1, 2, 3, 4, 5> and feature sequence 2 is <feature 3, 4>, then with feature 3 as the feature sequence prefix, the projection suffix features of feature sequence 1 are <feature 4, 5> and those of feature sequence 2 are <feature 4>. The occurrence frequency of the single feature 4 is greater than the preset minimum support rate, so features 3 and 4 are merged into the new feature sequence prefix <feature 3, 4>, and the new projection suffix feature of feature sequence 1 with respect to this new prefix is <feature 5>.
E3. Determining the new projection suffix features of each feature sequence with respect to the new feature sequence prefix, and returning to step E1 for the new feature sequence prefix and the new projection suffix features, that is, repeating the steps of determining single features, merging, and determining new projection suffix features.
By performing steps E1 to E3 in a loop in this way, the second frequent feature sequences finally obtained include the new feature sequence prefixes. During the loop, the preset minimum support rate may change continuously. Specifically, it may change according to the number of groups of all projection suffix features used when calculating the occurrence frequency of a single feature; for example, if that number of groups is large, the minimum support rate may be set larger. This helps ensure the accuracy of the frequent feature sequences finally obtained.
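The loop over steps E1 to E3 can be sketched as a recursive prefix-projection routine. This is a simplified illustration rather than the patented implementation: it assumes flat feature sequences of sortable feature codes, a fixed minimum support rate, and a denominator b that counts every projection suffix of the sequences matching the current prefix, including empty ones. The function name is hypothetical, and the stored value is the frequency of the last merged feature within its projection suffixes.

```python
from collections import Counter


def mine_frequent_sequences(sequences, min_support):
    """Return {frequent feature sequence (tuple): frequency of its last merged feature}."""
    results = {}

    def grow(prefix, projected):
        # E1: stop when every projection suffix is empty.
        if not any(projected):
            return
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))  # count a feature once per suffix
        total = len(projected)  # assumption: empty suffixes stay in the denominator
        for feature in sorted(counts):
            frequency = counts[feature] / total
            if frequency <= min_support:
                continue  # not frequent enough, discard
            new_prefix = prefix + (feature,)  # E2: merge prefix and single feature
            results[new_prefix] = frequency
            # E3: project again on the new prefix, then return to E1.
            new_projected = [s[s.index(feature) + 1:] for s in projected if feature in s]
            grow(new_prefix, new_projected)

    # With an empty prefix, the first pass reproduces steps B and C (length-1 prefixes).
    grow((), [list(s) for s in sequences])
    return results


frequent = mine_frequent_sequences([[1, 2, 3, 4, 5], [3, 4]], min_support=0.5)
print(sorted(frequent))
```

On the two example sequences used above, the sketch keeps <3>, <4> and <3, 4>; feature 5 appears in only one of the two projection suffixes and is therefore not merged. A dynamically changing minimum support rate, as described above, would require passing a threshold that depends on `total` instead of a constant.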
Step 103, classifying the plurality of specific persons according to the frequent feature sequences contained in the features of each specific person, and calculating the classification probability corresponding to each class of specific persons. There are mainly the following two cases:
(1) the length of the longest frequent feature sequence is not limited in classifying a particular person.
The person estimation apparatus may first determine the longest frequent feature sequence contained in the features of each specific person; then place the specific persons having the same longest frequent feature sequence into the same class; and finally determine the classification probability corresponding to each class of specific persons according to information about the longest frequent feature sequence. That is, specific persons whose features contain the same longest frequent feature sequence are placed in one class, and the classification probability of that class can be obtained.
When the classification probability corresponding to a class of specific persons is determined according to information about the longest frequent feature sequence, the classification probability may be determined as the occurrence frequency of the longest frequent feature sequence.
Alternatively, the classification probability corresponding to a class of specific persons may be determined as the value of a function of the occurrence frequencies of the features in the longest frequent feature sequence. In this case, if the occurrence frequency of a feature of the longest frequent feature sequence among the features of the plurality of specific persons acquired in step 101 is less than the preset minimum support rate, that feature may be removed first, and the classification probability is then calculated from the remaining features of the longest frequent feature sequence.
Specifically, when the longest frequent feature sequence is the first frequent feature sequence, because the first frequent feature sequence is a feature sequence prefix with a length of 1, the frequency of occurrence of the longest frequent feature sequence is the frequency of occurrence of the determined feature sequence prefix with a length of 1; when the longest frequent feature sequence is the second frequent feature sequence, and the second frequent feature sequence is obtained by combining the first frequent feature sequence and the single feature in the projection suffix feature, the occurrence frequency of the second frequent feature sequence is specifically: the product of the frequency of occurrence of the first frequent feature sequence and the frequency of occurrence of the individual features in the merged new feature sequence prefix.
For example, the frequent feature sequences determined in step 102 include < a >, < ab >, < abc >, < abcf > and < abcfgh >, and the feature of a specific person includes < abcfijk >, so that the longest frequent feature sequence included in the feature of the specific person is < abcf >, and the classification probability of the specific person is the average value of the occurrence frequencies corresponding to the abcfs in the features; or the frequency of occurrence of the longest frequent signature sequence.
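Case (1) can be illustrated with a minimal Python sketch. It assumes that "contained in the features" means the frequent sequence occurs as an ordered subsequence of the person's features, and that the classification probability is the average of the per-feature frequencies; the persons `p2` and `p3` and all values in `feature_freq` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def is_subsequence(pattern, features):
    """Return True if the items of `pattern` occur in `features` in order."""
    remaining = iter(features)
    return all(item in remaining for item in pattern)


def longest_frequent_sequence(features, frequent_sequences):
    """Pick the longest frequent feature sequence contained in `features`."""
    contained = [seq for seq in frequent_sequences if is_subsequence(seq, features)]
    return max(contained, key=len) if contained else None


def classify(persons, frequent_sequences, feature_freq):
    """Group persons by longest contained sequence; average the feature frequencies."""
    members = defaultdict(list)
    for name, features in persons.items():
        longest = longest_frequent_sequence(features, frequent_sequences)
        if longest is not None:
            members[longest].append(name)
    return {
        seq: (names, mean(feature_freq[f] for f in seq))
        for seq, names in members.items()
    }


frequent_sequences = ["a", "ab", "abc", "abcf", "abcfgh"]  # from the example in the text
persons = {"p1": "abcfijk", "p2": "abcfxy", "p3": "abq"}   # p2 and p3 are hypothetical
feature_freq = {"a": 0.9, "b": 0.8, "c": 0.7, "f": 0.6}    # hypothetical frequencies

classes = classify(persons, frequent_sequences, feature_freq)
for seq, (names, prob) in classes.items():
    print(seq, names, round(prob, 3))
```

Here `p1` and `p2` fall into the class of <abcf>, because <abcfgh> is not contained in either person's features.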
(2) In the classification process, the length of the longest frequent feature sequence is limited.
Specifically, the person estimation apparatus may determine the longest frequent feature sequence included in the features of any specific person, and if the length of the longest frequent feature sequence is greater than a preset value, the specific person may be classified and the corresponding classification probability may be calculated according to the method in (1), which is not described herein again.
If the length of the longest frequent feature sequence is not greater than the preset value, the person estimation device calculates the classification probability of the specific person having the longest frequent feature sequence, and then classifies the specific person according to the interval of the classification probability.
When calculating the classification probability of a specific person having the longest frequent feature sequence, the person estimation device determines the probability weight of the first-class features among that person's features to be the occurrence frequency of the longest frequent feature sequence, and determines the probability weight of the second-class features to be the preset minimum support rate. Here the first-class features are the features in the longest frequent feature sequence, and the second-class features are the person's features other than the first-class features. The classification probability corresponding to the person is then calculated from the probability weights of the first-class and second-class features, for example as the average of those probability weights.
Further, if the occurrence frequency of some second-class feature, among the features of the plurality of specific persons acquired in step 101, is less than the preset minimum support rate, that feature may be removed first and is not considered when calculating the classification probability corresponding to the specific person.
For example, suppose the frequent feature sequences determined in step 102 include <a>, <ab>, <abc>, <abcf> and <abcfgh>, and the features of a specific person are <abcfijk>. The longest frequent feature sequence contained in the features of this person is then <abcf>. The probability weight of the features a, b, c and f (the first-class features) is the occurrence frequency of the longest frequent feature sequence <abcf>, and the probability weight of the features i, j and k (the second-class features) is the preset minimum support rate.
Step 104: take the features of the plurality of specific persons as the features of the positive sample persons, and take the classification probability corresponding to each class of specific persons as the classification probability of the positive sample persons.
Step 105: train a person estimation model according to a first training sample, where the person estimation model is used to determine the classification probability that a person to be estimated is a specific person, and the first training sample includes the features of the positive sample persons and their classification probabilities determined in step 104.
The first training sample is a positive sample. Further, the person estimation device may also determine a second training sample, i.e. a negative sample, which includes the features and classification probabilities of negative sample persons, and then train the person estimation model according to both the first training sample and the second training sample.
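The weighting in case (2) can be sketched as follows. This is only an illustration under two assumptions: the average is taken per feature (each first-class feature carries the sequence frequency and each second-class feature carries the minimum support rate), and the values 0.8 and 0.5 are hypothetical.

```python
from statistics import mean


def classification_probability(features, longest_seq, seq_frequency, min_support):
    """Average the probability weights of first-class and second-class features."""
    # First-class features: those that belong to the longest frequent feature sequence.
    first_class = [f for f in features if f in longest_seq]
    # Second-class features: every other feature of the person.
    second_class = [f for f in features if f not in longest_seq]
    weights = [seq_frequency] * len(first_class) + [min_support] * len(second_class)
    return mean(weights)


# <abcfijk> with longest frequent feature sequence <abcf>, as in the example above.
probability = classification_probability(
    features="abcfijk",
    longest_seq="abcf",
    seq_frequency=0.8,  # hypothetical occurrence frequency of <abcf>
    min_support=0.5,    # hypothetical preset minimum support rate
)
print(round(probability, 4))
```

The resulting probability can then be mapped to a probability interval to decide the class of the person.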
Further, after the person estimation device trains the person estimation model, the operation logic of the person estimation model can be stored in the person estimation device, so that when the person to be estimated is estimated later, the person estimation device can extract the feature vector of the feature of the person to be estimated through the person estimation model, and then output the classification probability that the person to be estimated belongs to the specific person according to the feature vector.
In another case, after the person estimation model is trained, the person estimation model includes a feature extraction module and an estimation module, and only the operation logic of the feature extraction module may be stored in the person estimation device, and the feature vector and the classification probability of each specific person may also be stored in the person estimation device, and the feature vector and the classification probability of non-specific persons may also be stored. Therefore, when the person to be estimated is estimated later, the feature vectors of the features of the person to be estimated can be extracted through the feature extraction module, and then the similarity between the feature vectors of the person to be estimated and the preset feature vectors of each specific person and each non-specific person is calculated; and then determining a plurality of feature vectors with the similarity greater than a specific value, and calculating the classification probability of the person to be estimated belonging to the specific person according to the classification probabilities respectively corresponding to the plurality of feature vectors stored in the person estimation device, specifically, taking the average value of the classification probabilities respectively corresponding to the plurality of feature vectors as the classification probability of the person to be estimated belonging to the specific person.
The characteristic vector of each specific person and the characteristic vector of the unspecific person can be directly preset in the person estimation device, so that the person to be estimated can be conveniently estimated later. In another case, the features of each specific person and the features of non-specific persons may be stored in the person estimation device, and in the process of estimating the person to be estimated later, not only the feature vector of the person to be estimated but also the feature vectors of the features of each specific person and non-specific persons need to be extracted through a preset feature extraction module to calculate the similarity.
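The similarity-based variant described above can be sketched in a few lines. The source does not name a similarity measure, so cosine similarity is an assumption here, and the stored vectors, probabilities and threshold are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def estimate_probability(query_vector, stored, threshold):
    """Average the stored classification probabilities of sufficiently similar vectors."""
    matched = [
        probability
        for vector, probability in stored
        if cosine_similarity(query_vector, vector) > threshold
    ]
    if not matched:
        return None  # no stored vector is similar enough
    return sum(matched) / len(matched)


# Hypothetical stored (feature vector, classification probability) pairs.
stored = [
    ([1.0, 0.0], 0.9),   # a specific person
    ([0.9, 0.1], 0.7),   # a specific person
    ([0.0, 1.0], 0.0),   # a non-specific person
]
estimate = estimate_probability([1.0, 0.05], stored, threshold=0.9)
print(estimate)
```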
It should be noted that, as the number of specific people increases, the person estimation model may be adjusted and trained continuously according to the increased features of the specific people, so as to obtain a person estimation model with higher estimation accuracy.
It can be seen that, in the method of this embodiment, the person estimation apparatus determines at least one frequent feature sequence according to the occurrence frequency of each feature in the features of a plurality of specific persons, and each frequent feature sequence includes at least one feature, so that one or more groups of features that affect the person to be estimated to become a specific person can be determined; and classifying the specific personnel according to the frequent feature sequences contained in the features of the specific personnel, respectively calculating the classification probability corresponding to each class of specific personnel, determining the features and the classification probabilities of the specific personnel as first training samples, and further training the personnel estimation model. Therefore, in the process of training the personnel estimation model, the characteristics of the specific personnel are analyzed to determine all characteristic combinations which have larger influence on the specific personnel, and the characteristic combinations can represent the characteristic correlation among a plurality of specific personnel and can also represent the characteristic correlation among the specific personnel at different periods, so that the finally trained personnel estimation model can accurately estimate the personnel to be estimated.
In a specific embodiment, when the person estimation device performs step 105, the person estimation model may be trained through the following steps, whose flowchart is shown in fig. 5:
Step 201: determine a person estimation initial model.
It is to be understood that, when determining the person estimation initial model, the person estimation device determines the multilayer structure included in the initial model and the initial values of the parameters in each layer.
Specifically, the person estimation initial model may include a feature extraction module and an estimation module. The feature extraction module is used to extract the feature vector of the features of a person to be estimated, and the estimation module is used to determine, from the feature vector extracted by the feature extraction module, the classification probability that the person to be estimated belongs to the specific persons; if the classification probability output by the estimation module is greater than a certain threshold, the person to be estimated belongs to the specific persons. Specifically, the feature extraction module may be a bidirectional Long Short-Term Memory (BI-LSTM) structure, and the estimation module may be a classification structure such as am-softmax.
The parameters of the initial model of the personnel estimation refer to fixed parameters used in the calculation process of each layer structure in the initial model of the personnel estimation, and the parameters do not need to be assigned at any time, such as parameters of parameter scale, network layer number, user vector length and the like.
Step 202: using the person estimation initial model and the features of each positive sample person in the first training sample, determine the classification probability that each positive sample person belongs to the specific persons.
Specifically, a feature extraction module in the person estimation initial model extracts feature vectors of features of each person in the positive sample, and then an estimation module determines the classification probability of each person in the positive sample belonging to a specific person according to the feature vectors extracted by the feature extraction module.
Further, the person estimation apparatus may further determine a second training sample, where the second training sample may include features of the negative sample persons and classification probabilities of the negative sample persons belonging to the specific persons, and thus, the person estimation initial model further needs to determine the classification probabilities of the negative sample persons belonging to the specific persons according to the features of the negative sample persons.
Step 203: adjust the person estimation initial model according to the classification probability of each positive sample person obtained by the person estimation initial model and the classification probability of each positive sample person in the first training sample, to obtain the final person estimation model.
Specifically, the person estimation device calculates a loss function for the person estimation initial model according to the result obtained by the person estimation initial model in step 202 and the classification probabilities in the first training sample. The loss function, such as a cross entropy loss function, indicates the error between the classification probability estimated by the person estimation initial model for each positive sample person and the actual classification probability of that person (obtained from the first training sample).
The training process of the personnel estimation model is to reduce the error value as much as possible, and the training process is to continuously optimize the parameter values of the parameters in the personnel estimation initial model determined in the step 201 by a series of mathematical optimization means such as back propagation derivation and gradient descent, and to minimize the calculated value of the loss function.
It should be noted that, in the above step 203, the parameter value in the initial model estimated by the person estimation is adjusted once according to the classification probability of each positive sample person estimated by the person estimation initial model belonging to a specific person, and in practical applications, the above step 203 needs to be executed continuously and circularly until the adjustment of the parameter value meets a certain stop condition.
Therefore, after executing steps 201 to 203 of the above embodiment, the person estimation device also needs to determine whether the current adjustment of the parameter values meets a preset stop condition. If it does, the flow ends; if not, the device returns to steps 202 to 203 for the person estimation initial model with the adjusted parameter values. The preset stop condition includes, but is not limited to, any one of the following: the difference between the currently adjusted parameter value and the previously adjusted parameter value is less than a threshold, i.e. the adjusted parameter values have converged; or the number of adjustments of the parameter values equals a preset number.
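The loop of steps 201 to 203 with both stop conditions can be sketched as below. A single logistic unit stands in for the Bi-LSTM and am-softmax structure purely to keep the example short; the learning rate, tolerance, iteration limit and sample values are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def cross_entropy(samples, w, b):
    """Mean cross entropy between predicted and target classification probabilities."""
    total = 0.0
    for x, target in samples:
        p = predict(w, b, x)
        total -= target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
    return total / len(samples)


def train(samples, lr=0.5, tol=1e-6, max_iters=10000):
    w, b = [0.0] * len(samples[0][0]), 0.0           # step 201: initial parameter values
    for step in range(1, max_iters + 1):             # stop condition: preset number of adjustments
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, target in samples:                    # step 202: estimate each sample
            err = predict(w, b, x) - target
            grad_w = [g + err * xi / len(samples) for g, xi in zip(grad_w, x)]
            grad_b += err / len(samples)
        new_w = [wi - lr * g for wi, g in zip(w, grad_w)]  # step 203: gradient descent update
        new_b = b - lr * grad_b
        delta = max([abs(n - o) for n, o in zip(new_w, w)] + [abs(new_b - b)])
        w, b = new_w, new_b
        if delta < tol:                              # stop condition: parameter values converged
            break
    return w, b, step


# Hypothetical (encoded features, classification probability) training samples.
samples = [([1.0, 0.0], 0.76), ([0.0, 1.0], 0.92), ([1.0, 1.0], 0.85)]
initial_loss = cross_entropy(samples, [0.0, 0.0], 0.0)
w, b, steps = train(samples)
final_loss = cross_entropy(samples, w, b)
print(steps, round(initial_loss, 4), round(final_loss, 4))
```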
Further, as new specific persons are continuously present over time, the person estimation device can acquire the characteristics of a plurality of new specific persons and continuously update the person estimation model, so that the person estimation model is more and more accurate. Specifically, the following two cases can be classified:
(1) When the number of new specific persons meets a preset condition, the above steps of determining the frequent feature sequences, calculating the classification probabilities and training the person estimation model are performed on the features of the new specific persons together with the features of the existing specific persons (i.e. the features acquired in step 101). That is, the flow returns to steps 102 to 105: the first training sample is obtained again and the person estimation model is retrained.
The number of new specific persons satisfying the preset condition herein specifically means: the number of new specific persons is greater than a certain threshold, or the ratio of the number of new specific persons to the number of specific persons involved in the feature acquired in the above step 101 is greater than a certain threshold, and so on.
(2) When the number of the new specific personnel does not meet the preset condition, respectively dividing the new specific personnel into the existing specific personnel categories according to the longest frequent feature sequences respectively contained in the features of the new specific personnel to obtain the newly classified multiple specific personnel; then, a first training sample is determined according to the feature and the classification probability of the specific person after the latest classification, and a step of training the person estimation model according to the first training sample, i.e., the above step 105, is performed.
The fact that the number of new specific persons does not satisfy the preset condition here specifically means that: the number of new specific persons is not greater than a certain threshold, or the ratio of the number of new specific persons to the number of specific persons involved in the above-mentioned feature obtained in step 101 is not greater than a certain threshold, and so on.
For example, through the above steps 101 to 103, 5 categories of specific people, that is, categories 1 to 5, corresponding to frequent feature sequences 1 to 5, respectively, may be obtained, when features corresponding to 2 new specific people, respectively, are obtained, and a frequent feature sequence 3 is included in the features of the new specific people 1, the new specific people 1 is classified into the category 3, and a frequent feature sequence 5 is included in the features of the new specific people 2, the new specific people 2 is classified into the category 5.
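The two update cases can be sketched as a simple decision followed by an assignment step. The sketch treats the count test and the ratio test as alternatives, either of which triggers full retraining; the thresholds, the category sequences and the new persons' features are hypothetical.

```python
def is_subsequence(pattern, features):
    """Return True if the items of `pattern` occur in `features` in order."""
    remaining = iter(features)
    return all(item in remaining for item in pattern)


def needs_full_retraining(num_new, num_existing, count_threshold, ratio_threshold):
    """Case (1): enough new specific persons to redo steps 102 to 105."""
    if num_new > count_threshold:
        return True
    return num_existing > 0 and num_new / num_existing > ratio_threshold


def assign_to_existing_classes(new_persons, class_sequences):
    """Case (2): place each new person in the category of its longest contained sequence."""
    assignments = {}
    for name, features in new_persons.items():
        matches = [
            (len(seq), category)
            for category, seq in class_sequences.items()
            if is_subsequence(seq, features)
        ]
        if matches:
            assignments[name] = max(matches)[1]
    return assignments


# Hypothetical frequent feature sequences 1 to 5 for categories 1 to 5.
class_sequences = {1: "ab", 2: "cd", 3: "efg", 4: "hi", 5: "jkl"}
new_persons = {"new person 1": "xefgy", "new person 2": "jzklm"}

if needs_full_retraining(len(new_persons), 100, count_threshold=10, ratio_threshold=0.2):
    print("return to steps 102 to 105")
else:
    assignments = assign_to_existing_classes(new_persons, class_sequences)
    print(assignments)
```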
In the following, a specific application example is used to illustrate the person estimation method of the present invention. In this embodiment, the specific persons are the lost personnel of a company or unit, that is, personnel who have left it. As shown in fig. 6, the person estimation device can perform person estimation according to the following steps:
Step 301: acquire the features corresponding to a plurality of lost persons.
For the human resource system of a company or unit, lost personnel are persons who once worked at the company or unit, so their related information and behavior state are relatively easy to obtain. Along the dimension of lost personnel, the person estimation device can acquire features including but not limited to the following:
position, age, gender, distance between the company or unit and home (map distance, whether there is a shuttle bus stop), education (academic degree), overtime (clock-in and clock-out times), business trips, length of service at the company, total years of work, marital status (whether married, whether there are children), performance, historical ratings from superiors, compensation (such as income, growth, equity, or other incentives), attendance count, number of company-organized activities attended, number of previous jobs, tenure at previous companies or units, job level (promotions, time since the last promotion), job satisfaction (including corporate culture and business culture), match between the submitted resume and the current position, and the like.
These characteristics can be obtained from employee information of the human resources system or collected in the form of questionnaires.
Step 302, preprocessing the acquired characteristics of the plurality of lost people, specifically, but not limited to, the following processing may be performed:
(1) Features with too many missing values are discarded.
A missing value filtering threshold is set to the number of lost persons multiplied by X, where X is a number between 0 and 1, such as 0.4. If the number of missing values of a feature exceeds the missing value filtering threshold, the feature is filtered out; single-valued features are also deleted. For example, if the position of 4 out of 5 lost persons is unknown, the feature "position" can be filtered out.
(2) Outlier processing: feature values that are too large or too small are discarded according to the feature distribution. For example, for the feature "age", an age below 15 is an outlier, so that lost person's age value is filtered out.
(3) Missing value processing: continuous features are filled with the mean value, and discrete features are filled with a constant that is treated as a separate type.
(4) Encoding: continuous features are discretized by binning and then range-encoded, and discrete features are type-encoded. Binning discretization means dividing the value range of a feature into intervals according to the distribution proportion of the lost persons' feature values across the intervals.
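Steps (1) to (3) above can be sketched in plain Python as follows. The records, the upper age bound of 70 and the fill constant "unknown" are hypothetical; only X = 0.4 and the lower age bound of 15 come from the text.

```python
from statistics import mean


def preprocess(records, continuous, x=0.4, valid_range=None, fill_constant="unknown"):
    valid_range = valid_range or {}
    n = len(records)
    features = sorted({name for record in records for name in record})

    # (1) Drop features with too many missing values, and single-valued features.
    kept = []
    for name in features:
        values = [record.get(name) for record in records]
        missing = sum(v is None for v in values)
        distinct = {v for v in values if v is not None}
        if missing > n * x or len(distinct) <= 1:
            continue
        kept.append(name)
    rows = [{name: record.get(name) for name in kept} for record in records]

    # (2) Treat out-of-range values as outliers and discard them (set to None).
    for row in rows:
        for name, (low, high) in valid_range.items():
            if row.get(name) is not None and not (low <= row[name] <= high):
                row[name] = None

    # (3) Fill continuous features with the mean, discrete features with a constant.
    for name in kept:
        present = [row[name] for row in rows if row[name] is not None]
        fill = mean(present) if name in continuous and present else fill_constant
        for row in rows:
            if row[name] is None:
                row[name] = fill
    return rows


records = [  # hypothetical lost-person records; None marks a missing value
    {"position": "engineer", "age": 28, "gender": "F", "company": "X"},
    {"position": None, "age": 33, "gender": "M", "company": "X"},
    {"position": None, "age": 12, "gender": None, "company": "X"},
    {"position": None, "age": 41, "gender": "F", "company": "X"},
    {"position": None, "age": None, "gender": "M", "company": "X"},
]
cleaned = preprocess(records, continuous={"age"}, valid_range={"age": (15, 70)})
for row in cleaned:
    print(row)
```

In this sample, "position" is dropped because 4 missing values exceed 5 × 0.4 = 2, and "company" is dropped because it has a single value.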
For example, the positions (discrete features) of the lost persons are type-coded as shown in the following table 1-1:
[Table 1-1 is an image in the original publication (Figure BDA0002386933050000141).]
Table 1-1
The age (continuous feature) of the lost persons is encoded as shown in Table 1-2 below:
| Age bin | Age code |
| --- | --- |
| 18-25 | Age a |
| 26-30 | Age b |
| 31-35 | Age c |
| 36-40 | Age d |
| 41-45 | Age e |
| 46-50 | Age f |
| 51-60 | Age g |
| 60-65 | Age h |
Table 1-2
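The range encoding of Table 1-2 can be written as a small lookup. This is a sketch: the bin boundaries are copied from the table as printed (the value 60 appears in two bins, and the first matching bin wins here), and returning `None` for ages outside every bin is an assumption.

```python
AGE_BINS = [
    (18, 25, "Age a"),
    (26, 30, "Age b"),
    (31, 35, "Age c"),
    (36, 40, "Age d"),
    (41, 45, "Age e"),
    (46, 50, "Age f"),
    (51, 60, "Age g"),
    (60, 65, "Age h"),
]


def encode_age(age):
    """Return the age code of the first bin that contains `age`, or None."""
    for low, high, code in AGE_BINS:
        if low <= age <= high:
            return code
    return None


for age in (22, 33, 60, 64, 17):
    print(age, encode_age(age))
```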
The distance between the lost person's home and the company or unit (continuous feature) is encoded as shown in Table 1-3 below:
| Physical distance bin | Code |
| --- | --- |
| Within 1 km | Distance interval a |
| Within 1-2 km | Distance interval b |
| Within 2-5 km | Distance interval c |
| 5 km to 10 km | Distance interval d |
| 10 km-15 km | Distance interval e |
| Over 15 km | Distance interval f |
Table 1-3
The feature of whether there is a shuttle bus stop (discrete feature) for the lost persons is encoded as shown in Table 1-4 below:
[Table 1-4 is an image in the original publication (Figures BDA0002386933050000151 and BDA0002386933050000161).]
Table 1-4
It can be seen that, after the features of each lost person are encoded by the above preprocessing, a matrix can be formed in which each row holds the feature code values of one lost person and each column holds the code values of one feature, so the matrix dimension is the number of lost persons by the number of features. Because the set of lost personnel keeps growing over time, a lost personnel sample base is constructed from historical samples of departed personnel. The sample base can be continuously expanded as time passes and personnel move, which increases the training samples of the system and continuously improves model accuracy until a critical point with higher accuracy is reached.
Step 303: according to the preprocessed features of the lost persons, take the features of each lost person as one feature sequence, and determine a plurality of frequent feature sequences with high occurrence frequency.
Specifically, the person estimation device may use the PrefixSpan algorithm to mine the first frequent feature sequences and second frequent feature sequences of the lost persons. The specific mining method is the flow shown in fig. 3 and fig. 4, which is not described herein again.
It should be noted that, in the loop operation in this embodiment, the preset minimum support rate may be changed continuously, specifically, the change may be performed according to the number of the lost people, so that a "rolling snowball" method is adopted, a higher minimum support rate is set in each loop process, accuracy of frequent feature sequence mining is ensured, and the recall ratio may be improved through multiple loops.
For example, the features of two lost persons are shown in Table 2-1 below, with the features of each lost person forming one feature sequence:
[Table 2-1 is an image in the original publication (Figure BDA0002386933050000162).]
Table 2-1
The occurrence frequencies determined for the feature sequence prefixes of length 1 are shown in Table 2-2 below:
[Table 2-2 is an image in the original publication (Figures BDA0002386933050000163 and BDA0002386933050000171).]
Table 2-2
Assuming the minimum support rate is 0.5, the minimum support is the number of lost persons multiplied by 0.5, that is, 2 × 0.5 = 1. Feature sequence prefixes whose occurrence frequency is not greater than 0.5 are removed to obtain the first frequent feature sequences; an occurrence frequency greater than 0.5 means an occurrence count greater than the preset minimum support of 1. The first frequent feature sequences obtained are <distance interval e>, <age c> and <performance c>, and the projection suffix features of the feature sequences corresponding to each first frequent feature sequence are shown in Table 2-3 below:
[Table 2-3 is an image in the original publication (Figure BDA0002386933050000172).]
Table 2-3
Count each single feature whose occurrence count in the projection suffix features corresponding to each first frequent feature sequence is greater than the minimum support of 1, and merge each first frequent feature sequence with the corresponding single features to form new feature sequence prefixes, i.e. the second frequent feature sequences, which are <distance interval e, age c>, <distance interval e, performance c> and <age c, performance c>. Then determine the new projection suffix features of the feature sequences corresponding to the new feature sequence prefixes, as shown in Table 2-4 below:
[Table 2-4 is an image in the original publication (Figures BDA0002386933050000173 and BDA0002386933050000181).]
Table 2-4
Continue by counting each single feature whose occurrence count in the projection suffix features corresponding to each second frequent feature sequence is greater than the minimum support of 1, and merge each second frequent feature sequence with the corresponding single features to form a new feature sequence prefix, namely <distance interval e, age c, performance c>. Then determine the new projection suffix features of the feature sequences corresponding to the new feature sequence prefix, as shown in Table 2-5 below:
| New feature sequence prefix | Projection suffix features |
| --- | --- |
| Distance interval e, age c, performance c | (empty) |
Table 2-5
It can thus be determined that the frequent feature sequences obtained are: <distance interval e>, <age c>, <performance c>, <distance interval e, age c>, <distance interval e, performance c>, <age c, performance c> and <distance interval e, age c, performance c>.
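The mining walk-through above can be reproduced with a compact PrefixSpan-style sketch for sequences of single features. Because Table 2-1 is not reproduced in the text, the two input sequences below are hypothetical and were chosen only so that the result matches the seven frequent feature sequences listed above.

```python
def prefixspan(sequences, min_support_rate):
    """Return {frequent feature sequence (tuple): occurrence frequency}."""
    total = len(sequences)
    frequent = {}

    def mine(prefix, projected):
        # Count in how many projected suffixes each single feature occurs.
        counts = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts):
            rate = counts[item] / total
            if rate <= min_support_rate:
                continue  # keep only frequencies greater than the minimum support rate
            new_prefix = prefix + (item,)
            frequent[new_prefix] = rate
            # Project: keep what follows the first occurrence of `item`.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(new_prefix, new_projected)

    mine((), [list(s) for s in sequences])
    return frequent


# Hypothetical feature sequences of two lost persons.
sequences = [
    ["distance interval e", "age c", "performance c", "overtime a"],
    ["distance interval e", "age c", "performance c", "overtime b"],
]
frequent = prefixspan(sequences, min_support_rate=0.5)
for seq, rate in frequent.items():
    print(seq, rate)
```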
Step 304, classifying the lost people according to the frequent feature sequences included in the features of each lost person, and calculating the classification probability of each class of lost people respectively, where the specific method is described in the above embodiments and is not described herein again.
For example, if the features of a lost person include the frequent feature sequence shown in Table 3-1 below, the classification probability of the class of lost persons whose features contain all of these features is (0.6 + 0.75 + 0.9 + 0.88 + 0.58) / 5 = 0.742:
| Age d | Overtime time e | Performance c | Marital status a | Distance interval e |
| --- | --- | --- | --- | --- |
| 0.6 | 0.75 | 0.9 | 0.88 | 0.58 |
Table 3-1
If the occurrence frequency of some feature of the frequent feature sequence, among the features acquired in step 301, is less than the preset minimum support rate (for example, 0.5), that feature may be removed first, and the classification probability of the corresponding class of lost persons is calculated from the remaining features in the frequent feature sequence.
Finally, the person estimation device can obtain information on the various classes of lost persons, including the lost persons themselves, the frequent feature sequences they contain and their classification probabilities, as shown in Table 3-2 below:
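The calculation for Table 3-1, including the optional removal of low-frequency features, can be checked with a short sketch. The extra feature "business trip b" with frequency 0.3 is hypothetical and is included only to show the filter.

```python
def class_probability(feature_freq, min_support=0.5):
    """Average the occurrence frequencies of features that reach the minimum support rate."""
    kept = [freq for freq in feature_freq.values() if freq >= min_support]
    return sum(kept) / len(kept) if kept else 0.0


table_3_1 = {
    "age d": 0.6,
    "overtime time e": 0.75,
    "performance c": 0.9,
    "marital status a": 0.88,
    "distance interval e": 0.58,
}
probability = class_probability(table_3_1)
print(round(probability, 3))

# A hypothetical low-frequency feature is removed before averaging.
with_rare = dict(table_3_1, **{"business trip b": 0.3})
print(round(class_probability(with_rare), 3))
```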
| Frequent feature sequence | Lost person | Classification probability |
| --- | --- | --- |
| bcagh | Lost person 1 | 0.76 |
| bcagh | Lost person 2 | 0.76 |
| bcagh | Lost person 3 | 0.76 |
| bcagh | Lost person 4 | 0.76 |
| AaB | Lost person 5 | 0.85 |
| AaB | Lost person 6 | 0.85 |
| AaB | Lost person 7 | 0.85 |
| acdhg | Lost person 8 | 0.92 |
| acdhg | Lost person 9 | 0.92 |
| ... | Lost person n | p |
Table 3-2
Further, the classification probability of the lost person is subjected to binning processing, for example, binning the classification probability with 0.1 as an interval, that is, [0,0.1), [0.1,0.2), [0.2,0.3), [0.3,0.4), [0.4,0.5), [0.5,0.6), [0.6,0.7), [0.7,0.8), [0.8,0.9), [0.9,1], and the obtained categories may be as shown in the following table 3 to 3:
| Category | Frequent feature sequence | Lost person | Classification probability | Probability interval |
| --- | --- | --- | --- | --- |
| 7 | bcagh | Lost person 1 | 0.76 | [0.7,0.8) |
| 7 | bcagh | Lost person 2 | 0.76 | [0.7,0.8) |
| 7 | bcagh | Lost person 3 | 0.76 | [0.7,0.8) |
| 7 | bcagh | Lost person 4 | 0.76 | [0.7,0.8) |
| 8 | AaB | Lost person 5 | 0.85 | [0.8,0.9) |
| 8 | AaB | Lost person 6 | 0.85 | [0.8,0.9) |
| 8 | AaB | Lost person 7 | 0.85 | [0.8,0.9) |
| 9 | acdhg | Lost person 8 | 0.92 | [0.9,1] |
| 9 | acdhg | Lost person 9 | 0.92 | [0.9,1] |
| ... | ... | Lost person n | Loss probability p | ... |
Table 3-3
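The binning that produces the category and probability interval columns of Table 3-3 can be sketched as follows. The small epsilon that guards against floating-point rounding at bin edges is an implementation assumption.

```python
def probability_bin(p):
    """Map a classification probability to (category, interval) with bins of width 0.1."""
    index = min(int(p * 10 + 1e-9), 9)  # 1.0 falls into the last, closed bin
    if index == 9:
        return index, "[0.9,1]"
    return index, f"[{index / 10:.1f},{(index + 1) / 10:.1f})"


for p in (0.76, 0.85, 0.92):
    print(p, probability_bin(p))
```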
Step 305, determining a first training sample, specifically, using the features of the various types of the lost people determined in the above step as the features of the positive sample people, using the classification probabilities of the various types of the lost people as the classification probabilities of the positive sample people belonging to the lost people, and using the interval of the classification probabilities of the various types of the lost people as the probability interval of one type of the lost people.
Step 306: train the person estimation model according to the first training sample and the process shown in fig. 5.
In the process of training the person estimation model, it can be determined that the person estimation initial model includes a feature extraction module and an estimation module, where the feature extraction module can adopt a structure such as BI-LSTM and the estimation module can adopt a structure such as am-softmax. As shown in fig. 7, each feature of each positive sample person is embedded and fed into a corresponding LSTM structure, which outputs a plurality of feature vectors. The feature vectors are spliced after passing through a concat layer and a fully connected (FC) layer, and the spliced feature vector finally passes through the am-softmax structure, which outputs the classification probability that each positive sample person belongs to the lost personnel.
Thus, the feature extraction module can extract the feature vector of the features of each positive sample person using formula 1 below, where $x$ denotes the features of a positive sample person and $y$ is the extracted feature vector. The estimation module can determine the classification probability that each positive sample person belongs to the lost personnel according to formula 2 and the feature vector of each positive sample person, where $W$ is the set of specific person categories, i.e. $W = (c_1, c_2, \ldots, c_n)$, and $p$ is the classification probability of the lost personnel, i.e. $p = \text{am-softmax}(\langle y, c_1 \rangle, \langle y, c_2 \rangle, \ldots, \langle y, c_n \rangle)$:
$$y = \text{Bi-LSTM}(x) \tag{1}$$
$$p = \text{am-softmax}(yW) \tag{2}$$
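Formula 2 can be illustrated numerically with the sketch below. It assumes that the inner products are taken between length-normalized vectors (so each one is a cosine), that the scale 30 given for the loss function is also applied at inference, and that the margin is used only during training; the feature vector `y` and the category vectors are hypothetical, and the Bi-LSTM of formula 1 is not modelled.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def am_softmax_probs(y, class_vectors, s=30.0):
    """Softmax over scaled cosines between y and each category vector c_i."""
    logits = [s * cosine(y, c) for c in class_vectors]
    peak = max(logits)                       # subtract the maximum for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


y = [0.9, 0.1]                               # hypothetical feature vector from formula 1
class_vectors = [[1.0, 0.0], [0.0, 1.0]]     # hypothetical category vectors c_1, c_2
probs = am_softmax_probs(y, class_vectors)
print([round(p, 6) for p in probs])
```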
Further, when adjusting the parameter values in the person estimation initial model, the loss function shown in formula 3 below can be used, where $\theta_i$ denotes the angle between $y$ and $c_i$, $s$ is 30, and $m$ is 0.35:
[Formula 3 is an image in the original publication (Figure BDA0002386933050000211).]
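Since formula 3 is available only as an image, the following is a sketch of the AM-softmax loss as it is commonly written in the literature, using the symbols defined above. The batch size $N$ and the index $t_k$ of the true category of the $k$-th sample are notational assumptions and are not taken from the source.

$$L = -\frac{1}{N} \sum_{k=1}^{N} \log \frac{e^{s(\cos\theta_{t_k} - m)}}{e^{s(\cos\theta_{t_k} - m)} + \sum_{i \neq t_k} e^{s\cos\theta_i}}$$

With $s = 30$ and $m = 0.35$, the margin $m$ is subtracted only from the cosine of the true category, which forces the feature vector $y$ to lie closer to its own category vector than to any other before the loss becomes small.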
The process of training the person estimation model consists of continuously adjusting the parameter values in the initial person estimation model until the loss function converges, which yields the final person estimation model.
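The quantity that training drives toward convergence can be sketched in plain Python. This is a minimal sketch that assumes the standard AM-Softmax formulation with s = 30 and m = 0.35; the function names and the `target` argument are illustrative and do not come from the patent.

```python
import math


def am_softmax_probs(y, W, target, s=30.0, m=0.35):
    """Return AM-Softmax probabilities for feature vector y.

    y: feature vector; W: list of category vectors c_1..c_n (non-zero);
    target: index of the true category (the margin applies only there).
    """
    def cosine(a, b):
        dot = sum(p * q for p, q in zip(a, b))
        norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
        return dot / norm

    logits = [s * (cosine(y, c) - (m if i == target else 0.0))
              for i, c in enumerate(W)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def am_softmax_loss(y, W, target, s=30.0, m=0.35):
    """Negative log-probability of the target category."""
    return -math.log(am_softmax_probs(y, W, target, s, m)[target])
```

A feature vector aligned with its target category gives a loss near zero, while a vector orthogonal to its target category gives a large loss, which is the gradient signal that parameter adjustment would follow.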
Step 307, estimating, according to the trained person estimation model, the classification probability that a person to be estimated belongs to the lost personnel.
The method for estimating lost personnel in this embodiment can be widely applied to scenarios such as human resource system construction, internal personnel management, and employee working-state mining, and specifically has the following advantages:
(1) In this embodiment, frequent feature sequences are mined, and the classification probability of each class of lost personnel is obtained according to the occurrence frequency of each feature in the more important common feature sequences. This addresses the problem that features constructed by traditional methods are coarse. Because features with no obvious influence are removed in the process, the influence of noise features can be greatly reduced, which further improves the accuracy of the person estimation model.
(2) In this embodiment, the am-softmax classification structure is adopted, so that the calculation effect of the person estimation model can be rapidly improved; the Bi-LSTM structure can better extract the feature vectors of combined features, giving higher generalization capability.
(3) This embodiment copes well with problems such as scarce sample data for lost personnel and missing features, and has high practical application value and guiding significance.
An embodiment of the present invention further provides a personnel estimation apparatus, a schematic structural diagram of which is shown in fig. 8, and the apparatus may specifically include:
A feature obtaining unit 10, configured to acquire features corresponding to a plurality of specific persons, respectively.
A frequent feature unit 11, configured to determine at least one frequent feature sequence according to the occurrence frequency of each of the features of the plurality of specific persons acquired by the feature obtaining unit 10, where each frequent feature sequence includes at least one feature that appears simultaneously in the features of some or all of the plurality of specific persons.
The feature obtaining unit 10 is further configured to encode the features of the specific people to obtain encoded features; or after preprocessing the characteristics of the specific personnel, coding the preprocessed characteristics to obtain coded characteristics; the frequent features unit 11 is specifically configured to determine the at least one frequent feature sequence according to the encoded features.
When the features of the specific persons are coded, if the features of the specific persons are discrete features, the feature obtaining unit 10 performs type coding on the features of the specific persons; and if the characteristic of the specific person is a continuous characteristic, performing range coding on the characteristic of the specific person.
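The two encodings can be illustrated with a short sketch. It assumes that type coding means assigning an integer code to each distinct discrete value and that range coding means assigning the index of the interval a continuous value falls into; the function names and the interval boundaries are hypothetical.

```python
from bisect import bisect_right


def type_encode(values):
    """Map each distinct discrete value to an integer code, in first-seen order."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in values]


def range_encode(values, boundaries):
    """Map each continuous value to the index of the interval it falls in.

    boundaries must be sorted ascending; a value v gets code k when
    boundaries[k-1] <= v < boundaries[k].
    """
    return [bisect_right(boundaries, v) for v in values]
```

For example, `range_encode([25, 35, 45], [30, 40])` places the three values in three consecutive intervals, so that continuous features become discrete symbols that can take part in sequence mining.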
The frequent feature unit 11 is specifically configured to: treat the features of the plurality of specific persons as a plurality of feature sequences, respectively; determine the feature sequence prefixes of length 1 corresponding to the plurality of feature sequences; remove the feature sequence prefixes whose occurrence frequency is not greater than a preset minimum support rate, and take the remaining feature sequence prefixes as first frequent feature sequences; determine the projected suffix features of each feature sequence corresponding to the remaining feature sequence prefixes; and determine at least one second frequent feature sequence according to the remaining feature sequence prefixes and the projected suffix features, where each second frequent feature sequence is composed of at least one feature from the remaining feature sequence prefixes and the projected suffix features. The at least one frequent feature sequence comprises the first frequent feature sequence and the second frequent feature sequence.
When determining at least one second frequent feature sequence according to the remaining feature sequence prefixes and the projected suffix features, the frequent feature unit 11 is specifically configured to: when the projected suffix features of a feature sequence are non-empty, determine a single feature whose occurrence frequency in the projected suffix features is greater than the preset minimum support rate; combine the feature sequence prefix with the single feature to form a new feature sequence prefix; and determine the new projected suffix features of each feature sequence corresponding to the new feature sequence prefix, then repeat the steps of determining a single feature, combining, and determining new projected suffix features for the new feature sequence prefix and the new projected suffix features. The second frequent feature sequence includes the new feature sequence prefix.
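The prefix and projected-suffix procedure described above resembles PrefixSpan-style sequential pattern mining. The following is a minimal sketch under stated assumptions: each person's encoded features form one ordered sequence, and the occurrence frequency of a pattern is measured as the fraction of all sequences that contain it, which may differ from how the embodiment computes frequency. The function name is hypothetical.

```python
def mine_frequent_sequences(sequences, min_support):
    """PrefixSpan-style mining over feature sequences.

    Returns {pattern_tuple: support}, where support is the fraction of
    input sequences that contain the pattern as a subsequence.
    """
    total = len(sequences)
    results = {}

    def grow(prefix, projected):
        # Count each feature at most once per projected suffix.
        counts = {}
        for suffix in projected:
            for feature in set(suffix):
                counts[feature] = counts.get(feature, 0) + 1
        for feature, count in sorted(counts.items()):
            support = count / total
            if support <= min_support:  # keep only frequency > minimum support
                continue
            new_prefix = prefix + (feature,)
            results[new_prefix] = support
            # Project: keep what follows the first occurrence of the feature.
            new_projected = [s[s.index(feature) + 1:]
                             for s in projected if feature in s]
            new_projected = [s for s in new_projected if s]
            if new_projected:  # recurse only on non-empty suffixes
                grow(new_prefix, new_projected)

    grow((), [list(s) for s in sequences])
    return results
```

Patterns of length 1 in the result play the role of the first frequent feature sequences, and longer patterns play the role of the second frequent feature sequences.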
A probability unit 12, configured to classify the plurality of specific persons according to the frequent feature sequences, determined by the frequent feature unit 11, that are contained in the features of each specific person, and to calculate the classification probability corresponding to each class of specific persons.
The probability unit 12 is specifically configured to determine the longest frequent feature sequence contained in the features of each specific person; divide the specific persons having the same longest frequent feature sequence into the same class of specific persons; and determine the classification probability corresponding to any class of specific persons according to information on the longest frequent feature sequence. Here, the probability unit 12 determines the classification probability corresponding to any class of specific persons to be the occurrence frequency of the longest frequent feature sequence, or to be the value of a function of the occurrence frequency of each feature in the longest frequent feature sequence.
When the longest frequent feature sequence is a first frequent feature sequence, the occurrence frequency of the longest frequent feature sequence is the occurrence frequency of the prefix of the feature sequence with the length of 1; when the longest frequent feature sequence is a second frequent feature sequence, the occurrence frequency of the longest frequent feature sequence is the product of the occurrence frequency of the first frequent feature sequence and the occurrence frequency of a single feature in the new feature sequence prefix.
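Grouping persons by their longest frequent feature sequence can be sketched as follows. The sketch takes the first option described above, in which the classification probability of a class is the occurrence frequency of its longest frequent feature sequence. The tie-breaking rule between equally long sequences and all names are assumptions made for illustration.

```python
def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order, not necessarily contiguously."""
    it = iter(sequence)
    return all(feature in it for feature in pattern)


def classify_by_longest_pattern(people, frequent):
    """Group persons by the longest frequent pattern their features contain.

    people: {person_id: feature sequence}; frequent: {pattern: frequency}.
    Returns {pattern: (classification_probability, [person_ids])}.
    """
    classes = {}
    for person_id, features in people.items():
        matches = [p for p in frequent if is_subsequence(p, features)]
        if not matches:
            continue  # no frequent pattern applies to this person
        # Longest pattern wins; ties go to the higher frequency, then pattern order.
        best = max(matches, key=lambda p: (len(p), frequent[p], p))
        classes.setdefault(best, (frequent[best], []))[1].append(person_id)
    return classes
```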
Further, the probability unit 12 is further configured to, after determining the longest frequent feature sequence included in the features of each specific person, if the length of the longest frequent feature sequence is greater than a preset value, perform the steps of dividing and determining the classification probability; if the length of the longest frequent feature sequence is not larger than a preset value, calculating the classification probability of the specific personnel with the longest frequent feature sequence, and classifying the specific personnel according to the classification probability. When calculating the classification probability of the specific person having the longest frequent feature sequence, the probability unit 12 is specifically configured to determine that the probability weight of a first class feature in the features of any specific person having the longest frequent feature sequence is the occurrence frequency of the longest frequent feature sequence, and determine that the probability weight of a second class feature in the features of any specific person is a preset minimum support rate, where the first class feature is a feature in the longest frequent feature sequence, and the second class feature is a feature other than the first class feature in the features of any specific person; and calculating the classification probability corresponding to any specific person according to the probability weights respectively corresponding to the first class features and the second class features.
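The weighting scheme for persons whose longest frequent feature sequence is short can be illustrated with a small sketch. The text does not specify how the per-feature probability weights are combined, so the arithmetic mean used here is purely an assumption, as is the function name.

```python
def weighted_class_probability(features, longest_pattern, pattern_frequency, min_support):
    """Combine per-feature probability weights into one classification probability.

    First-class features (those in the longest frequent pattern) are weighted by
    the pattern's occurrence frequency; all other features are weighted by the
    minimum support rate. The mean of the weights is an assumed combination rule.
    """
    first_class = set(longest_pattern)
    weights = [pattern_frequency if f in first_class else min_support
               for f in features]
    return sum(weights) / len(weights)
```

With four features, one of which belongs to a pattern of frequency 0.75, and a minimum support rate of 0.25, the weights are 0.75, 0.25, 0.25, and 0.25, and their mean is 0.375.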
A sample unit 13, configured to use the features of the plurality of specific persons as the features of positive sample persons, and to use the classification probabilities respectively corresponding to the plurality of specific persons, as calculated by the probability unit 12, as the classification probabilities of the positive sample persons.
A training unit 14, configured to train a person estimation model according to a first training sample, where the person estimation model is used to determine a classification probability that a person to be estimated is a specific person, and the first training sample includes the feature of the person in the positive sample determined by the sample unit 13 and the classification probability thereof.
The training unit 14 is specifically configured to determine an initial person estimation model; determine, by means of the initial person estimation model and the features of each positive sample person in the first training sample, the classification probability that each positive sample person belongs to the specific persons; and adjust the initial person estimation model according to the classification probability of each positive sample person obtained by the initial person estimation model and the classification probability of each positive sample person in the first training sample, to obtain the final person estimation model.
The training unit 14 is further configured to stop the adjustment of the fixed parameter value when the number of times of adjustment on the parameter value is equal to a preset number of times, or when a difference between a currently adjusted fixed parameter value and a last adjusted fixed parameter value is smaller than a threshold value.
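The two stopping conditions can be sketched with a toy adjustment loop. This is a minimal sketch that assumes a single scalar parameter and a caller-supplied update rule; the names `train` and `step` and the default values are hypothetical.

```python
def train(step, initial_value, max_adjustments=100, threshold=1e-6):
    """Adjust a parameter until one of the two stopping conditions holds.

    step: function mapping the current parameter value to the adjusted value.
    Stops when the number of adjustments reaches max_adjustments, or when the
    change between consecutive values falls below threshold.
    Returns (final_value, number_of_adjustments).
    """
    value = initial_value
    for count in range(1, max_adjustments + 1):
        new_value = step(value)
        if abs(new_value - value) < threshold:
            return new_value, count
        value = new_value
    return value, max_adjustments
```

For instance, an update rule that halves the distance to 3.0 on every adjustment stops well before 100 adjustments because the change between consecutive values shrinks below the threshold.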
Further, in the apparatus of this embodiment, after the feature obtaining unit 10 obtains the features of the new specific persons, when the number of the new specific persons satisfies a preset condition, the frequent feature unit 11, the probability unit 12, the sample unit 13, and the training unit 14 are notified to perform the steps of determining the frequent feature sequence, calculating the classification probability, determining the features and the classification probability of the positive sample person, and training a person estimation model for the features of the new specific persons and the features of the existing specific persons, respectively.
When the number of the new specific people does not satisfy the preset condition, the probability unit 12 classifies the new specific people into the existing specific people categories according to the longest frequent feature sequences respectively contained in the features of the new specific people to obtain the specific people after the latest classification, and the sample unit 13 determines a first training sample according to the features and the classification probabilities of the specific people after the latest classification, and notifies the training unit 14 to execute the step of training the people estimation model according to the first training sample.
Further, the person estimation device of the present embodiment may further include: the estimation unit 15 is configured to, when the person estimation model trained by the training unit 14 includes a feature extraction module and an estimation module, extract a feature vector of a feature of the person to be estimated by the feature extraction module; calculating the similarity between the feature vectors of the people to be estimated and the preset feature vectors of specific people; determining a plurality of feature vectors with similarity greater than a specific value; and calculating the classification probability of the person to be estimated belonging to a specific person according to the classification probability corresponding to the preset feature vectors respectively.
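The estimation step performed by the estimation unit 15 can be sketched as follows. The sketch assumes cosine similarity as the similarity measure and a similarity-weighted average as the way of combining the classification probabilities of the matching preset feature vectors; neither choice is stated in the text, and the names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))


def estimate_probability(query_vector, references, min_similarity):
    """Estimate the classification probability of a person to be estimated.

    references: list of (preset_feature_vector, classification_probability).
    Only references whose similarity exceeds min_similarity contribute.
    """
    scored = [(cosine_similarity(query_vector, vec), prob) for vec, prob in references]
    matched = [(sim, prob) for sim, prob in scored if sim > min_similarity]
    if not matched:
        return 0.0  # assumption: no similar preset vector means probability 0
    total = sum(sim for sim, _ in matched)
    return sum(sim * prob for sim, prob in matched) / total
```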
It can be seen that, in the person estimation apparatus of this embodiment, the frequent feature unit 11 determines at least one frequent feature sequence according to the occurrence frequency of each feature in the features of the specific persons, and each frequent feature sequence includes at least one feature, so that one or more groups of features that influence whether a person to be estimated becomes a specific person can be determined. The probability unit 12 classifies the specific persons according to the frequent feature sequences contained in their features and calculates the classification probability corresponding to each class of specific persons; the sample unit 13 determines the features and classification probabilities of the specific persons as the first training sample; and the training unit 14 trains the person estimation model. In this way, during training of the person estimation model, the features of the specific persons are analyzed to determine all the feature combinations that influence the specific persons. These feature combinations can represent the feature correlation among a plurality of specific persons as well as the feature correlation of the specific persons at different periods, so that the finally trained person estimation model can estimate a person to be estimated accurately.
The present invention further provides a terminal device, a schematic structural diagram of which is shown in FIG. 9. The terminal device may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 20 (e.g., one or more processors), a memory 21, and one or more storage media 22 (e.g., one or more mass storage devices) storing application programs 221 or data 222. The memory 21 and the storage medium 22 may be transient storage or persistent storage. The program stored in the storage medium 22 may include one or more modules (not shown), each of which may include a series of instruction operations for the terminal device. Further, the central processing unit 20 may be configured to communicate with the storage medium 22 and to execute, on the terminal device, the series of instruction operations in the storage medium 22.
Specifically, the application 221 stored in the storage medium 22 includes an application for estimating the person, and the application may include the feature obtaining unit 10, the frequent feature unit 11, the probability unit 12, the sample unit 13, the training unit 14, and the estimating unit 15 in the person estimating apparatus, which will not be described in detail herein. Further, the central processor 20 may be configured to communicate with the storage medium 22, and execute a series of operations corresponding to the application program of the person estimation stored in the storage medium 22 on the terminal device.
The terminal device may also include one or more power supplies 23, one or more wired or wireless network interfaces 24, one or more input/output interfaces 25, and/or one or more operating systems 223, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The steps performed by the person estimation apparatus described in the above method embodiment may be based on the structure of the terminal device shown in fig. 9.
Embodiments of the present invention also provide a computer-readable storage medium, which stores a plurality of computer programs, where the computer programs are suitable for being loaded by a processor and executing the person estimation method performed by the person estimation apparatus.
The embodiment of the invention also provides terminal equipment, which comprises a processor and a memory; the memory is used for storing a plurality of computer programs which are used for being loaded by the processor and executing the person estimation method executed by the person estimation device; the processor is configured to implement each of the plurality of computer programs.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing associated hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
The person estimation method, the person estimation device, the computer-readable storage medium, and the terminal device provided in the embodiments of the present invention are described in detail above, and a specific example is applied in the present disclosure to explain the principle and the implementation of the present invention, and the description of the above embodiments is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (15)

1. A person estimation method, comprising:
acquiring characteristics corresponding to a plurality of specific persons respectively;
determining at least one frequent feature sequence according to the occurrence frequency of each feature in the features of the specific persons, wherein each frequent feature sequence comprises at least one feature which simultaneously appears in the features of part or all of the specific persons;
classifying the specific personnel according to frequent feature sequences contained in the features of each specific personnel in the specific personnel, and respectively calculating the classification probability corresponding to each type of specific personnel;
taking the characteristics of the specific persons as the characteristics of the persons in the positive sample, and taking the classification probabilities corresponding to the specific persons as the classification probabilities of the persons in the positive sample;
training a person estimation model according to a first training sample, wherein the person estimation model is used for determining the classification probability that a person to be estimated is a specific person, and the first training sample comprises the characteristics of the person in the positive sample and the classification probability thereof.
2. The method of claim 1, wherein after obtaining the features corresponding to the respective specific persons, the method further comprises:
coding the characteristics of the specific personnel to obtain coded characteristics; or after preprocessing the characteristics of the specific personnel, coding the preprocessed characteristics to obtain coded characteristics;
the determining at least one frequent feature sequence with an occurrence frequency greater than a preset value among the features of the plurality of specific persons specifically includes: and determining the at least one frequent feature sequence according to the coded features.
3. The method of claim 1, wherein said encoding the characteristics of the plurality of specific persons comprises:
if the characteristic of the specific person is a discrete characteristic, performing type coding on the characteristic of the specific person;
and if the characteristic of the specific person is a continuous characteristic, performing range coding on the characteristic of the specific person.
4. The method according to any one of claims 1 to 3, wherein the determining at least one frequent feature sequence according to the frequency of occurrence of each of the features of the plurality of specific persons specifically comprises:
determining the characteristics of the specific persons to be a plurality of characteristic sequences respectively;
determining the prefix of the characteristic sequence with the length of 1 corresponding to the plurality of characteristic sequences;
removing the prefix of the characteristic sequence with the occurrence frequency not more than the preset minimum support rate, and taking the prefix of the residual characteristic sequence as a first frequent characteristic sequence;
determining the projection suffix characteristics of each characteristic sequence corresponding to the residual characteristic sequence prefixes;
determining at least one second frequent feature sequence according to the remaining feature sequence prefixes and the projected suffix features, wherein each second frequent feature sequence is composed of at least one feature of the remaining feature sequence prefixes and the projected suffix features;
the at least one frequent feature sequence comprises: the first frequent feature sequence and the second frequent feature sequence.
5. The method according to claim 4, wherein the determining at least one second frequent feature sequence according to the remaining feature sequence prefixes and the projected suffix features comprises:
when the projected suffix features of the feature sequence are non-empty, determining a single feature of which the occurrence frequency is greater than a preset minimum support rate in the projected suffix features;
combining the feature sequence prefix with the single feature to form a new feature sequence prefix;
determining a new projection suffix feature of each feature sequence corresponding to the new feature sequence prefix, and performing the steps of determining a single feature, merging and determining a new projection suffix feature for the new feature sequence prefix and the new projection suffix feature;
the second frequent signature sequence includes the new signature sequence prefix.
6. The method according to claim 4, wherein the classifying the specific people according to the frequent feature sequences included in the features of each specific person in the specific people, and calculating the classification probability corresponding to each class of specific people respectively comprises:
determining the longest frequent feature sequence contained in the features of each specific person;
dividing the specific personnel with the same longest frequent characteristic sequence into the same type of specific personnel;
and determining the classification probability corresponding to any specific personnel according to the information of the longest frequent feature sequence.
7. The method according to claim 6, wherein the determining the classification probability corresponding to any specific person according to the information of the longest frequent feature sequence specifically comprises:
determining the classification probability corresponding to any specific person as the occurrence frequency of the longest frequent feature sequence; or, determining the classification probability corresponding to any specific person as a function calculation value of the occurrence frequency of each feature in the longest frequent feature sequence.
8. The method of claim 6,
when the longest frequent feature sequence is a first frequent feature sequence, the occurrence frequency of the longest frequent feature sequence is the occurrence frequency of the prefix of the feature sequence with the length of 1;
when the longest frequent feature sequence is a second frequent feature sequence, the occurrence frequency of the longest frequent feature sequence is the product of the occurrence frequency of the first frequent feature sequence and the occurrence frequency of a single feature in the new feature sequence prefix.
9. The method of claim 6, wherein after determining the longest frequent sequence of features included in the features of each particular person, further comprising:
if the length of the longest frequent feature sequence is larger than a preset value, executing the steps of dividing and determining the classification probability;
if the length of the longest frequent feature sequence is not larger than a preset value, calculating the classification probability of the specific personnel with the longest frequent feature sequence, and classifying the specific personnel according to the classification probability.
10. The method of claim 9, wherein the calculating the classification probability of the particular person having the longest frequent feature sequence comprises:
determining the probability weight of a first class of features in the features of any one specific person with the longest frequent feature sequence as the occurrence frequency of the longest frequent feature sequence, and determining the probability weight of a second class of features in the features of any one specific person as a preset minimum support rate, wherein the first class of features are the features in the longest frequent feature sequence, and the second class of features are the features of any one specific person except the first class of features;
and calculating the classification probability corresponding to any specific person according to the probability weights respectively corresponding to the first class features and the second class features.
11. The method of any one of claims 1 to 3, wherein the training of the person estimation model from the first training sample specifically comprises:
determining a personnel estimation initial model;
respectively determining the classification probability of each positive sample person belonging to a specific person through the characteristics of each positive sample person in the person estimation initial model and the first training sample;
and adjusting the personnel estimation initial model according to the classification probability of each positive sample personnel obtained by the personnel estimation initial model and the classification probability of each positive sample personnel in the first training sample to obtain a final personnel estimation model.
12. The method of claim 11, wherein the method further comprises:
acquiring characteristics of a plurality of new specific persons;
and when the number of the new specific personnel meets a preset condition, aiming at the characteristics of the new specific personnel and the existing specific personnel, executing the steps of determining the frequent characteristic sequence, calculating the classification probability and training a personnel estimation model.
13. The method of claim 12, wherein the method further comprises:
when the number of the new specific personnel does not meet the preset condition, dividing the new specific personnel into the existing specific personnel categories according to the longest frequent feature sequences contained in the features of the new specific personnel respectively to obtain the newly classified specific personnel;
and determining a first training sample according to the characteristics and the classification probability of the specific person after the latest classification, and executing the step of training a person estimation model according to the first training sample.
14. The method of any one of claims 1 to 3, wherein the person estimation model comprises a feature extraction module and an estimation module, the method further comprising:
extracting a feature vector of the feature of the person to be estimated through the feature extraction module;
calculating the similarity between the feature vectors of the people to be estimated and the preset feature vectors of specific people;
determining a plurality of feature vectors with similarity greater than a specific value;
and calculating the classification probability of the person to be estimated belonging to a specific person according to the classification probability corresponding to the preset feature vectors respectively.
15. A person estimation device, comprising:
a feature acquisition unit, configured to acquire features corresponding to a plurality of specific persons, respectively;
a frequent feature unit, configured to determine at least one frequent feature sequence according to an occurrence frequency of each feature in the features of the multiple specific people, where each frequent feature sequence includes at least one feature that occurs simultaneously in the features of some or all of the multiple specific people;
a probability unit, configured to classify the plurality of specific persons according to the frequent feature sequences contained in the features of each specific person, and to calculate the classification probability corresponding to each class of specific persons;
a sample unit, configured to use the features of the plurality of specific persons as the features of positive sample persons, and to use the classification probabilities corresponding to the plurality of specific persons as the classification probabilities of the positive sample persons;
a training unit, configured to train a person estimation model according to a first training sample, where the person estimation model is used to determine the classification probability that a person to be estimated is a specific person, and the first training sample comprises the features of the positive sample persons and their classification probabilities.
CN202010101233.6A 2020-02-19 2020-02-19 Personnel estimation method and device, computer-readable storage medium and terminal equipment Active CN111324641B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010101233.6A CN111324641B (en) 2020-02-19 2020-02-19 Personnel estimation method and device, computer-readable storage medium and terminal equipment

Publications (2)

Publication Number Publication Date
CN111324641A true CN111324641A (en) 2020-06-23
CN111324641B CN111324641B (en) 2022-09-09

Family

ID=71171059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010101233.6A Active CN111324641B (en) 2020-02-19 2020-02-19 Personnel estimation method and device, computer-readable storage medium and terminal equipment

Country Status (1)

Country Link
CN (1) CN111324641B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106204083A (en) * 2015-04-30 2016-12-07 ***通信集团山东有限公司 Target customer classification method, apparatus and system
WO2017148521A1 (en) * 2016-03-03 2017-09-08 Telefonaktiebolaget Lm Ericsson (Publ) Uncertainty measure of a mixture-model based pattern classifier
CN109544197A (en) * 2017-09-22 2019-03-29 中兴通讯股份有限公司 Customer churn prediction method and device
CN110009062A (en) * 2019-04-18 2019-07-12 成都四方伟业软件股份有限公司 Classification model training method and device
CN110197251A (en) * 2018-02-26 2019-09-03 中国科学院深圳先进技术研究院 Prediction method, device, equipment and storage medium based on deep learning network

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597321A (en) * 2020-07-08 2020-08-28 腾讯科技(深圳)有限公司 Question answer prediction method and device, storage medium and electronic equipment
CN111597321B (en) * 2020-07-08 2024-06-11 腾讯科技(深圳)有限公司 Prediction method and device of answers to questions, storage medium and electronic equipment
CN115545570A (en) * 2022-11-28 2022-12-30 四川大学华西医院 Method and system for checking and accepting achievements of nursing education training
CN115545570B (en) * 2022-11-28 2023-03-24 四川大学华西医院 Achievement acceptance method and system for nursing education training

Also Published As

Publication number Publication date
CN111324641B (en) 2022-09-09

Similar Documents

Publication Publication Date Title
WO2023065545A1 (en) Risk prediction method and apparatus, and device and storage medium
WO2023000574A1 (en) Model training method, apparatus and device, and readable storage medium
Vovan An improved fuzzy time series forecasting model using variations of data
Jalalkamali Using of hybrid fuzzy models to predict spatiotemporal groundwater quality parameters
CN106874478A (en) Parallelization random tags subset multi-tag file classification method based on Spark
CN111400432A (en) Event type information processing method, event type identification method and device
CN113254833B (en) Information pushing method and service system based on birth teaching fusion
CN111324641B (en) Personnel estimation method and device, computer-readable storage medium and terminal equipment
CN111967971A (en) Bank client data processing method and device
CN111198970A (en) Resume matching method and device, electronic equipment and storage medium
CN113706151A (en) Data processing method and device, computer equipment and storage medium
CN113268370B (en) Root cause alarm analysis method, system, equipment and storage medium
Xu et al. Constructing balance from imbalance for long-tailed image recognition
Polat et al. Subtractive clustering attribute weighting (SCAW) to discriminate the traffic accidents on Konya–Afyonkarahisar highway in Turkey with the help of GIS: A case study
CN111078859B (en) Author recommendation method based on reference times
Wang et al. Towards efficient convolutional neural networks through low-error filter saliency estimation
CN103106329A (en) Training sample grouping construction method used for support vector regression (SVR) short-term load forecasting
CN111325255B (en) Specific crowd delineating method and device, electronic equipment and storage medium
Ferreira et al. Dynamic Identification of Stop Locations from GPS Trajectories Based on Their Temporal and Spatial Characteristics
CN116304518A (en) Heterogeneous graph convolution neural network model construction method and system for information recommendation
Dong et al. Research on academic early warning model based on improved SVM algorithm
Wu et al. Multi-graph-view learning for complicated object classification
Mao et al. Naive Bayesian algorithm classification model with local attribute weighted based on KNN
CN117194966A (en) Training method and related device for object classification model
CN113590720A (en) Data classification method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40024710; Country of ref document: HK)
GR01 Patent grant