WO2019072128A1 - Object recognition method and *** - Google Patents

Object recognition method and ***

Info

Publication number
WO2019072128A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature data
data set
decision tree
user
training
Prior art date
Application number
PCT/CN2018/109020
Other languages
English (en)
French (fr)
Inventor
王颖帅
李晓霞
苗诗雨
Original Assignee
北京京东尚科信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京京东尚科信息技术有限公司 and 北京京东世纪贸易有限公司
Publication of WO2019072128A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0203 Market surveys; Market polls
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations

Definitions

  • the present disclosure relates to the field of data processing, and more particularly to an object recognition method and system thereof.
  • The present disclosure provides an object recognition method capable of improving object recognition accuracy, a system thereof, and a computer readable storage medium.
  • An aspect of the present disclosure provides an object recognition method, including: acquiring feature data of an object, wherein the feature data is used to reflect a preference degree of the object for its associated object; inputting the acquired feature data into a preset classifier to obtain a first classification result, wherein the preset classifier is configured to classify the object according to the input feature data; and identifying the preference degree of the object for the associated object according to the first classification result.
  • In some embodiments, before the acquired feature data is input into the preset classifier to obtain the first classification result, the method further includes: acquiring a training data set, wherein the training data set includes at least one type of feature data; and training on the feature data in the training data set to obtain the preset classifier.
  • In some embodiments, acquiring the training data set includes: acquiring a sample data set; calculating the conditional information entropy of each feature datum in the sample data set with respect to all feature data of the sample data set; calculating the information entropy of all feature data in the sample data set; calculating the difference between the information entropy and the conditional information entropy to obtain the information gain of each feature datum; selecting, according to the relative magnitudes of the information gains, the feature data whose information gain satisfies a preset condition as the target feature data; and using the data set corresponding to the target feature data as the training data set.
  • In some embodiments, training on the feature data in the training data set to obtain the preset classifier includes: determining a root node and leaf nodes of a first decision tree according to the relative magnitudes of the information gains of the target feature data to generate the first decision tree; and constructing the preset classifier according to the first decision tree.
  • In some embodiments, the method further includes: acquiring a check data set including feature data of a check object; inputting the feature data of the check object into the first decision tree to obtain a second classification result; determining whether the preference degree of the check object for its associated object can be identified according to the second classification result; and, if not, determining a root node and leaf nodes of a second decision tree based on the feature data of the check object to generate the second decision tree, and constructing the preset classifier according to the second decision tree, or according to the first decision tree and the second decision tree.
  • Another aspect of the present disclosure provides an object recognition system, including: a first obtaining module, configured to acquire feature data of an object, wherein the feature data is used to reflect a preference degree of the object for its associated object; a processing module, configured to input the acquired feature data into a preset classifier to obtain a first classification result, wherein the preset classifier is configured to classify the object according to the input feature data; and an identification module, configured to identify the preference degree of the object for the associated object according to the first classification result.
  • In some embodiments, the system further includes: a second obtaining module, configured to acquire a training data set, wherein the training data set includes at least one type of feature data; and a training module, configured to train on the feature data in the training data set to obtain the preset classifier.
  • In some embodiments, the second obtaining module includes: a first acquiring unit, configured to acquire a sample data set; a first calculating unit, configured to calculate the conditional information entropy of each feature datum in the sample data set with respect to all feature data of the sample data set; a second calculating unit, configured to calculate the information entropy of all feature data in the sample data set; a third calculating unit, configured to calculate the difference between the information entropy and the conditional information entropy to obtain the information gain of each feature datum; a selecting unit, configured to select, according to the relative magnitudes of the information gains, the feature data whose information gain satisfies a preset condition as the target feature data; and a first determining unit, configured to use the data set corresponding to the target feature data as the training data set.
  • In some embodiments, the training module includes: a second determining unit, configured to determine the root node and leaf nodes of the first decision tree according to the relative magnitudes of the information gains of the target feature data to generate the first decision tree; and a building unit, configured to construct the preset classifier according to the first decision tree.
  • In some embodiments, the system further includes: a third obtaining module, configured to acquire a check data set including feature data of a check object; a second processing module, configured to input the feature data of the check object into the first decision tree to obtain a second classification result; a determining module, configured to determine whether the preference degree of the check object for its associated object can be identified according to the second classification result; a generating module, configured to, when the preference degree cannot be identified, determine a root node and leaf nodes of a second decision tree based on the feature data of the check object to generate the second decision tree; and a second building module, configured to construct the preset classifier according to the second decision tree, or according to the first decision tree and the second decision tree.
  • Another aspect of the present disclosure provides a computer readable storage medium having stored thereon executable instructions for implementing the object recognition method described above when executed by a processor.
  • Another aspect of the present disclosure provides an object recognition system comprising: the above computer readable storage medium; and the processor.
  • FIG. 1 schematically illustrates an exemplary system architecture of an object recognition method and system thereof according to the present disclosure
  • FIG. 2 schematically shows an application scenario diagram of an object recognition method and system thereof according to an embodiment of the present disclosure
  • FIG. 3 schematically shows a flowchart of an object recognition method according to an embodiment of the present disclosure
  • FIG. 4A schematically illustrates a flowchart of an object recognition method according to another embodiment of the present disclosure
  • FIG. 4B schematically illustrates a flowchart of an object recognition method according to another embodiment of the present disclosure
  • FIG. 4C schematically illustrates a flowchart of an object recognition method according to another embodiment of the present disclosure
  • FIG. 4D schematically illustrates a flowchart of an object recognition method according to another embodiment of the present disclosure
  • FIG. 5 is a view schematically showing a recognition result of an object recognition method according to an embodiment of the present disclosure
  • FIG. 6 schematically illustrates a block diagram of an object recognition system in accordance with an embodiment of the present disclosure
  • FIG. 7 is a block diagram schematically showing a computer system to which an object recognition method of an embodiment of the present disclosure is applied.
  • the techniques of this disclosure may be implemented in the form of hardware and/or software (including firmware, microcode, etc.). Additionally, the techniques of this disclosure may take the form of a computer program product on a computer readable medium storing instructions for use by or in connection with an instruction execution system.
  • a computer readable medium can be any medium that can contain, store, communicate, propagate or transport the instructions.
  • A computer readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a propagation medium.
  • Examples of the computer readable medium include: a magnetic storage system such as a magnetic tape or a hard disk (HDD); an optical storage system such as a compact disk (CD-ROM); a memory such as a random access memory (RAM) or a flash memory; and/or a wired/wireless communication link.
  • Embodiments of the present disclosure provide an object recognition method and system thereof.
  • the method includes a data acquisition process and an object recognition process.
  • In the data acquisition process, it is necessary to acquire feature data that can reflect the preference degree of the object for its associated object, and to obtain a classifier for classifying the object according to the input feature data.
  • the object recognition process is entered.
  • The acquired feature data of the object can be input into the classifier to obtain a classification result, and the preference degree of the object for the associated object is identified according to the classification result, such as identifying one or more users' personalized preferences for product categories.
  • FIG. 1 schematically illustrates an exemplary system architecture of an object recognition method and system thereof in accordance with the present disclosure.
  • the system architecture 100 can include a terminal device 101, a terminal device 102, a terminal device 103, a network 104, and a server 105.
  • the network 104 is used to provide a medium of communication links between the terminal device 101, the terminal device 102, the terminal device 103, and the server 105.
  • Network 104 may include various types of connections, such as wired, wireless communication links, fiber optic cables, and the like.
  • the user can interact with the server 105 via the network 104 using the terminal device 101, the terminal device 102, and the terminal device 103 to receive or transmit a message or the like.
  • the terminal device 101, the terminal device 102, and the terminal device 103 can be installed with various communication client applications, such as a shopping application, a web browser application, a search application, an instant communication tool, a mailbox client, a social platform software, and the like. This will not be repeated here.
  • the terminal device 101, the terminal device 102, and the terminal device 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
  • The server 105 may be a server that provides various services, for example a background management server (for example only) that provides support for the shopping websites browsed by users on the terminal device 101, the terminal device 102, and the terminal device 103.
  • The background management server may analyze and process data such as a received product information query request, and feed the processing result (for example, target push information or product information; examples only) back to the terminal device.
  • the object identification method provided by the embodiment of the present disclosure may be performed by the server 105, or may be performed by another server or a server cluster different from the server 105. Accordingly, the system for object recognition may be provided in the server 105 or in another server or a server cluster other than the server 105.
  • The number of terminal devices, networks, and servers in Figure 1 is merely illustrative. Depending on implementation needs, there can be any number of terminal devices, networks, and servers.
  • FIG. 2 schematically illustrates an application scenario diagram of an object recognition method and system thereof according to an embodiment of the present disclosure.
  • In this embodiment, the object may be any specified user, and there may be one or more objects; for example, the object may be a user A of a certain trading platform.
  • The associated object of the object may be a category displayed on a webpage, for example a clothing category or a mobile communication category. Users perform click, browse, follow, and purchase operations on products of different categories, and the backend generates a large amount of data reflecting the users' preferences for the categories.
  • In order to make personalized recommendations for users, an e-commerce platform generally needs to identify, from the massive data generated when users click, browse, follow, and purchase products of different categories, which category each of users A, B, C, and D prefers.
  • FIG. 3 schematically illustrates a flow chart of an object recognition method in accordance with an embodiment of the present disclosure.
  • the method includes operations S301-S303, wherein:
  • feature data of the object is acquired, wherein the feature data is used to reflect the degree of preference of the object to its associated object.
  • In operation S301, the feature data for reflecting the user's preference degree for the category is obtained, as shown in Table 1, wherein the feature data may include, but is not limited to, the user's features with respect to the category (feature data numbered 1-10), pure user-dimension features (numbered 11-13), and pure category-dimension features (numbered 14-27).
  • Specifically, user data related to these behaviors is obtained from various tables. For example, products that a user has browsed or clicked on a shopping application category page can be obtained from the browsing table; the user's third-level category brand score can be obtained from the user's third-level category brand score table; the user's third-level category search score can be obtained from the user search table; the user's third-level category add-to-cart score can be obtained from the shopping cart table; the user's preferred price and purchasing power features can be obtained from the user's third-level category price segment table; pure third-level-category-dimension features can be obtained from the third-level category attribute table; and pure user-dimension features can be obtained from the user value table. The sources are not limited here.
  • The classification of commodities is not limited, and may include, but is not limited to, different kinds of commodities, or classifications of the same kind of commodity along different dimensions, such as the third-level category.
  • After obtaining user data related to click behavior, browsing behavior, follow behavior, and shopping cart behavior, the data needs to be cleaned. For example, if a user browses the same third-level category more than 100 times in one day, it is counted as 100 times; if a user follows and then unfollows a third-level category the same day, the follow is cancelled; and if a user purchases from the same third-level category multiple times in the same day, it is counted as once.
  • the purpose of data cleaning is to remove duplicate information, correct existing errors, and provide data consistency.
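The cleaning rules above can be sketched in a few lines. This is an illustrative sketch only; the record field names (browse_cnt, followed, unfollowed_same_day, purchase_cnt) are assumptions, not the disclosure's actual schema.

```python
def clean_record(rec: dict) -> dict:
    """Apply the per-day cleaning rules described in the disclosure."""
    out = dict(rec)
    # Browsing the same third-level category more than 100 times a day counts as 100.
    out["browse_cnt"] = min(rec["browse_cnt"], 100)
    # Following and then unfollowing a category the same day cancels the follow.
    if rec.get("unfollowed_same_day"):
        out["followed"] = 0
    # Multiple purchases in one category on the same day count as one.
    out["purchase_cnt"] = 1 if rec["purchase_cnt"] > 0 else 0
    return out

records = [
    {"user": "A", "cat3": 12354, "browse_cnt": 150, "followed": 1,
     "unfollowed_same_day": True, "purchase_cnt": 4},
    {"user": "A", "cat3": 13783, "browse_cnt": 3, "followed": 1,
     "unfollowed_same_day": False, "purchase_cnt": 0},
]
cleaned = [clean_record(r) for r in records]
print(cleaned[0]["browse_cnt"], cleaned[0]["followed"], cleaned[0]["purchase_cnt"])
# 100 0 1
```

Each rule removes duplicate or contradictory signals so that downstream feature counts stay consistent, which is exactly the consistency goal stated above.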
  • In some embodiments, the user's label is set to 1, which can be used as an important indicator for identifying user preferences.
  • the acquired feature data is input into a preset classifier to obtain a first classification result, wherein the preset classifier is configured to classify the object according to the input feature data.
  • the preset classifier is used to classify objects according to the input feature data.
  • In this way, the classification result of the object can be obtained. If the feature data of the 100 third-level categories corresponding to user A is input into the classifier, the classification results for user A's 100 corresponding third-level categories are obtained; if the feature data of the 100 third-level categories of users A, B, C, and D is input into the classifier, the classification results for the 100 corresponding third-level categories of all four users are obtained.
  • the object may be one user or multiple users, and the user may be all users on the website or some users.
  • the feature data may be one or more of the categories in Table 1, and is not limited herein.
  • the classification result may be a prediction score reflecting the degree of preference of the object to its associated object.
  • The prediction score is normalized so that it is distributed in (0, 1). The higher the prediction score, the higher the preference of the object for its associated object.
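The disclosure does not specify which normalization is used; the logistic (sigmoid) function is one common way to squash a raw score into the open interval (0, 1) while preserving order, as in this hypothetical sketch:

```python
import math

def normalize(score: float) -> float:
    """Squash a raw prediction score into (0, 1) with the logistic function.
    The disclosure does not name the normalization; sigmoid is one common choice."""
    return 1.0 / (1.0 + math.exp(-score))

raw_scores = [-2.0, 0.0, 3.0]
normed = [normalize(s) for s in raw_scores]
print(normed)  # all values lie in (0, 1) and keep their original order
```

Because the mapping is strictly increasing, ranking users' category preferences by the normalized score is equivalent to ranking by the raw score.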
  • In this way, products in the user's preferred categories are prioritized in the recommendation, which caters to the user's preferences and helps increase GMV (gross merchandise volume).
  • In some embodiments, the recommended data format for third-level category preferences is as follows: the first column, separated by a space, is the user name; the second column consists of comma-separated entries, where the number before each colon is a third-level category number and the number after it is the user's preference score for that third-level category. The present disclosure ranks each user's recommended third-level category scores in descending order, making the score results clear and easy to analyze.
  • For example, in the first line, the "A" before the space is the user name; after the space, the number before the first colon (12354) is a third-level category number, 0.04422928 is user A's preference score for third-level category 12354, and 0.04121400 is user A's preference score for third-level category 13783. Since the recommended format ranks each user's preferred third-level category scores in descending order, it is easy to conclude that user A prefers third-level category 12354; the third-level categories preferred by other users can be identified in the same way.
  • Table 2 merely illustrates a recommended format for users' third-level category preference scores, and does not limit the number of third-level categories, the number of users, or the format.
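A line in this format can be parsed straightforwardly. The helper below is an illustrative sketch using the example values quoted above; the function name is an assumption.

```python
def parse_preference_line(line: str):
    """Parse one line of the '<user> <cat>:<score>,<cat>:<score>,...' format."""
    user, rest = line.split(" ", 1)
    prefs = []
    for entry in rest.split(","):
        cat, score = entry.split(":")
        prefs.append((int(cat), float(score)))
    # The disclosure ranks scores in descending order; sort defensively anyway.
    prefs.sort(key=lambda p: p[1], reverse=True)
    return user, prefs

user, prefs = parse_preference_line("A 12354:0.04422928,13783:0.04121400")
print(user, prefs[0])  # A (12354, 0.04422928)
```

The first tuple is always the user's most-preferred third-level category, matching the analysis of user A above.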
  • One existing object recognition method computes statistical user preferences based on the user's historical behavior. This method involves only limited feature data, and the dimensions are relatively simple: the user's order features, browsing features, follow features, and add-to-cart features are only part of the user's interaction-dimension features. Moreover, the feature data involved are treated as independent of each other, ignoring the interactions between them, so the latent meaning of the feature data cannot be mined. In addition, the weight of each feature behavior depends on the analyst's experience, and the preference scores cannot be updated in time based on online data, making it impossible to comprehensively and effectively identify the object's preference for its associated objects.
  • In contrast, embodiments of the present disclosure add feature data of more dimensions and take the interactions between feature data into account, overcoming the defect that identification using only single-dimension feature data cannot comprehensively and effectively identify the object's preference for its associated objects, thereby achieving the technical effect of comprehensively and effectively identifying that preference.
  • Another existing object recognition method is based on time-weight calculation. On the one hand, this method relies on time weights, which become abnormal during large promotion periods; on the other hand, it depends on a purchase cycle model, so it cannot accurately identify the object's preference for its associated objects.
  • The embodiment of the present disclosure inputs the feature data into the classifier, classifies the feature data of the input object using the preset classifier, and identifies the object's preference degree for the associated object according to the classification result, thereby improving the accuracy of identifying user preferences.
  • This automated mode of machine learning is of great significance to the company's intelligent model system.
  • Multi-dimensional feature data is used to train a classifier with high recognition capability, which can comprehensively, effectively, and accurately identify the preference degree of an object for its associated objects. This is of great significance for building a trustworthy commodity preference map for users, identifying users' search intentions, enhancing the user experience, and providing intelligent platform services.
  • FIG. 4A schematically illustrates a flow chart of an object recognition method according to another embodiment of the present disclosure.
  • the method further includes operations S401 to S402, where:
  • a training data set is acquired, wherein the training data set includes at least one type of feature data.
  • the training data set used to generate the preset classifier needs to be acquired.
  • the feature data in the training data set is relatively flexible, and can be selected according to the processing capability, or can be selected autonomously according to the processing precision requirement.
  • the training data set may include feature data of an object, and may also include feature data of multiple objects.
  • In some embodiments, the feature data includes the user's category-interaction preference features, pure user-dimension features, and pure category-dimension features, which supports large-scale data; it can be one or more of the feature data listed in Table 1.
  • the feature data in the training data set is trained to obtain a preset classifier.
  • the training classifier may include multiple modes/means, which are not limited herein.
  • For example, the training may be performed using machine learning algorithms such as random forest and Gradient Boosting Decision Tree (GBDT).
  • The disclosure provides an improved GBDT whose basic learning unit is a regression decision tree; multiple parameters are tuned to their optimal values with feedback from evaluation metrics.
  • The model itself can discover interactions between features. For example, an analyst analyzing historical data may consider only the click feature and the order feature, but the latest data may show that a reasonable nonlinear combination of these two features can simulate the add-to-cart feature; in this way, the gradient boosting tree can tap more latent features.
  • The present invention uses the big data technology Spark to run daily scheduled tasks, so the data is updated in time; model training can also be automated whenever needed.
  • Gradient boosting trees focus on reducing bias and can build a strong ensemble from base learners with weak generalization performance. The gradient boosting tree starts from a weak learning algorithm and learns repeatedly to obtain a series of base classifiers, then combines these base classifiers into a strong classifier. The boosting method actually adopts an additive model and a forward stagewise algorithm, and a boosting method based on decision trees is called a boosting tree. The gradient boosting algorithm minimizes the loss function by steepest descent and works well on regression prediction problems.
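The additive model and forward stagewise idea can be sketched from scratch for squared loss, where the negative gradient at each round is simply the residual. This is a didactic toy with made-up data, not the disclosure's improved GBDT.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: the threshold split minimizing squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x, t=t, lm=lm, rm=rm: lm if x <= t else rm

def gradient_boost(xs, ys, n_rounds=50, lr=0.1):
    """Forward stagewise additive model: each round fits the negative gradient."""
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, pred)]  # -gradient of squared loss
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]  # a noisy step function
model = gradient_boost(xs, ys)
print(model(2), model(5))  # left cluster near 1, right cluster near 3
```

The shrinkage factor `lr` plays the role of the learning rate in steepest descent: each weak regression tree moves the ensemble a small step in the direction that most reduces the loss.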
  • The embodiment of the present disclosure uses the preset classifier, trained from the training data set, for object recognition. This automated machine learning mode can handle big data and simplify the identification process, providing the technical effects of improved identification efficiency and accuracy; it is of great significance to an intelligent model system, the training process can be automated, and the training results are updated as the data changes.
  • FIG. 4B schematically illustrates a flow chart of an object recognition method according to another embodiment of the present disclosure.
  • acquiring the training data set may include operations S501 to S506, where:
  • In operation S502, the conditional information entropy of each feature datum in the sample data set with respect to all feature data of the sample data set is calculated.
  • a difference between the information entropy and the conditional information entropy is calculated to obtain an information gain of each feature data.
  • the data set corresponding to the target feature data is used as the training data set.
  • the sample data set may be all of the feature data set or part of the feature data set.
  • The goal is to find the locally optimal features, that is, features by which the data set can be separated as cleanly as possible after classification.
  • The method for measuring the locally optimal feature is to calculate the information entropy of all feature data in the sample data set and the conditional information entropy of each feature datum with respect to all feature data, and to obtain the information gain of each feature datum; the larger the value, the more important the feature datum.
  • In some embodiments, the amount of information of an event is calculated by the standard formula I(x) = -log2 p(x): a small-probability event carries a large amount of information, while a frequently occurring event carries little. Information entropy is the expectation of the amount of information, H(D) = -Σ p_i log2 p_i, and entropy measures the degree of order of the data set at a node.
  • the information gain indicates how much the uncertainty of the entire data is reduced with the participation of a certain feature. The greater the information gain, the more important the feature is.
  • Here H(D) represents the information entropy of all feature data in the sample data set D, and g(D, A) = H(D) - H(D|A) denotes the information gain that feature datum A brings to D. For example, if the information entropy of all feature data in D is 80% and the conditional information entropy of feature A with respect to all feature data of D is 60%, then the information gain brought by feature A to D is 20%; the information gain of each feature datum with respect to D is calculated in the same way.
  • a data set corresponding to the feature data larger than the threshold value is selected as the training data set according to the threshold value of the specified information gain value.
  • The embodiments of the present disclosure can thus filter the feature data of high importance from the feature data set according to actual needs and form a sample data set from it, which provides a good data foundation for the subsequent classifier.
  • FIG. 4C schematically illustrates a flow chart of an object recognition method according to another embodiment of the present disclosure.
  • training the feature data in the training data set to obtain the preset classifier may include operations S601-S602, where:
  • The root node and leaf nodes of the first decision tree are determined according to the relative magnitudes of the information gains of the target feature data, so as to generate the first decision tree.
  • a preset classifier is constructed according to the first decision tree.
  • the feature that is optimal under the current condition is selected as a partitioning rule, that is, a locally optimal feature.
  • Figure 5 shows a schematic diagram of the decision tree.
  • the feature that is optimal under the current condition is also selected as the partitioning rule, that is, the local optimal feature, which will not be described again.
  • The feature that is optimal under the current condition, that is, the locally optimal feature, is selected as the basis for division, so the decision tree thus constructed has strong classification ability.
  • FIG. 4D schematically illustrates a flowchart of an object recognition method according to another embodiment of the present disclosure.
  • the method may further include operations S701-S705, wherein:
  • the feature data of the verification object is input into the decision tree to obtain a second classification result
  • In operation S704, if not, the root node and leaf nodes of the second decision tree are determined based on the feature data of the verification object to generate the second decision tree;
  • a preset classifier is constructed according to the first decision tree and the second decision tree.
  • the first scheme is to construct a preset classifier according to the second decision tree; the second scheme is to construct a preset classifier according to the first decision tree and the second decision tree.
  • FIG. 4D is only a flowchart schematically showing the first scheme; the second scheme is not described herein again.
  • each tree in GBDT is regarded as a residual iterative tree, and the residual is minimized as the global optimal direction.
  • The optimization goal is the best overall result; when the squared error loss function is used, each regression tree learns the conclusions and residuals of all previous N-1 trees, and a regression tree is fitted to the current residual.
  • The boosting tree is the accumulation of the decision trees generated over the entire iterative process.
  • The feature data of the verification object is input into the decision tree; if a classification result is obtained, the verification object's degree of preference for the associated objects is identified from it. If not, the residuals of the first decision tree are learned and a second decision tree is constructed; each next regression tree is built to reduce the residual, which is equivalent to building the next decision tree in the direction that minimizes the gradient and updating the whole forest, and so on until the identification requirements are met.
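The residual-iteration idea described above can be sketched as follows; this is a minimal illustration with one-split regression stumps as the base trees, and all names and data are ours rather than the patent's:

```python
# Minimal gradient-boosting sketch on squared loss: each new "tree"
# (here a one-split stump) fits the residuals of the ensemble so far.
def fit_stump(x, r):
    """Pick the threshold that best fits residuals r with two constant leaves."""
    best = None
    for t in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((ri - (lm if xi <= t else rm)) ** 2 for xi, ri in zip(x, r))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def gbdt_fit(x, y, n_trees=50, v=0.1):
    """Boost: start from the mean, then repeatedly fit stumps to residuals."""
    f0 = sum(y) / len(y)
    trees = []
    pred = [f0] * len(x)
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        tree = fit_stump(x, resid)
        trees.append(tree)
        pred = [pi + v * tree(xi) for pi, xi in zip(pred, x)]
    return lambda xi: f0 + v * sum(t(xi) for t in trees)

x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]
model = gbdt_fit(x, y)
```

After boosting, the residual shrinks geometrically with the shrinkage factor v, so `model(1)` approaches 0 and `model(6)` approaches 1.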
  • The gradient boosting tree algorithm multiplies the output of each tree by a shrinkage factor v (0 < v < 1): f_m(x) = f_{m-1}(x) + v·h_m(x), where h_m(x) is the output of the m-th tree and f_m(x) is the predicted result of the first m trees.
  • The factor v can be regarded as the weight of each tree and controls the learning rate of the boosting process.
  • The choice of v and the number of trees m is a mutual trade-off.
  • The GBDT predicts the user's third-level category preference, and AUC is used as the evaluation index.
  • The GBDT feature-label combinations and the intrinsic GBDT parameters are tuned so that the AUC reaches 0.8 or more.
  • The parameters that normally affect the gradient boosting tree (GBDT) can be divided into three categories:
  • a. learningRate: the learning rate. The GBDT algorithm does not adjust the initial model once and for all but improves it step by step; the learning rate is the parameter that measures the amplitude of each adjustment. In theory, the smaller this parameter, the better the iteration result, but the higher the required computational cost;
  • b. numIterations: the number of weak classifiers, i.e., the number of weak learners generated. Because GBDT is serial, this is also the number of iterations; more is not always better, since the boosting algorithm runs the risk of over-fitting;
  • c. the proportion of the sample subset: the proportion of the whole sample population used to train each weak learner. Random sampling is generally used to reduce variance; by default, 80% of the total sample is selected for training;
  • minimum branching sample size: the minimum number of samples a node needs to continue branching;
  • minimum leaf-node sample size: the minimum number of samples required for a node to become a leaf node;
  • maximum number of leaf nodes: the maximum number of leaf nodes, which is interchangeable with the maximum depth of the tree;
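A hedged sketch of how these groups of parameters might be collected and sanity-checked in code (the parameter names below are illustrative, loosely following Spark/scikit-learn conventions, and are not taken from the patent):

```python
# Hypothetical parameter set for tuning a gradient-boosting model; the
# groups below mirror the grouping in the text (boosting process /
# ensemble size / single-tree structure). Names are illustrative only.
GBDT_PARAMS = {
    # boosting-process parameters
    "learningRate": 0.1,        # smaller -> finer steps, higher compute cost
    "subsamplingRate": 0.8,     # default: train each tree on 80% of samples
    # ensemble-size parameter
    "numIterations": 200,       # number of weak learners; too many can overfit
    # single-tree structure parameters
    "minInstancesPerNode": 10,  # minimum samples for a node to keep branching
    "maxLeafNodes": 32,         # interchangeable with a maximum-depth cap
}

def validate(params):
    """Basic sanity checks matching the constraints stated in the text."""
    assert 0 < params["learningRate"] <= 1
    assert 0 < params["subsamplingRate"] <= 1
    assert params["numIterations"] >= 1
    return True

print(validate(GBDT_PARAMS))  # True
```

Such a dictionary could then be fed to whatever GBDT implementation is in use (the patent mentions a Spark-based implementation).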
  • The gradient boosting tree mainly focuses on reducing bias and can build a strong ensemble from base learners whose generalization performance is rather weak.
  • The gradient boosting tree starts from a weak learning algorithm, learns repeatedly to obtain a series of basic classifiers, and then combines these basic classifiers into a strong classifier.
  • The boosting method actually adopts an additive model with a forward stagewise algorithm.
  • A boosting method based on decision trees is called a boosting tree.
  • The gradient boosting algorithm minimizes the loss function by the steepest descent method and works well on regression prediction problems.
  • FIG. 6 schematically illustrates a block diagram of an object recognition system in accordance with an embodiment of the present disclosure.
  • the object recognition system includes: a first acquisition module 601, a first processing module 602, and an identification module 603.
  • the first obtaining module 601 is configured to acquire feature data of the object, where the feature data is used to reflect the degree of preference of the object to its associated object.
  • According to the acquired user data related to click, browse, follow and shopping-cart behaviors, feature data reflecting the user's degree of preference for categories is obtained, as shown in Table 1. The feature data may include, but is not limited to, the user's features for categories (feature data numbered 1-10), pure user-dimension features (feature data numbered 11-13) and pure category-dimension features (feature data numbered 14-27).
  • Before the feature data of the object is acquired, user data related to these behaviors is obtained: for example, the commodities the user browsed in the past month can be obtained from the browse table; the commodities followed in the past three months from the follow table; the commodities purchased in the past year from the order table; the commodities clicked on a shopping application's category page in the past month from the click table; the user's third-level category brand score from the user third-level category brand score table; the user's third-level category search score from the user search table; the user's third-level category add-to-cart score from the shopping-cart table; the user's preferred price and purchasing-power features from the user third-level category price segment table; pure third-level category dimension features from the third-level category attribute table; and pure user-dimension features from the user value table, without limitation here.
  • In the embodiments of the present disclosure, the classification of commodities is not limited; it may include, but is not limited to, different types of commodities, or classifications of the same type of commodity in different dimensions, such as third-level categories.
  • After the user data related to click, browse, follow and shopping-cart behaviors is obtained, the data needs to be cleaned. For example, if a user browses the same third-level category more than 100 times in one day, it is counted as 100 times; third-level categories whose follows were cancelled are removed from the follow data; and if one user purchases the same third-level category multiple times in the same day, it is counted once.
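The cleaning rules above can be sketched as follows; the record layout (user, category, action, day) and function name are our illustrative assumptions, while the thresholds come from the text:

```python
def clean_events(events):
    """Apply the cleaning rules from the text to (user, category, action, day)
    records: cap same-day browses of one category at 100, drop cancelled
    follows, and count multiple same-day purchases of one category once."""
    browse_counts = {}
    cleaned = []
    for user, cat, action, day in events:
        if action == "browse":
            key = (user, cat, day)
            browse_counts[key] = browse_counts.get(key, 0) + 1
            if browse_counts[key] <= 100:      # cap at 100 per day
                cleaned.append((user, cat, action, day))
        elif action == "follow":
            cleaned.append((user, cat, action, day))
        elif action == "unfollow":
            # cancelled follows are removed entirely
            cleaned = [e for e in cleaned
                       if not (e[0] == user and e[1] == cat and e[2] == "follow")]
        elif action == "purchase":
            seen = {(u, c, d) for u, c, a, d in cleaned if a == "purchase"}
            if (user, cat, day) not in seen:   # count once per day
                cleaned.append((user, cat, action, day))
    return cleaned

events = [("A", "shoes", "purchase", 1), ("A", "shoes", "purchase", 1)]
print(len(clean_events(events)))  # 1
```

This mirrors the stated goal of data cleaning: removing duplicates and correcting errors before labelling and training.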
  • the purpose of data cleaning is to remove duplicate information, correct existing errors, and provide data consistency.
  • In the click log table, third-level categories the user has clicked are labeled 1 and those not clicked are labeled 0; if a user prefers a third-level category, it can be regarded as clicked and the user's label is 1, so this label can be used as an important indicator for identifying user preferences.
  • the first processing module 602 is configured to input the acquired feature data into a preset classifier to obtain a first classification result, where the preset classifier is configured to classify the object according to the input feature data.
  • After the object's feature data is input into the preset classifier, the object's classification result can be obtained. For example, after the feature data of the 100 third-level categories corresponding to user A is input into the classifier, the classification results of user A's 100 corresponding third-level categories are obtained; after the feature data of the 100 third-level categories of user A, user B, user C and user D is input into the classifier, the classification results of the 100 corresponding third-level categories of the four users are obtained.
  • the object may be one user or multiple users, and the user may be all users on the website or some users.
  • the feature data may be one or more of the categories in Table 1, and is not limited herein.
  • the identification module 603 is configured to identify the degree of preference of the object to the associated object according to the classification result.
  • the classification result may be a prediction score reflecting the degree of preference of the object to its associated object.
  • To give all results equal weight and effect, the prediction scores are normalized so that they are distributed in (0, 1). It can be seen that the higher the prediction score, the stronger the object's preference for its associated object.
  • When recommending commodities to the user, commodities under the user's preferred categories are given priority, which better captures the user's preference psychology, greatly improves the user experience of the online shopping platform, and raises indicators such as click-through rate, gross merchandise volume (GMV) and orders.
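A minimal normalization sketch; the patent only states that scores are mapped into (0, 1), so the min-max transform and the eps margin below are our assumptions:

```python
def normalize(scores, eps=1e-6):
    """Min-max scale raw prediction scores into the open interval (0, 1).
    eps keeps the endpoints strictly inside the interval."""
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero for constant scores
    return [eps + (1 - 2 * eps) * (s - lo) / span for s in scores]

raw = [3.2, -1.5, 0.7]
norm = normalize(raw)
print(all(0 < s < 1 for s in norm))  # True
```

The transform is monotonic, so the ranking of categories by preference is preserved after normalization.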
  • The recommended data format for third-level category preference is shown below. In each row, the first space-separated column is the user name; the second column is comma-separated, and within each comma-separated item the part before the colon is a third-level category number and the part after it is the user's preference score for that third-level category. The present disclosure sorts each user's recommended third-level category scores in descending order, making the score results clear and easy to analyze.
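The row format just described ("user cat:score,cat:score,...", scores in descending order) can be parsed as in this sketch (function name is ours):

```python
def parse_preference_row(row):
    """Parse 'user catId:score,catId:score,...' into (user, [(catId, score), ...])."""
    user, _, rest = row.partition(" ")
    prefs = []
    for item in rest.split(","):
        cat, _, score = item.partition(":")
        prefs.append((cat, float(score)))
    return user, prefs

user, prefs = parse_preference_row("A 12354:0.04422928,13783:0.04121400,12019:0.0318690")
print(user, prefs[0])  # A ('12354', 0.04422928)
```

Because the scores are emitted in descending order, the first parsed pair is always the user's most preferred third-level category.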
  • In the course of realizing the concept of the present disclosure, the inventors found that the related art provides an object recognition method that computes statistical user preferences based on the user's historical behavior. The feature data participating in such recognition is limited and its dimensions are relatively simple: the user's order features, browse features, follow features and add-to-cart features all belong to the user's interaction-dimension features for categories. Moreover, the participating feature data are mutually independent, the interactions between them are ignored, and the latent meaning of the feature data cannot be mined. Meanwhile, the weight of each feature behavior depends on the analyst's experience, and the preference score cannot be updated in time according to online scores, so the object's degree of preference for related objects cannot be identified comprehensively and effectively.
  • Compared with the related art, the embodiments of the present disclosure add feature data of more dimensions and take the interactions between the feature data into account, overcoming the defect that recognition using feature data of only a single dimension cannot comprehensively and effectively identify the object's degree of preference for related objects, and achieving the technical effect of comprehensive and effective identification.
  • The inventors also found that the related art provides another object recognition method based on time-weight calculation. On the one hand, this method relies on time weights, which become abnormal during large promotion periods; on the other hand, it relies on a purchase-cycle model, so it is not sufficient to accurately identify the object's degree of preference for related objects.
  • the embodiment of the present disclosure inputs the feature data into the classifier, classifies the feature data of the input object by using the preset classifier, and identifies the preference degree of the object to the associated object according to the classification result, thereby improving the accuracy of identifying the user.
  • This automated mode of machine learning is of great significance to the company's intelligent model system.
  • In the embodiments of the present disclosure, multi-dimensional feature data is used to train a classifier with high recognition capability, which can comprehensively, effectively and accurately identify the object's degree of preference for its associated objects and provide the user with trustworthy preferred commodities. Constructing a user preference map in this way is of great significance for identifying users' search intent, improving the user experience and providing intelligent platform services.
  • the system further includes: a second acquiring module, configured to acquire a training data set, wherein the training data set includes at least one type of feature data; and a training module, configured to use the feature data in the training data set Train to get the preset classifier.
  • The second acquisition module includes: a first acquisition unit configured to acquire a sample data set; a first calculation unit configured to calculate the conditional information entropy of each feature data contained in the sample data set with respect to all feature data of the sample data set; a second calculation unit configured to calculate the information entropy of all feature data in the sample data set; a third calculation unit configured to calculate the difference between the information entropy and the conditional information entropy to obtain the information gain of each feature data; a selection unit configured to select, according to the relative magnitudes of the obtained information gains, feature data whose information gain satisfies a preset condition from the sample data set as target feature data; and a first determination unit configured to use the data set corresponding to the target feature data as the training data set.
  • The training module includes: a second determination unit configured to determine, according to the relative magnitudes of the information gains of the target feature data, the root node and leaf nodes of the first decision tree to generate the first decision tree; and a construction unit configured to construct the preset classifier from the first decision tree.
  • The system further includes: a third acquisition module configured to acquire a verification data set containing feature data of a verification object; a second processing module configured to input the feature data of the verification object into the decision tree to obtain a second classification result; a judgment module configured to judge whether the verification object's degree of preference for its associated objects can be identified from the second classification result; a determination module configured to determine, when that degree of preference cannot be identified from the second classification result, the root node and leaf nodes of a second decision tree based on the feature data of the verification object to generate the second decision tree; and a first construction module configured to construct the preset classifier from the second decision tree, or a second construction module configured to construct the preset classifier from the first decision tree and the second decision tree.
  • Another aspect of the present disclosure provides a computer readable storage medium having stored thereon executable instructions for performing the above object recognition method when executed by a processor.
  • Another aspect of the present disclosure also provides an object recognition system comprising: the above computer readable storage medium; and a processor.
  • FIG. 7 is a block diagram schematically showing a computer system to which an object recognition method of an embodiment of the present disclosure is applied.
  • Computer system 700 in accordance with an embodiment of the present disclosure includes a processor 701 that can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 702 or a program loaded into random access memory (RAM) 703 from storage portion 708.
  • Processor 701 can include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor, and/or a related chipset and/or a special purpose microprocessor (e.g., an application specific integrated circuit (ASIC)), and the like.
  • Processor 701 can also include onboard memory for caching purposes.
  • the processor 701 may include a single processing unit or a plurality of processing units for performing different actions of the method flow according to the embodiments of the present disclosure described with reference to FIG. 3, FIGS. 4A-4D.
  • the processor 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704.
  • the processor 701 performs various operations of the object recognition described above with reference to FIG. 3, FIGS. 4A to 4D by executing programs in the ROM 702 and/or the RAM 703. It is noted that the program can also be stored in one or more memories other than ROM 702 and RAM 703.
  • the processor 701 can also perform various operations of the object recognition described above with reference to FIG. 3, FIGS. 4A to 4D by executing a program stored in the one or more memories.
  • In accordance with an embodiment of the present disclosure, system 700 may also include an input/output (I/O) interface 705, which is likewise connected to bus 704.
  • System 700 can also include one or more of the following components coupled to I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output portion 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 708 including a hard disk and the like; and a communication portion 709 including a network interface card such as a LAN card or a modem.
  • the communication section 709 performs communication processing via a network such as the Internet.
  • Driver 710 is also connected to I/O interface 705 as needed.
  • a removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like, is mounted on the drive 710 as needed so that a computer program read therefrom is installed into the storage portion 708 as needed.
  • an embodiment of the present disclosure includes a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for executing the method illustrated in the flowchart.
  • the computer program can be downloaded and installed from the network via communication portion 709, and/or installed from removable media 711.
  • the functions defined in the system of the present disclosure are performed when the computer program is executed by the processor 701.
  • the computer readable medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two.
  • The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer readable storage media may include, but are not limited to, electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing.
  • a computer readable storage medium may be any tangible medium that can contain or store a program, which can be used by or in connection with an instruction execution system, apparatus, or device.
  • a computer readable signal medium may include a data signal that is propagated in the baseband or as part of a carrier, carrying computer readable program code. Such propagated data signals can take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • The computer readable signal medium can also be any computer readable medium other than a computer readable storage medium, which can transmit, propagate, or transport a program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
  • the computer readable medium may include one or more memories other than the ROM 702 and/or the RAM 703 and/or the ROM 702 and the RAM 703 described above.
  • Each block in the flowchart or block diagrams can represent a module, a program segment, or a portion of code, which contains one or more executable instructions.
  • the functions noted in the blocks may also occur in a different order than that illustrated in the drawings. For example, two successively represented blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams or flowcharts, and combinations of blocks therein, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides an object recognition method, including: acquiring feature data of an object, where the feature data reflects the degree of the object's preference for its associated objects; inputting the acquired feature data into a preset classifier to obtain a first classification result, where the preset classifier classifies the object according to the input feature data; and identifying, according to the first classification result, the degree of the object's preference for the associated objects. The present disclosure also provides an object recognition system and a non-volatile storage medium.

Description

Object recognition method and system thereof. Technical field
The present disclosure relates to the field of data processing, and more particularly to an object recognition method and a system therefor.
Background
With the rapid development of network technology, every industry has accumulated massive amounts of data, and analyzing such data accurately and effectively has become very important. For example, a user's operations on commodities displayed in a shopping application generate a large amount of operation data that often reflects the user's personalized preferences; being able to accurately identify users according to their preferences for commodity types is therefore of great significance for personalized recommendation.
At present, the related art provides a variety of user identification schemes. However, in the course of realizing the concept of the present disclosure, the inventors found at least the following problem in the related art: the accuracy of identifying users is not high.
No effective solution to the above problem in the related art has yet been proposed.
Summary of the invention
In view of this, the present disclosure provides an object recognition method capable of improving the accuracy of object recognition, a system therefor, and a computer-readable storage medium.
One aspect of the present disclosure provides an object recognition method, including: acquiring feature data of an object, where the feature data reflects the degree of the object's preference for its associated objects; inputting the acquired feature data into a preset classifier to obtain a first classification result, where the preset classifier classifies the object according to the input feature data; and identifying, according to the first classification result, the degree of the object's preference for the associated objects.
According to an embodiment of the present disclosure, before the acquired feature data is input into the preset classifier to obtain the first classification result, the method further includes: acquiring a training data set containing at least one type of feature data; and training the feature data in the training data set to obtain the preset classifier.
According to an embodiment of the present disclosure, acquiring the training data set includes: acquiring a sample data set; calculating the conditional information entropy of each feature data contained in the sample data set with respect to all feature data of the sample data set; calculating the information entropy of all feature data in the sample data set; calculating the difference between the information entropy and the conditional information entropy to obtain the information gain of each feature data; selecting, according to the relative magnitudes of the obtained information gains, feature data whose information gain satisfies a preset condition from the sample data set as target feature data; and using the data set corresponding to the target feature data as the training data set.
According to an embodiment of the present disclosure, training the feature data in the training data set to obtain the preset classifier includes: determining, according to the relative magnitudes of the information gains of the target feature data, the root node and leaf nodes of a first decision tree to generate the first decision tree; and constructing the preset classifier from the first decision tree.
According to an embodiment of the present disclosure, after the preset classifier is constructed from the first decision tree, the method further includes: acquiring a verification data set containing feature data of a verification object; inputting the feature data of the verification object into the decision tree to obtain a second classification result; judging whether the verification object's degree of preference for its associated objects can be identified from the second classification result; if not, determining the root node and leaf nodes of a second decision tree based on the feature data of the verification object to generate the second decision tree; and constructing the preset classifier from the second decision tree, or from the first decision tree and the second decision tree.
Another aspect of the present disclosure provides an object recognition system, including: a first acquisition module configured to acquire feature data of an object, where the feature data reflects the degree of the object's preference for its associated objects; a first processing module configured to input the acquired feature data into a preset classifier to obtain a first classification result, where the preset classifier classifies the object according to the input feature data; and an identification module configured to identify, according to the first classification result, the degree of the object's preference for the associated objects.
According to an embodiment of the present disclosure, the system further includes: a second acquisition module configured to acquire a training data set containing at least one type of feature data; and a training module configured to train the feature data in the training data set to obtain the preset classifier.
According to an embodiment of the present disclosure, the second acquisition module includes: a first acquisition unit configured to acquire a sample data set; a first calculation unit configured to calculate the conditional information entropy of each feature data contained in the sample data set with respect to all feature data of the sample data set; a second calculation unit configured to calculate the information entropy of all feature data in the sample data set; a third calculation unit configured to calculate the difference between the information entropy and the conditional information entropy to obtain the information gain of each feature data; a selection unit configured to select, according to the relative magnitudes of the obtained information gains, feature data whose information gain satisfies a preset condition from the sample data set as target feature data; and a first determination unit configured to use the data set corresponding to the target feature data as the training data set.
According to an embodiment of the present disclosure, the training module includes: a second determination unit configured to determine, according to the relative magnitudes of the information gains of the target feature data, the root node and leaf nodes of a first decision tree to generate the first decision tree; and a construction unit configured to construct the preset classifier from the first decision tree.
According to an embodiment of the present disclosure, the system further includes: a third acquisition module configured to acquire a verification data set containing feature data of a verification object; a second processing module configured to input the feature data of the verification object into the decision tree to obtain a second classification result; a judgment module configured to judge whether the verification object's degree of preference for its associated objects can be identified from the second classification result; a determination module configured to determine, when that degree of preference cannot be identified from the second classification result, the root node and leaf nodes of a second decision tree based on the feature data of the verification object to generate the second decision tree; and a first construction module configured to construct the preset classifier from the second decision tree, or a second construction module configured to construct the preset classifier from the first decision tree and the second decision tree.
Another aspect of the present disclosure provides a computer-readable storage medium having executable instructions stored thereon which, when executed by a processor, implement the above object recognition method.
Another aspect of the present disclosure provides an object recognition system including the above computer-readable storage medium and the above processor.
Brief description of the drawings
For a more complete understanding of the present disclosure and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1 schematically shows an exemplary system architecture of the object recognition method and system according to the present disclosure;
FIG. 2 schematically shows an application scenario of the object recognition method and system according to an embodiment of the present disclosure;
FIG. 3 schematically shows a flowchart of the object recognition method according to an embodiment of the present disclosure;
FIG. 4A schematically shows a flowchart of the object recognition method according to another embodiment of the present disclosure;
FIG. 4B schematically shows a flowchart of the object recognition method according to another embodiment of the present disclosure;
FIG. 4C schematically shows a flowchart of the object recognition method according to another embodiment of the present disclosure;
FIG. 4D schematically shows a flowchart of the object recognition method according to another embodiment of the present disclosure;
FIG. 5 schematically shows a recognition result diagram of the object recognition method according to an embodiment of the present disclosure;
FIG. 6 schematically shows a block diagram of the object recognition system according to an embodiment of the present disclosure; and
FIG. 7 schematically shows a block diagram of a computer system to which the object recognition method of an embodiment of the present disclosure is applied.
Detailed description
Embodiments of the present disclosure will be described below with reference to the accompanying drawings. It should be understood, however, that these descriptions are merely exemplary and are not intended to limit the scope of the present disclosure. In the following description, descriptions of well-known structures and techniques are omitted to avoid unnecessarily obscuring the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. The words "a", "an" and "the" as used herein should also include the meaning of "a plurality", unless the context clearly indicates otherwise. Furthermore, the terms "comprise" and "include" as used herein indicate the presence of features, steps, operations and/or components, but do not exclude the presence or addition of one or more other features, steps, operations or components.
All terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art, unless otherwise defined. It should be noted that the terms used herein should be interpreted as having meanings consistent with the context of this specification and should not be interpreted in an idealized or overly rigid manner.
Some block diagrams and/or flowcharts are shown in the accompanying drawings. It should be understood that some blocks of the block diagrams and/or flowcharts, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer or other programmable data processing apparatus, such that the instructions, when executed by the processor, create means for implementing the functions/operations illustrated in the block diagrams and/or flowcharts.
Accordingly, the techniques of the present disclosure may be implemented in the form of hardware and/or software (including firmware, microcode, etc.). In addition, the techniques of the present disclosure may take the form of a computer program product on a computer-readable medium storing instructions, the computer program product being usable by or in connection with an instruction execution system. In the context of the present disclosure, a computer-readable medium may be any medium that can contain, store, communicate, propagate or transport instructions. For example, the computer-readable medium may include, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus, device or propagation medium. Specific examples of computer-readable media include: magnetic storage devices such as magnetic tape or hard disk (HDD); optical storage devices such as compact disc (CD-ROM); memory such as random access memory (RAM) or flash memory; and/or wired/wireless communication links.
Embodiments of the present disclosure provide an object recognition method and a system therefor. The method includes a data acquisition process and an object recognition process. In the data acquisition process, both feature data reflecting the object's preference for its associated objects and a classifier for classifying the object according to input feature data need to be obtained. After data acquisition is completed, the object recognition process begins: the acquired feature data of the object is input into the classifier to obtain a classification result, and the degree of the object's preference for the associated objects is identified from the classification result, e.g., the personalized preferences of one or more users for commodity categories.
FIG. 1 schematically shows an exemplary system architecture of the object recognition method and system according to the present disclosure.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 is the medium providing communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102 and 103 to interact with the server 105 via the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102 and 103, such as shopping applications, web browsers, search applications, instant messaging tools, email clients and social platform software, which will not be enumerated here.
The terminal devices 101, 102 and 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, laptop computers and desktop computers.
The server 105 may be a server providing various services, for example a background management server (merely an example) supporting the shopping websites browsed by users with the terminal devices 101, 102 and 103. The background management server may analyze and otherwise process received data such as product information queries, and feed the processing results (e.g., targeted push information or product information, merely examples) back to the terminal devices.
It should be noted that the object recognition method provided by the embodiments of the present disclosure may be executed by the server 105, or by another server or a server cluster different from the server 105. Accordingly, the system for object recognition may be provided in the server 105, or in another server or server cluster other than the server 105.
It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative; there may be any number of terminal devices, networks and servers as required.
At present, more and more users shop online on e-commerce or other trading platforms, and the systems generate massive amounts of transaction data. For example, on a shopping website, users facing commodities of different classifications perform click, browse, follow and purchase operations on each commodity classification (i.e., category). When these operations are performed, the system generates massive operation data that often reflects the users' personalized preferences, and being able to accurately identify users according to their preferences for commodity types is of great significance for personalized recommendation.
FIG. 2 schematically shows an application scenario of the object recognition method and system according to an embodiment of the present disclosure.
As shown in FIG. 2, in this application scenario the object may be any specified user, one or many: for example, user A logged into a trading platform, or users A, B, C and D logged into the platform. The object's associated objects may be the categories displayed on a web page, e.g., a clothing category or a mobile phone and communications category. Users perform click, browse, follow and purchase operations on commodities of different categories, and the backend generates a large amount of data reflecting the users' degree of preference for the categories. To make personalized recommendations, an e-commerce platform generally needs to identify, from the massive data generated when users perform such operations, which categories users A, B, C and D each prefer.
FIG. 3 schematically shows a flowchart of the object recognition method according to an embodiment of the present disclosure.
As shown in FIG. 3, the method includes operations S301 to S303, in which:
In operation S301, feature data of an object is acquired, where the feature data reflects the degree of the object's preference for its associated objects.
According to the acquired user data related to click, browse, follow and shopping-cart behaviors, feature data reflecting the user's degree of preference for categories is obtained, as shown in Table 1. The feature data may include, but is not limited to, the user's features for categories (feature data numbered 1-10), pure user-dimension features (feature data numbered 11-13) and pure category-dimension features (feature data numbered 14-27).
Table 1
No.  Feature data name
1  User's third-level category browse score
2  User's third-level category follow score
3  User's third-level category order score
4  User's third-level category page click score
5  Total brand score under the user's third-level category
6  Total search score under the user's third-level category
7  Total shopping-cart score under the user's third-level category
8  User's short-term third-level category score
9  User's preferred third-level category price
10  User's third-level category purchasing-power level
11  Average price of the third-level category itself
12  Ease-of-purchase factor of the third-level category
13  Total number of orders of the third-level category
14  Score for the number of commodities the user has purchased
15  Score for the number of categories the user has purchased
16  User's order frequency score
17  User's average order amount score
18  User's highest order amount score
19  User's most recent order interval score
20  Score for the number of pages the user has browsed
21  Score for the number of categories the user has browsed
22  User's browse frequency score
23  User's most recent browse interval score
24  User's browse loyalty score
25  User's order loyalty score
26  User's spending power score
27  User's total value score
It should be noted that, before the feature data of the object is acquired, user data related to behaviors is obtained according to the user's past click, browse, follow and shopping-cart behaviors on categories on the trading platform. For example, the commodities the user browsed in the past month can be obtained from the browse table; the commodities followed in the past three months from the follow table; the commodities purchased in the past year from the order table; the commodities clicked on a shopping application's category page in the past month from the click table; the user's third-level category brand score from the user third-level category brand score table; the user's third-level category search score from the user search table; the user's third-level category add-to-cart score from the shopping-cart table; the user's preferred price and purchasing-power features from the user third-level category price segment table; pure third-level category dimension features from the third-level category attribute table; and pure user-dimension features from the user value table, without limitation here.
It should be noted that, in the embodiments of the present disclosure, the classification of commodities is not limited; it may include, but is not limited to, different types of commodities, or classifications of the same type of commodity in different dimensions, such as third-level categories.
After the user data related to click, browse, follow and shopping-cart behaviors is acquired, the data needs to be cleaned. For example, if a user browses the same third-level category more than 100 times in one day, it is counted as 100 times; third-level categories whose follows were cancelled are removed from the follow data; and if one user purchases the same third-level category multiple times in the same day, it is counted once. The purpose of data cleaning is to remove duplicate information, correct existing errors, and provide data consistency.
In the click log table, third-level categories the user has clicked are labeled 1, and those not clicked are labeled 0. Conceivably, if a user prefers a third-level category, it can be regarded as clicked and the user's label is 1; this label can serve as an important indicator for identifying user preferences.
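The click-log labelling just described can be sketched as follows; the pair-based record layout and the function name are our illustrative assumptions:

```python
def label_click_log(clicked_pairs, all_pairs):
    """Label (user, category) pairs: 1 if the user clicked the category, else 0."""
    clicked = set(clicked_pairs)
    return {pair: (1 if pair in clicked else 0) for pair in all_pairs}

labels = label_click_log(
    clicked_pairs=[("A", "12354")],
    all_pairs=[("A", "12354"), ("A", "13783")],
)
print(labels)  # {('A', '12354'): 1, ('A', '13783'): 0}
```

The resulting 0/1 labels are what the classifier is trained against as the preference indicator.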
In operation S302, the acquired feature data is input into a preset classifier to obtain a first classification result, where the preset classifier classifies the object according to the input feature data.
It should be noted that the preset classifier classifies objects according to the input feature data. Inputting the object's feature data into the preset classifier yields the object's classification result. For example, after the feature data of the 100 third-level categories corresponding to user A is input into the classifier, the classification results of user A's 100 corresponding third-level categories are obtained; after the feature data of the 100 third-level categories of users A, B, C and D is input into the classifier, the classification results of the 100 corresponding third-level categories of the four users are obtained.
It should be noted that the object may be one user or multiple users, and the users may be all users on the website or some of them. The feature data may be one or more of the types in Table 1, without limitation here.
In operation S303, the degree of the object's preference for the associated objects is identified according to the classification result.
The classification result may be a prediction score reflecting the degree of the object's preference for its associated objects. To give all results equal weight and effect, the prediction scores are normalized so that they are distributed in (0, 1). It can be seen that the higher the prediction score, the stronger the object's preference for its associated object. When recommending commodities to the user, priority is given to commodities under the user's preferred categories, which better captures the user's preference psychology, greatly improves the user experience of the online shopping platform, and raises indicators of the e-commerce website such as click-through rate, gross merchandise volume (GMV) and orders.
The recommended data format for third-level category preference is shown below. In each row, the first space-separated column is the user name; the second column is comma-separated, and within each comma-separated item the part before the colon is a third-level category number and the part after it is the user's preference score for that third-level category. The present disclosure sorts each user's preferred third-level category scores in descending order, making the score results clear at a glance and easy to analyze.
A 12354:0.04422928,13783:0.04121400,12019:0.0318690
B 9724:0.04348646,6907:0.0429534
C 655:0.47220629,1350:0.12314458,6907:0.120676905
D 6739:0.05305680,9773:0.04604812,1595:0.04556597
表2
在表2中,第一行空格前面的A表示用户名,空格后面的先用逗号分割,12354表示一个三级品类编号,0.04422928表示用户A对12354这个三级品类的偏好分数,0.04121400表示用户A对13783这个三级品类的偏好分数,可以看出,本公开推荐的数据格式对用户偏好的三级品类分数进行了降序排列,很容易可以得出结论:用户A对品类12354这个三级品类比较偏好,同理可以识别出其他用户比较偏好的三级品类。
需要说明的是,表2只是对用户三级品类偏好分数的推荐格式,仅为示意,并非对三级品类数量、用户数量及格式的限定。
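表2的推荐数据格式可以按“空格分用户、逗号分条目、冒号分编号与分数”的规则解析，以下为示意性解析函数（函数名为本文假设）：

```python
def parse_preference_line(line):
    """解析 '用户名 品类:分数,品类:分数' 格式的一行，
    返回 (用户名, 按偏好分数降序排列的 (三级品类编号, 分数) 列表)。"""
    user, payload = line.split(" ", 1)
    pairs = [(int(c), float(s)) for c, s in
             (item.split(":") for item in payload.split(","))]
    pairs.sort(key=lambda p: p[1], reverse=True)  # 保证分数降序，便于取最偏好的品类
    return user, pairs
```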
在实现本公开构思的过程中，发明人发现，相关技术中提供了一种对象识别方法，即基于用户历史行为统计用户偏好。这种方法参与识别的特征数据有限，维度比较单一，通常是用户的订单特征、用户的浏览特征、用户的关注特征和用户加入购物车等特征，都是属于用户对品类的交互维度特征，且参与识别的特征数据是相互独立的，忽略了特征数据之间的交互关系，不能挖掘特征数据的潜在含义，同时每一种特征行为的权重依赖于分析师的经验，偏好得分不能根据线上得分及时更新，导致无法全面、有效地识别出对象对相关对象的偏好程度。
与相关技术相比,本公开实施例加入更多维度的特征数据,且考虑到特征数据之间的交互关系,克服了仅仅使用单一维度的特征数据进行识别而导致无法全面、有效的识别出对象对相关对象的偏好程度的缺陷,实现全面、有效地识别出对象对相关对象的偏好程度的技术效果。
在实现本公开构思的过程中,发明人发现,相关技术中还提供了一种对象识别方法,即基于时间权重计算的方法。这种方法一方面依赖于时间权重,而时间权重会在大促期间异常,另一方面依赖于购买周期模型,导致不足以准确识别出对象对相关对象的偏好程度。
与相关技术相比,本公开实施例将特征数据输入分类器,利用预设分类器对输入对象的特征数据进行分类,根据分类结果识别对象对关联对象的偏好程度,提高识别用户的准确度。这种机器学习的自动化模式,对推荐公司的智能化模型体系有重大意义。
本公开实施例，利用多维度的特征数据训练出具有高识别能力的分类器，能够全面、有效、准确地识别出对象对其关联对象的偏好程度，向用户提供可以信赖的偏好商品，构造用户偏好图，对于识别用户搜索意向，提升用户体验，智能化平台服务，都具有重大意义。
图4A示意性示出了根据本公开另一实施例的对象识别方法的流程图。
如图4A所示,在将获取的特征数据输入预设分类器中,得到第一分类结果之前,该方法还包括操作S401~S402,其中:
操作S401,获取训练数据集,其中,训练数据集中至少包含一类特征数据。
根据本公开的实施例,在将获取的特征数据输入预设分类器中,得到第一分类结果之前,先需要获取用来生成预设分类器的训练数据集。
需要说明的是，训练数据集中的特征数据选取比较灵活，可以根据处理能力选择，也可以根据处理精度要求自主选择。训练数据集可以包含一个对象的特征数据，也可以包含多个对象的特征数据，特征数据有用户对品类的交互偏好特征、纯用户维度的特征和纯品类维度的特征，可以实现对数据的大规模处理，如可以是表1中的一类或多类特征数据。
操作S402,对训练数据集中的特征数据进行训练,得到预设分类器。
本公开的实施例,训练分类器可以包括多种方式/手段,在此不做限定。例如,可以使用随机森林和梯度提升树(Gradient Boosting Decision Tree,简称为GBDT)提供的机器学习算法进行训练,其中,本公开提供一种经过改进的GBDT,基本学习单元是回归决策树,对其中多个参数结合评估指标反馈,进行了最佳参数的调试。
本发明采用改进的GBDT，通过调试最佳参数，模型本身就能发现特征之间的交互作用。比如，分析师分析历史数据时只想到了点击特征和订单特征，但最新的数据显示，把这两个特征进行某种合理的非线性组合就可以模拟出加入购物车特征，梯度提升树可以挖掘出更多这样的潜在特征。本发明用大数据技术Spark实现，每天做成订阅任务，数据更新及时，模型训练需要的时候也能随时自动化训练。
梯度提升树主要关注降低偏差,能基于泛化性能相当弱的学习器构建出很强的集成。梯度提升树是从弱学习算法出发,反复学习,得到一系列基本分类器,然后组合这些基本分类器,构成一个强分类器。提升方法实际采用加法模型与前向分布算法,以决策树为基函数的提升方法称为提升树,梯度提升算法是用最速下降法最小化损失函数,在回归预测问题上有比较好的效果。
在实现本公开构思的过程中，发明人发现，相关技术中提供的基于历史行为的统计和基于时间权重的计算方法，依赖于分析师的业务经验，分析结果不能根据线上得分的变化而及时变化，导致识别结果不准确。
与相关技术相比，本公开实施例因为采用了预设分类器进行对象识别，预设分类器从训练数据集中训练出来，这种机器学习的自动化模式，可以对大数据进行识别，达到简化识别流程、提高识别效率和准确度的技术效果，对智能化模型体系有重大意义，同时训练过程可以实现自动化，训练结果随数据的变化而及时更新。
图4B示意性示出了根据本公开另一实施例的对象识别方法的流程图。
如图4B所示,获取训练数据集可以包括操作S501~S506,其中:
操作S501,获取样本数据集。
操作S502,计算包含在样本数据集中的各特征数据相对于样本数据集的所有特征数据的条件信息熵。
操作S503,计算样本数据集中所有特征数据的信息熵。
操作S504,计算信息熵和条件信息熵的差值,得到各特征数据的信息增益。
操作S505,根据得到的信息增益的大小关系,从样本数据集中选出信息增益满足预设条件的特征数据作为目标特征数据。
操作S506,将目标特征数据对应的数据集合作为训练数据集。
需要说明的是，样本数据集可以是特征数据集的全部，也可以是特征数据集的部分。选取特征时要找出局部最优的特征，也就是衡量按照这个特征进行划分后，数据集能否被尽量分开。衡量局部最优特征的方法是计算样本数据集中所有特征数据的信息熵和样本数据集中的各特征数据相对于样本数据集的所有特征数据的条件信息熵，得出每个特征数据的信息增益值，值越大代表特征数据越重要。
例如按照如下公式计算特征数据代表的信息量。
I_e = -log2(p_i)
按照这个公式，认为小概率事件包含的信息量大，经常出现的事件信息量小。
信息熵是信息量的期望，用熵来衡量节点数据集合的有序性：
H(D) = -∑_i p_i·log2(p_i)
信息增益表示在某个特征参与下,整个数据的不确定性减少了多少,信息增益越大,特征就越重要。
g(D,A)=H(D)-H(D|A)
H(D)表示样本数据集D中所有特征数据的信息熵，H(D|A)表示样本数据集中的特征数据A相对于样本数据集D的所有特征数据的条件信息熵，g(D,A)表示特征数据A给样本数据集D带来的信息增益。如样本数据集D中所有特征数据的信息熵是80%，特征数据A相对于样本数据集D的所有特征数据的条件信息熵是60%，那么特征数据A给样本数据集D带来的信息增益是20%，依此计算每个特征数据给样本数据集D带来的信息增益值。
在得到每个特征数据信息增益值之后,根据指定信息增益值的阈值,选择大于该阈值的特征数据对应的数据集作为训练数据集。
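信息熵、条件信息熵与信息增益的计算可以用如下Python片段示意（离散特征情形下的最简草图）：

```python
import math
from collections import Counter

def entropy(labels):
    """信息熵 H(D) = -Σ p_i·log2(p_i)，即信息量的期望。"""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(labels, feature_values):
    """信息增益 g(D,A) = H(D) - H(D|A)：
    先按特征取值划分子集，按子集占比加权求条件信息熵，再与总熵作差。"""
    n = len(labels)
    by_value = {}
    for v, y in zip(feature_values, labels):
        by_value.setdefault(v, []).append(y)
    cond = sum(len(sub) / n * entropy(sub) for sub in by_value.values())
    return entropy(labels) - cond
```

信息增益越大的特征越重要，按指定阈值筛选即可得到目标特征数据。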
本公开实施例由于可以根据实际需要，按照每个特征数据对整个样本数据集带来的信息增益，从特征数据集中筛选出重要性高的特征数据组成训练数据集，为后续分类器的训练提供了良好的数据基础。
图4C示意性示出了根据本公开另一实施例的对象识别方法的流程图。
如图4C所示,对训练数据集中的特征数据进行训练,得到预设分类器可以包括操作S601~S602,其中:
操作S601,根据目标特征数据的信息增益的大小关系,确定第一决策树的根节点和叶子节点以生成第一决策树。
操作S602,根据第一决策树构建预设分类器。
需要说明的是,每次选择特征数据时,都挑选当前条件下最优的那个特征作为划分规则,即局部最优的特征。
图5给出了决策树的示意图。
如图5所示，在训练样本集中，当特征的信息增益大小排序为订单特征&gt;浏览特征&gt;关注特征时，确定出订单特征为决策树的根节点，浏览特征为中间节点，关注特征为叶子节点。
在决策树的生成和修剪过程中，每次选择特征数据时，也要挑选当前条件下最优的那个特征作为划分规则，即局部最优的特征，在此不再赘述。
本公开的实施例,根据信息增益的大小,每次选择特征数据时,都挑选出当前条件下最优的那个特征作为划分依据,即局部最优特征,以此构建出的决策树分类能力较强。
图4D示意性示出了根据本公开另一实施例的对象识别方法的流程图。
如图4D所示,在根据第一决策树构建预设分类器之后,该方法还可以包括操作S701~S705,其中:
操作S701,获取包含有校验对象的特征数据的校验数据集;
操作S702,将校验对象的特征数据输入决策树,得到第二分类结果;
操作S703,判断是否能够根据第二分类结果识别校验对象对其关联对象的偏好程度;
操作S704,若否,则基于校验对象的特征数据确定第二决策树的根节点和叶子节点以生成第二决策树;
操作S705,根据第二决策树构建预设分类器;或者
根据第一决策树和第二决策树构建预设分类器。
构建预设分类器有两种方案,方案一是根据第二决策树构建预设分类器;方案二是根据第一决策树和第二决策树构建预设分类器。图4D只是示意性示出方案一的流程图,方案二在此不再赘述。
需要说明的是，GBDT里面的每一棵树可以看成是残差迭代树，以残差最小化作为全局最优的方向，优化目标是让结果变成最好，后面每一棵回归树都在学习之前所有树的残差。采用平方误差损失函数时，每一棵回归树学习的是之前所有树的结论和残差，拟合得到一个当前的残差回归树，残差的意义如公式：残差=真实值-预测值。提升树即是整个迭代过程生成的决策树的累加。
GBDT建立的梯度提升树的损失函数用的是均方误差：
L(Y, F(X)) = (Y - F(X))²/2
GBDT的优化目标是最小化损失函数:
J = ∑_i L(Y_i, F(X_i))
求极小值的时候，把F(X_i)当做参数，对损失函数求偏导数：
∂J/∂F(X_i) = F(X_i) - Y_i
所以在以均方误差作为损失函数的时候可以得出残差就是梯度的负方向：
Y_i - F(X_i) = -∂J/∂F(X_i)
梯度下降和预测的关系可以表示如下：
F(X_i) := F(X_i) + h(X_i)
F(X_i) := F(X_i) + Y_i - F(X_i)
F(X_i) := F(X_i) - 1·∂J/∂F(X_i)
本公开实施例中,根据第一决策树构建预设分类器之后,将校验对象的特征数据输入决策树,若能得到分类结果,则根据分类结果识别校验对象对其关联对象的偏好程度,若不能,则要学习第一决策树的残差,构建第二棵决策树,建立下一棵回归树减小残差就等价于建立下一棵决策树最小化梯度,更新到整个森林也是一样的,直到达到识别要求为止。
需要说明的是，分类器由多个决策树构成时，梯度提升树算法会对每棵树的输出结果乘一个因子v(0&lt;v&lt;1)。设h_m(X)是第m棵树的输出，f_m(X)是前m棵树的预测结果，则
f_m(X) = f_{m-1}(X) + v·h_m(X)
其中因子v可以看做是每棵树的权重，它可以控制提升过程的学习速率，学习速率与过拟合风险之间是一个需要平衡的过程。
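残差迭代与因子v的累加过程可以用单特征、单切分的最简回归树（树桩）来示意。以下片段是本文的示意性草图，并非本公开GBDT的完整实现：

```python
def stump_fit(xs, residuals):
    """在单一特征上找使平方误差最小的切分点，
    返回 (阈值, 左侧均值, 右侧均值)，作为一棵最简回归树。"""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # 特征取值全部相同时退化为常数树
        m = sum(residuals) / len(residuals)
        return float("inf"), m, m
    return best[1:]

def gbdt_fit(xs, ys, n_trees=3, v=0.5):
    """残差迭代：每棵新树拟合当前残差 Y_i - F(X_i)，
    再乘因子 v 累加，即 f_m(X) = f_{m-1}(X) + v·h_m(X)。"""
    pred = [0.0] * len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, pred)]
        t, lm, rm = stump_fit(xs, residuals)
        trees.append((t, lm, rm))
        pred = [p + v * (lm if x <= t else rm) for p, x in zip(pred, xs)]
    return trees, pred
```

随着树的棵数增加，残差平方和逐步减小，体现了“建立下一棵回归树减小残差”的迭代思想。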
GBDT预测用户三级品类偏好,评估指标使用AUC,为了提高验证集中的AUC评估指标,通过观察训练集中AUC的变化,调试GBDT的特征标签组合和GBDT固有的参数,使AUC达到0.8以上。
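AUC等价于“随机取一对正负样本，正样本得分高于负样本的概率”，可以用如下片段示意其计算（并列得分的样本对计0.5，函数名为本文假设）：

```python
def auc(labels, scores):
    """AUC：正样本得分高于负样本的样本对占比，得分相同的样本对计 0.5。"""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```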
通常影响梯度提升树GBDT的参数可以分为三类:
1.影响提升算法运行的参数:
a.LearningRate:学习率。GBDT算法就是通过对初始模型进行一次次的调整来实现的,学习率就是衡量每次调整幅度的一个参数。理论上,这个参数越小,迭代出的结果往往越好,但需要的计算成本越大;
b.NumIterations：弱分类器数量。即生成的所有弱学习器的数目，由于GBDT是串行生成的，这也就是迭代的次数，当然不是越多越好，提升算法也会有过拟合的风险；
c.样本子集所占比例:用来训练弱学习器的样本子集占样本总体的比重,一般都是随机抽样以降低方差,默认是选总体80%的样本来训练。
2.单独影响每个弱学习器(回归树)的参数:
a.分支最小样本量:一个节点想要继续分支所需要的最小样本数;
b.叶节点最小样本量:一个节点要划分为叶节点所需最小样本数;
c.树最大深度:树的层次,树越大越有过拟合的风险;
d.最大叶节点量:叶节点的最大数目,和树最大深度可以相互替代;
e.树的分类数量:NumClasses;
f.MaxBins：连续特征离散化的最大分箱数。
3.学习目标参数,主要是下面两个:
a.损失函数;b.随机种子。
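上述三类参数可以对照常见GBDT实现的参数名来理解。本公开正文提到用Spark实现，下面换用scikit-learn的GradientBoostingClassifier仅作参数含义的示意（参数取值均为本文假设的示例值，实际需结合AUC等评估指标反馈调试）：

```python
from sklearn.ensemble import GradientBoostingClassifier

def build_gbdt():
    """一组示意性的GBDT参数配置，逐项对应上文的参数分类。"""
    return GradientBoostingClassifier(
        learning_rate=0.1,     # 学习率：每次调整幅度，越小往往越好但计算成本越大
        n_estimators=100,      # 弱分类器（回归树）数量，即迭代次数
        subsample=0.8,         # 样本子集所占比例，随机抽样以降低方差
        min_samples_split=20,  # 分支最小样本量
        min_samples_leaf=10,   # 叶节点最小样本量
        max_depth=3,           # 树最大深度，树越深过拟合风险越大
        random_state=42,       # 随机种子
    )
```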
以上对提升树的效果评估不是本发明的重点,只要能实现相同目的的方法都在本发明保护范围内。
在本公开实施例中,梯度提升树主要关注降低偏差,能基于泛化性能相当弱的学习器构建出很强的集成。梯度提升树是从弱学习算法出发,反复学习,得到一系列基本分类器,然后组合这些基本分类器,构成一个强分类器。提升方法实际采用加法模型与前向分布算法,以决策树为基函数的提升方法称为提升树,梯度提升算法是用最速下降法最小化损失函数,在回归预测问题上有比较好的效果。
图6示意性示出了根据本公开实施例的对象识别系统的框图。
如图6所示，该对象识别系统包括：第一获取模块601、第一处理模块602、识别模块603。
第一获取模块601用于获取对象的特征数据,其中,特征数据用于反映对象对其关联对象的偏好程度。
根据获取的与点击行为、浏览行为、关注行为、购物车行为相关的用户数据，得到用于反映用户对其品类的偏好程度的特征数据，如表1所示，其中，特征数据可以包括但不限于用户对品类的特征(编号为1-10的特征数据)、纯品类维度的特征(编号为11-13的特征数据)和纯用户维度的特征(编号为14-27的特征数据)。
需要说明的是，在获取对象的特征数据之前，根据过去用户在交易平台上对品类的点击行为、浏览行为、关注行为、购物车行为等，获取与行为相关的用户数据，如可以从浏览表中获取用户过去一个月浏览的商品；可以从关注表中获取用户过去三个月关注的商品；可以从订单表中获取用户过去一年购买的商品；可以从点击表中获取用户过去一个月在某购物应用分类页点击的商品；可以从用户三级品类品牌得分表中获取用户三级品类品牌得分；可以从用户搜索表中获取用户三级品类搜索得分；可以从购物车表中获取用户三级品类加入购物车得分；可以从用户三级品类价格段表中获取用户偏好价格和购买力特征；可以从三级品类属性表中获取纯三级品类维度特征；还可以从用户价值表中获取纯用户维度特征，在此不做限定。
需要说明的是，在本公开实施例中，对商品的分类不作限定，它可以包括但不限于不同种类的商品，或者同一种类商品在不同维度上的分类，如三级品类。
在获取与点击行为、浏览行为、关注行为、购物车行为相关的用户数据之后，需要对数据进行清洗，如用户一天内浏览同一个三级品类超过100次，算作100次；关注数据中剔除取消关注的三级品类；一个用户同一天内多次购买同一个三级品类，算作1次等。数据清洗的目的在于删除重复信息、纠正存在的错误，并保证数据的一致性。
在点击日志表中,用户点击过的三级品类,打标签1,用户没有点击过的三级品类打标签0,可以想象,若用户对该三级品类有偏好,则可以视为点击过,用户的标签为1,该标签可以作为识别用户偏好的重要指标。
第一处理模块602用于将获取的特征数据输入预设分类器中,得到第一分类结果,其中,预设分类器用于依据输入的特征数据对对象进行分类。
需要说明的是,预设分类器用于依据输入的特征数据对对象进行分类。将对象的特征数据输入预设分类器中,可以得到该对象的分类结果。如将用户A对应的100个三级品类的特征数据输入到分类器后,得到用户A的100个对应的三级品类的分类结果;若将用户A、用户B、用户C和用户D的100个三级品类的特征数据输入到分类器后,得到4个用户的100个对应的三级品类的分类结果。
需要说明的是,对象可以是一个用户也可以是多个用户,用户可以是网站上的所有用户,也可以是部分用户。特征数据可以是表1中的一类或多类,在此不做限定。
识别模块603用于根据分类结果识别对象对关联对象的偏好程度。
分类结果可以是反映对象对其关联对象的偏好程度的预测得分，为了使各个结果达到等权重、等作用的效果，对预测分数做归一化处理，使预测分数分布在(0,1)之间。可以看出，预测得分越高，表明对象对其关联对象的偏好程度越高，在给用户推荐商品时，优先考虑用户偏好品类下的商品，这样可以更好地抓住用户的偏好心理，能够极大提升网购平台的用户体验，同时提升电商网站的点击率、商品交易总额(Gross Merchandise Volume，简称为GMV)和订单等指标。
三级品类偏好的推荐数据格式如表2所示，其中，每一行以空格分割后的第一列是用户名，第二列首先以逗号分割，逗号分割后冒号前面的是三级品类编号，冒号后面的是用户对这个三级品类偏好的分数，并且本公开对用户偏好的三级品类分数进行了降序排列，使得分数结果一目了然，易于分析。
在实现本公开构思的过程中，发明人发现，相关技术中提供了一种对象识别方法，即基于用户历史行为统计用户偏好。这种方法参与识别的特征数据有限，维度比较单一，通常是用户的订单特征、用户的浏览特征、用户的关注特征和用户加入购物车等特征，都是属于用户对品类的交互维度特征，且参与识别的特征数据是相互独立的，忽略了特征数据之间的交互关系，不能挖掘特征数据的潜在含义，同时每一种特征行为的权重依赖于分析师的经验，偏好得分不能根据线上得分及时更新，导致无法全面、有效地识别出对象对相关对象的偏好程度。
与相关技术相比,本公开实施例加入更多维度的特征数据,且考虑到特征数据之间的交互关系,克服了仅仅使用单一维度的特征数据进行识别而导致无法全面、有效的识别出对象对相关对象的偏好程度的缺陷,实现全面、有效地识别出对象对相关对象的偏好程度的技术效果。
在实现本公开构思的过程中,发明人发现,相关技术中还提供了一种对象识别方法,即基于时间权重计算的方法。这种方法一方面依赖于时间权重,而时间权重会在大促期间异常,另一方面依赖于购买周期模型,导致不足以准确识别出对象对相关对象的偏好程度。
与相关技术相比,本公开实施例将特征数据输入分类器,利用预设分类器对输入对象的特征数据进行分类,根据分类结果识别对象对关联对象的偏好程度,提高识别用户的准确度。这种机器学习的自动化模式,对推荐公司的智能化模型体系有重大意义。
本公开实施例，利用多维度的特征数据训练出具有高识别能力的分类器，能够全面、有效、准确地识别出对象对其关联对象的偏好程度，向用户提供可以信赖的偏好商品，构造用户偏好图，对于识别用户搜索意向，提升用户体验，智能化平台服务，都具有重大意义。
作为一种可选的实施方式，系统还包括：第二获取模块，用于获取训练数据集，其中，训练数据集中至少包含一类特征数据；以及训练模块，用于对训练数据集中的特征数据进行训练，得到预设分类器。
作为一种可选的实施方式,第二获取模块包括:第一获取单元,用于获取样本数据集;第一计算单元,用于计算包含在样本数据集中的各特征数据相对于样本数据集的所有特征数据的条件信息熵;第二计算单元,用于计算样本数据集中所有特征数据的信息熵;第三计算单元,用于计算信息熵和条件信息熵的差值,得到各特征数据的信息增益;选择单元,用于根据得到的信息增益的大小关系,从样本数据集中选出信息增益满足预设条件的特征数据作为目标特征数据;以及第一确定单元,用于将目标特征数据对应的数据集合作为训练数据集。
作为一种可选的实施方式,训练模块包括:第二确定单元,用于根据目标特征数据的信息增益的大小关系,确定第一决策树的根节点和叶子节点以生成第一决策树;以及构建单元,用于根据第一决策树构建预设分类器。
作为一种可选的实施方式，系统还包括：第三获取模块，用于获取包含有校验对象的特征数据的校验数据集；第二处理模块，用于将校验对象的特征数据输入决策树，得到第二分类结果；判断模块，用于判断是否能够根据第二分类结果识别校验对象对其关联对象的偏好程度；确定模块，用于若根据第二分类结果不能识别校验对象对其关联对象的偏好程度时，基于校验对象的特征数据确定第二决策树的根节点和叶子节点以生成第二决策树；第一构建模块，用于根据第二决策树构建预设分类器；或者第二构建模块，用于根据第一决策树和第二决策树构建预设分类器。
需要说明的是,装置部分实施例中各模块/单元/子单元等的实施方式、解决的技术问题、实现的功能、以及达到的技术效果分别与方法部分实施例中各对应的步骤的实施方式、解决的技术问题、实现的功能、以及达到的技术效果相同或类似,在此不再赘述。
本公开的另一个方面提供了一种计算机可读存储介质，其上存储有可执行指令，指令被处理器执行时用于实现上述对象识别方法。
本公开的另一个方面还提供了一种对象识别系统，包括：上述计算机可读存储介质；以及处理器。
图7示意性示出了应用本公开实施例的对象识别方法的计算机系统的框图。
如图7所示，根据本公开实施例的计算机系统700包括处理器701，其可以根据存储在只读存储器(ROM)702中的程序或者从存储部分708加载到随机访问存储器(RAM)703中的程序而执行各种适当的动作和处理。处理器701例如可以包括通用微处理器(例如CPU)、指令集处理器和/或相关芯片组和/或专用微处理器(例如，专用集成电路(ASIC))，等等。处理器701还可以包括用于缓存用途的板载存储器。处理器701可以包括用于执行参考图3、图4A~图4D描述的根据本公开实施例的方法流程的不同动作的单一处理单元或者是多个处理单元。
在RAM 703中，存储有系统700操作所需的各种程序和数据。处理器701、ROM 702以及RAM 703通过总线704彼此相连。处理器701通过执行ROM 702和/或RAM 703中的程序来执行以上参考图3、图4A~图4D描述的对象识别的各种操作。需要注意，所述程序也可以存储在除ROM 702和RAM 703以外的一个或多个存储器中。处理器701也可以通过执行存储在所述一个或多个存储器中的程序来执行以上参考图3、图4A~图4D描述的对象识别的各种操作。
根据本公开的实施例，系统700还可以包括输入/输出(I/O)接口705，输入/输出(I/O)接口705也连接至总线704。系统700还可以包括连接至I/O接口705的以下部件中的一项或多项：包括键盘、鼠标等的输入部分706；包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分707；包括硬盘等的存储部分708；以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分709。通信部分709经由诸如因特网的网络执行通信处理。驱动器710也根据需要连接至I/O接口705。可拆卸介质711，诸如磁盘、光盘、磁光盘、半导体存储器等等，根据需要安装在驱动器710上，以便于从其上读出的计算机程序根据需要被安装入存储部分708。
特别地，根据本公开的实施例，上文参考流程图描述的过程可以被实现为计算机软件程序。例如，本公开的实施例包括一种计算机程序产品，其包括承载在计算机可读介质上的计算机程序，该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中，该计算机程序可以通过通信部分709从网络上被下载和安装，和/或从可拆卸介质711被安装。在该计算机程序被处理器701执行时，执行本公开的系统中限定的功能。
需要说明的是，本公开所示的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件，或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于：具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中，计算机可读存储介质可以是任何包含或存储程序的有形介质，该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开中，计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数据信号，其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式，包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读介质，该计算机可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输，包括但不限于：无线、电线、光缆、RF等等，或者上述的任意合适的组合。根据本公开的实施例，计算机可读介质可以包括上文描述的ROM 702和/或RAM 703和/或ROM 702和RAM 703以外的一个或多个存储器。
附图中的流程图和框图，图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上，流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分，模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意，在有些作为替换的实现中，方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如，两个接连地表示的方框实际上可以基本并行地执行，它们有时也可以按相反的顺序执行，这依所涉及的功能而定。也要注意的是，框图或流程图中的每个方框、以及框图或流程图中的方框的组合，可以用执行规定的功能或操作的专用的基于硬件的系统来实现，或者可以用专用硬件与计算机指令的组合来实现。
本领域技术人员可以理解,本公开的各个实施例和/或权利要求中记载的特征可以进行多种组合或/或结合,即使这样的组合或结合没有明确记载于本公开中。特别地,在不脱离本公开精神和教导的情况下,本公开的各个实施例和/或权利要求中记载的特征可以进行多种组合和/或结合。所有这些组合和/或结合均落入本公开的范围。
尽管已经参照本公开的特定示例性实施例示出并描述了本公开,但是本领域技术人员应该理解,在不背离所附权利要求及其等同物限定的本公开的精神和范围的情况下,可以对本公开进行形式和细节上的多种改变。因此,本公开的范围不应该限于实施例,而是应该不仅由所附权利要求来进行确定,还由所附权利要求的等同物来进行限定。

Claims (12)

  1. 一种对象识别方法,包括:
    获取对象的特征数据,其中,所述特征数据用于反映所述对象对其关联对象的偏好程度;
    将获取的特征数据输入预设分类器中,得到第一分类结果,其中,所述预设分类器用于依据输入的特征数据对所述对象进行分类;以及
    根据所述第一分类结果识别所述对象对所述关联对象的偏好程度。
  2. 根据权利要求1所述的方法,其中,在将获取的特征数据输入预设分类器中,得到第一分类结果之前,所述方法还包括:
    获取训练数据集,其中,所述训练数据集中至少包含一类特征数据;以及
    对所述训练数据集中的特征数据进行训练,得到所述预设分类器。
  3. 根据权利要求2所述的方法,其中,获取训练数据集包括:
    获取样本数据集;
    计算包含在所述样本数据集中的各特征数据相对于所述样本数据集的所有特征数据的条件信息熵;
    计算所述样本数据集中所有特征数据的信息熵;
    计算所述信息熵和条件信息熵的差值,得到所述各特征数据的信息增益;
    根据得到的信息增益的大小关系,从所述样本数据集中选出信息增益满足预设条件的特征数据作为目标特征数据;以及
    将所述目标特征数据对应的数据集合作为所述训练数据集。
  4. 根据权利要求3所述的方法,其中,对所述训练数据集中的特征数据进行训练,得到所述预设分类器包括:
    根据所述目标特征数据的信息增益的大小关系,确定第一决策树的根节点和叶子节点以生成所述第一决策树;以及
    根据所述第一决策树构建所述预设分类器。
  5. 根据权利要求4所述的方法,其中,在根据所述第一决策树构建所述预设分类器之后,所述方法还包括:
    获取包含有校验对象的特征数据的校验数据集;
    将所述校验对象的特征数据输入所述决策树,得到第二分类结果;
    判断是否能够根据所述第二分类结果识别所述校验对象对其关联对象的偏好程度;
    若否,则基于所述校验对象的特征数据确定第二决策树的根节点和叶子节点以生成所述第二决策树;
    根据所述第二决策树构建所述预设分类器;或者
    根据所述第一决策树和所述第二决策树构建所述预设分类器。
  6. 一种对象识别系统，包括：
    第一获取模块,用于获取对象的特征数据,其中,所述特征数据用于反映所述对象对其关联对象的偏好程度;
    第一处理模块,用于将获取的特征数据输入预设分类器中,得到第一分类结果,其中,所述预设分类器用于依据输入的特征数据对所述对象进行分类;以及
    识别模块,用于根据所述第一分类结果识别所述对象对所述关联对象的偏好程度。
  7. 根据权利要求6所述的系统，其中，所述系统还包括：
    第二获取模块,用于获取训练数据集,其中,所述训练数据集中至少包含一类特征数据;以及
    训练模块,用于对所述训练数据集中的特征数据进行训练,得到所述预设分类器。
  8. 根据权利要求7所述的系统，其中，所述第二获取模块包括：
    第一获取单元,用于获取样本数据集;
    第一计算单元,用于计算包含在所述样本数据集中的各特征数据相对于所述样本数据集的所有特征数据的条件信息熵;
    第二计算单元,用于计算所述样本数据集中所有特征数据的信息熵;
    第三计算单元,用于计算所述信息熵和条件信息熵的差值,得到所述各特征数据的信息增益;
    选择单元,用于根据得到的信息增益的大小关系,从所述样本数据集中选出信息增益满足预设条件的特征数据作为目标特征数据;以及
    第一确定单元,用于将所述目标特征数据对应的数据集合作为所述训练数据集。
  9. 根据权利要求8所述的系统，其中，所述训练模块包括：
    第二确定单元,用于根据所述目标特征数据的信息增益的大小关系,确定第一决策树的根节点和叶子节点以生成所述第一决策树;以及
    构建单元,用于根据所述第一决策树构建所述预设分类器。
  10. 根据权利要求9所述的系统，其中，所述系统还包括：
    第三获取模块,用于获取包含有校验对象的特征数据的校验数据集;
    第二处理模块,用于将所述校验对象的特征数据输入所述决策树,得到第二分类结果;
    判断模块,用于判断是否能够根据所述第二分类结果识别所述校验对象对其关联对象的偏好程度;
    确定模块,用于若根据所述第二分类结果不能识别所述校验对象对其关联对象的偏好程度时,基于所述校验对象的特征数据确定第二决策树的根节点和叶子节点以生成所述第二决策树;
    第一构建模块,用于根据所述第二决策树构建所述预设分类器;或者
    第二构建模块,用于根据所述第一决策树和所述第二决策树构建所述预设分类器。
  11. 一种计算机可读存储介质，其上存储有可执行指令，所述指令被处理器执行时用于实现权利要求1至5中任一项所述的对象识别方法。
  12. 一种对象识别系统，包括：
    权利要求11所述的计算机可读存储介质;以及
    处理器。
PCT/CN2018/109020 2017-10-09 2018-09-30 对象识别方法及其系统 WO2019072128A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710933661.3A CN109636430A (zh) 2017-10-09 2017-10-09 对象识别方法及其系统
CN201710933661.3 2017-10-09

Publications (1)

Publication Number Publication Date
WO2019072128A1 true WO2019072128A1 (zh) 2019-04-18



Also Published As

Publication number Publication date
CN109636430A (zh) 2019-04-16

