CN107229640A - Similarity processing method, object screening method and device - Google Patents


Info

Publication number
CN107229640A
Authority
CN
China
Prior art keywords
value
similarity
attributes
filtered out
clustering factor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610174122.1A
Other languages
Chinese (zh)
Inventor
郑苏杭
徐萧萧
应倩岚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201610174122.1A (CN107229640A)
Priority to TW106106682A (TW201800966A)
Priority to PCT/CN2017/076424 (WO2017162063A1)
Publication of CN107229640A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses a similarity processing method, an object screening method and a device. The similarity processing method includes: obtaining a calculation condition, where, when the calculation condition is satisfied, the maximum number of objects whose pairwise similarity can be calculated is k; selecting i objects from n objects according to the calculation condition, where i is less than or equal to n and i is less than or equal to k; and calculating the pairwise similarity of the i objects. The application addresses the technical problem caused by the large scale of similarity calculation.

Description

Similarity processing method, object screening method and device
Technical field
The present application relates to the field of similarity calculation, and in particular to a similarity processing method, an object screening method and a device.
Background
In the prior art, calculating cosine similarity is not difficult in itself, but in big-data applications the main bottleneck faced by collaborative filtering is computing performance. Collaborative filtering requires the similarity to be calculated once between every pair of individuals; assuming there are N objects, the computational complexity is $N^2$.
The inventors have found that in practical applications the calculation scale is relatively large. Taking the Taobao recommendation scenario as an example, if item-based collaborative filtering is used and Taobao has 800 million items online, the computational complexity is the square of 800 million, a scale that is unaffordable. Calculation at this scale causes problems. For example, a calculation whose complexity is the square of 800 million requires a large number of servers; if not enough servers are currently deployed, the servers will run continuously at full load and be unable to respond to other requests, with adverse consequences.
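As a rough worked figure for the Taobao example, with N = 8 × 10^8 items, and under the assumption (not stated in the original) that each unordered pair is counted as one similarity calculation:

$$
N^2 = (8 \times 10^{8})^2 = 6.4 \times 10^{17}, \qquad \binom{N}{2} = \frac{N(N-1)}{2} \approx 3.2 \times 10^{17}.
$$

Even when each pair is counted only once, the number of calculations stays on the order of 10^17.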
In addition, the inventors have also found that other scenarios may impose requirements on the calculation, such as a limit on calculation time, and that such a requirement cannot be met if the calculation scale is large.
No effective solution has yet been proposed for the problems caused by the large scale of similarity calculation in the related art.
Summary of the invention
The embodiments of the present application provide a similarity processing method, an object screening method and a device, so as to at least solve the problems caused by the large scale of similarity calculation.
According to one aspect of the embodiments of the present application, a similarity processing method is provided, including: obtaining a calculation condition, where, when the calculation condition is satisfied, the maximum number of objects whose pairwise similarity can be calculated is k; selecting i objects from n objects according to the calculation condition, where i is less than or equal to n and i is less than or equal to k; and calculating the pairwise similarity of the i objects.
According to another aspect of the embodiments of the present application, a similarity processing device is also provided, including: a first acquisition module, configured to obtain a calculation condition, where, when the calculation condition is satisfied, the maximum number of objects whose pairwise similarity can be calculated is k; a first screening module, configured to select i objects from n objects according to the calculation condition, where i is less than or equal to n and i is less than or equal to k; and a second computing module, configured to calculate the pairwise similarity of the i objects.
According to another aspect of the embodiments of the present application, an object screening method is also provided, including: obtaining the values of one or more attributes corresponding to each of n objects; and selecting i similar objects from the n objects according to the values of the one or more attributes of each object.
According to another aspect of the embodiments of the present application, a similarity processing method is also provided, including: selecting i objects from n objects according to the values of one or more attributes corresponding to each of the n objects; and calculating the pairwise similarity of the i objects.
According to another aspect of the embodiments of the present application, an object screening device is also provided, including: a second acquisition module, configured to obtain the values of one or more attributes corresponding to each of n objects; and a second screening module, configured to select i similar objects from the n objects according to the values of the one or more attributes of each object.
According to another aspect of the embodiments of the present application, a similarity processing device is also provided, including: a third screening module, configured to select i objects from n objects according to the values of one or more attributes corresponding to each of the n objects; and a second computing module, configured to calculate the pairwise similarity of the i objects.
In the embodiments of the present application, the calculation condition is used to reduce, as far as possible, the number of objects whose pairwise similarity is calculated, so that the objects to be calculated are screened reasonably according to the calculation condition. This solves the problems caused by the large scale of similarity calculation in the related art.
Brief description of the drawings
The drawings described here provide a further understanding of the present application and form a part of it. The illustrative embodiments of the present application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a hardware block diagram of a computer device for a similarity processing method according to an embodiment of the present application;
Fig. 2 is a flow chart of the similarity processing method according to the first embodiment of the present application;
Fig. 3 is a schematic diagram of load balancing across multiple servers according to an embodiment of the present application;
Fig. 4 is a flow chart of the similarity processing method according to the second embodiment of the present application;
Fig. 5 is a flow chart of the similarity processing method according to the third embodiment of the present application;
Fig. 6 is a schematic diagram of the similarity processing procedure according to an embodiment of the present application;
Fig. 7 is a flow chart of the object screening method according to an embodiment of the present application;
Fig. 8 is a flow chart of the similarity processing method according to the fourth embodiment of the present application;
Fig. 9 is a schematic diagram of the similarity processing device according to the first embodiment of the present application;
Fig. 10 is a schematic diagram of the similarity processing device according to the second embodiment of the present application;
Fig. 11 is a schematic diagram of the similarity processing device according to the third embodiment of the present application;
Fig. 12 is a schematic diagram of the similarity processing device according to the fourth embodiment of the present application;
Fig. 13 is a schematic diagram of the object screening device according to an embodiment of the present application;
Fig. 14 is a schematic diagram of the similarity processing device according to the fifth embodiment of the present application; and
Fig. 15 is a structural block diagram of a computer device according to an embodiment of the present application.
Detailed description
To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings of the embodiments. Evidently, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects, and not to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described here. In addition, the terms "include" and "have", and any variants of them, are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that contains a series of steps, modules or units is not necessarily limited to the steps, modules or units expressly listed, but may include other steps, modules or units that are not expressly listed or that are inherent to the process, method, product or device.
Embodiment 1
According to an embodiment of the present application, a method embodiment of a similarity processing method is provided. It should be noted that the steps shown in the flow charts of the drawings can be executed in a computer system, such as one running a set of computer-executable instructions, and that, although a logical order is shown in the flow charts, in some cases the steps shown or described can be executed in an order different from the one given here.
For ease of description, several terms involved in the embodiments of the present application are explained here:
Collaborative filtering: finding the population similar to an individual. Calculating the similarity between two individuals is therefore the core task of collaborative filtering.
Pre-clustering: according to the business scenario, the objects under study are further clustered, and each individual undergoes similarity calculation only with the other individuals in the same cluster.
Cosine similarity algorithm: used to calculate the similarity between two individuals. The following embodiments do not improve the similarity calculation method itself but use existing methods; the meaning of the letters in the formulas below will be understood by those skilled in the art and is not repeated in this embodiment. All existing and future similarity algorithms can be used with the calculation scheme of this embodiment.
Commonly used similarity calculation methods include the following:
Euclidean distance:
Pearson correlation coefficient:
Cosine similarity algorithm:
Tanimoto coefficients:
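For reference, the standard textbook forms of these four measures for two m-dimensional vectors x and y are given below, in the order listed. The notation (x, y, m, and the means \bar{x} and \bar{y}) is assumed here and is not taken from the original formulas.

$$
\begin{aligned}
d(x, y) &= \sqrt{\sum_{j=1}^{m} (x_j - y_j)^2} \\
r(x, y) &= \frac{\sum_{j=1}^{m} (x_j - \bar{x})(y_j - \bar{y})}{\sqrt{\sum_{j=1}^{m} (x_j - \bar{x})^2}\,\sqrt{\sum_{j=1}^{m} (y_j - \bar{y})^2}} \\
\cos(x, y) &= \frac{\sum_{j=1}^{m} x_j y_j}{\sqrt{\sum_{j=1}^{m} x_j^2}\,\sqrt{\sum_{j=1}^{m} y_j^2}} \\
T(x, y) &= \frac{\sum_{j=1}^{m} x_j y_j}{\sum_{j=1}^{m} x_j^2 + \sum_{j=1}^{m} y_j^2 - \sum_{j=1}^{m} x_j y_j}
\end{aligned}
$$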
The method embodiments provided by the present application can run on a server. To provide a better user experience, a query service for the calculation results can also be provided; for example, the results on the server can be viewed through a web page or a client. A server can be understood as a kind of computer. With the development of technology, cloud computing has become increasingly widely used, and the methods provided in the embodiments can also be applied in cloud computing. The computing power of terminals is also increasing, so a terminal can perform the calculation when it can obtain the corresponding data; terminals include, but are not limited to, mobile phones, tablet computers and other portable devices. For now, however, deploying the following embodiments on a server is the preferred choice.
Under current technology, the hardware architectures on which servers, terminals and cloud computing rely are similar, and each can be regarded as a computer device. The embodiments of the present application can be executed on such a computer device. If, as technology develops, the hardware architecture of computer devices changes or computing devices with new architectures appear, the embodiments can still be implemented. The following description takes the architecture of the computer device in Fig. 1 as an example.
Fig. 1 is a hardware block diagram of a computer device for a similarity processing method according to an embodiment of the present application. As shown in Fig. 1, computer device 1 can include one or more processors 102 (only one is shown in the figure; processor 102 can include, but is not limited to, a processing unit such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission module 106 for communication. A person of ordinary skill in the art will understand that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the above electronic device. For example, computer device 1 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.
Memory 104 can be used to store the software programs and modules of application software, such as the program instructions/modules corresponding to the similarity processing method in the embodiments of the present application. By running the software programs and modules stored in memory 104, processor 102 executes various functional applications and data processing, that is, implements the similarity processing method described above. Memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory or other non-volatile solid-state memory. In some examples, memory 104 may further include memory located remotely from processor 102, which can be connected to computer device 1 through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks and combinations thereof.
Transmission module 106 is used to receive or send data via a network. A specific example of the network is a wireless network provided by the communication provider of computer device 1. In one example, transmission module 106 includes a network adapter (Network Interface Controller, NIC), which can connect to other network devices through a base station in order to communicate with the internet. In another example, transmission module 106 can be a radio frequency (RF) module, which communicates with the internet wirelessly. In yet another embodiment, transmission module 106 can be a wired interface module that communicates with the internet over a wired connection.
In the above operating environment, this embodiment provides the similarity processing method shown in Fig. 2. Fig. 2 is a flow chart of the similarity processing method according to the first embodiment of the present application. As shown in Fig. 2, the flow can include the following steps:
Step S202: obtain a calculation condition, where, when the calculation condition is satisfied, the maximum number of objects whose pairwise similarity can be calculated is k.
As an optional embodiment, the calculation condition can include at least one of the following: the resources used to calculate similarity, the time used to calculate similarity, and the scale of the similarity calculation. For example, the calculation condition can be that the similarity calculation finishes within 3 seconds, or that it takes no more than 1,000,000 operations. The calculation condition can also be any other type of condition that constrains the similarity calculation.
This embodiment explains the calculation condition using computing capability as an example. Computing capability is generally determined by the resources available for the calculation; here, the computing capability indicates the largest value of k for which the pairwise similarity of k objects can be calculated.
The computing capability can be that of a single server. For example, given k individuals in total, a similarity must be calculated between every two of the k objects, so the complexity of the calculation is $k^2$. If the most a server can do is a calculation of complexity $k^2$, its computing capability is k.
For example, if a server can perform a calculation of complexity 10,000, the maximum value of k is 100. Alternatively, computing capability can be expressed directly as the largest number of objects whose pairwise similarity can be calculated; for example, the computing capability of a server is the pairwise similarity of at most 100 objects. The computing capability can also be provided by multiple servers; in that case, the value of k is calculated from the combined computing capability of the servers, and the load is then balanced according to the computing capability of each server.
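The relation between a complexity budget and k can be sketched in a few lines. This is a minimal illustration that assumes the budget is an integer count of pairwise operations and follows the text's k^2 cost model; the function name `max_objects_for_budget` is hypothetical.

```python
import math


def max_objects_for_budget(max_operations: int) -> int:
    """Largest k with k * k <= max_operations (the k^2 cost model used in the text)."""
    if max_operations < 0:
        raise ValueError("max_operations must be non-negative")
    return math.isqrt(max_operations)


print(max_objects_for_budget(10_000))  # 100, matching the example above
```

Using the integer square root avoids floating-point rounding for very large budgets.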
Fig. 3 is a schematic diagram of load balancing across multiple servers according to an embodiment of the present application. As shown in Fig. 3, a server group or cloud computing service can be formed by multiple servers working together. The computing capability of server 1 is 2000, that of server 2 is 3000, that of server 3 is 1000, and that of server 4 is 4000, so when the four servers are viewed as one server group or cloud computing service, the computing capability of the whole is 10,000. The computing capability provided by multiple servers is better able to complete the pairwise similarity calculation for k objects. With a cloud server, developers may not need to be much concerned with how load balancing is carried out inside it and only need to calculate according to the maximum computing capability it provides. If the computing capability of the servers changes, the value of k can be readjusted according to the new computing capability.
In another optional embodiment, the value of k can be the value obtained when a server, multiple servers or a cloud server is devoted entirely to similarity calculation. Sometimes, however, one or more servers must also provide other services. For example, if 20% of a server's resources must be allocated to other services, the computing capability must be calculated from the remaining 80% of its resources; the same applies to multiple servers or a cloud server.
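The pooled capability of a server group, with part of each server held back for other services, can be sketched as follows. The function names and the capacity-proportional split are assumptions for illustration; the original does not specify how the load is balanced.

```python
def pooled_capacity(capacities, reserved_percent=0):
    """Total capacity of a server group after holding back a share for other services."""
    if not 0 <= reserved_percent < 100:
        raise ValueError("reserved_percent must be in [0, 100)")
    return sum(c * (100 - reserved_percent) // 100 for c in capacities)


def share_of_load(capacities):
    """Fraction of the work each server takes under capacity-proportional balancing."""
    total = sum(capacities)
    return [c / total for c in capacities]


servers = [2000, 3000, 1000, 4000]   # figures from the Fig. 3 example
print(pooled_capacity(servers))      # 10000
print(pooled_capacity(servers, 20))  # 8000
print(share_of_load(servers))
```

Integer percentages are used so that the reserved share is computed without floating-point error.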
Step S204: select i objects from n objects according to the calculation condition, where i is less than or equal to n and i is less than or equal to k.
Selecting i objects from n objects according to the calculation condition can be done according to computing capability: i objects are selected from the n objects such that i is less than or equal to n and i is less than or equal to k.
When i equals k, all available computing capability is used for the pairwise similarity calculation. In practice, a value of i smaller than k can also be chosen. On the one hand, the result is obtained faster; on the other hand, the calculation consumes fewer resources, and the resources saved can be used for other services. As an optional embodiment, if a value of i smaller than k is chosen, the resources saved by the reduced computational complexity can be calculated and made available to other services.
Step S206: calculate the pairwise similarity of the i objects.
Through the above steps, a suitable number of objects is selected from a large number of objects according to computing capability, and pairwise similarity is then calculated. This solves the problems caused by the large scale of similarity calculation in the prior art and reduces the complexity of the similarity calculation.
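Steps S202 to S206 can be sketched end to end as below. This is an illustration under stated assumptions, not the patented implementation: objects are represented as numeric vectors keyed by an id, cosine similarity is the chosen measure, and the selection rule `first_k` is only a placeholder for whichever screening rule is used.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def process_similarity(objects, k, select):
    """k is the limit from S202; `select` picks i ids (S204); every pair is scored (S206)."""
    chosen = select(objects, k)
    if len(chosen) > min(k, len(objects)):
        raise ValueError("selection must satisfy i <= n and i <= k")
    return {
        (a, b): cosine_similarity(objects[a], objects[b])
        for a, b in combinations(chosen, 2)
    }


def first_k(objects, k):
    """Placeholder selection rule: the first k ids in insertion order."""
    return list(objects)[:k]


vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [2.0, 2.0]}
scores = process_similarity(vectors, k=3, select=first_k)
print(len(scores))  # 3 pairs for i = 3
```

Passing the selection rule as a function keeps the pairwise step unchanged when the screening strategy changes.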
In step S204, i objects are selected from n objects according to computing capability. There are many ways to select them. For example, if the only aim is to reduce the calculation scale, i objects can be selected from the n objects at random. To some extent, the result of such a calculation is still meaningful. In one example, if the aim is only to learn roughly what proportion of the 1,000,000 answers to an online survey are broadly similar, there is no need to calculate pairwise similarity for all 1,000,000 answers; 100,000 answers can be selected at random and used instead, and the result can still represent the desired result.
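Random selection can be sketched with the standard library. The function name `sample_objects` and the optional seed are assumptions made for this illustration.

```python
import random


def sample_objects(objects, k, seed=None):
    """Pick min(k, n) distinct objects uniformly at random."""
    pool = list(objects)
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


answers = [f"answer-{j}" for j in range(1000)]
subset = sample_objects(answers, k=100, seed=42)
print(len(subset))  # 100
```

Capping the sample size at n keeps the call valid when fewer than k objects exist.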
In other optional embodiments, objects whose similarity is not very high can first be filtered out, and similarity is then calculated for the remaining objects, whose similarity is higher. Several examples follow.
For example, i objects are selected from the n objects according to the values of one or more attributes corresponding to each of the n objects. Each of the n objects has one or more attributes, and screening is carried out according to the values of these attributes.
If all n objects have a first attribute, i objects are selected from the n objects according to the first attribute; screening can be over an interval or, as actual needs dictate, over several segments or at single points. If, for example, all n objects have a first attribute and a second attribute, i objects are selected according to both: first, a objects that satisfy the screening condition on the first attribute are selected from the n objects, and then i objects that satisfy the screening condition on the second attribute are selected from the a objects, where i ≤ a. Likewise, if i objects must be selected according to more attributes, the objects can be screened by each attribute in turn.
For example, to select 200 objects from 2000 objects using three attributes (height, weight and age), objects taller than 1.2 meters can first be selected by height, giving 1000 objects; objects heavier than 30 kilograms are then selected by weight, giving 600; and objects aged 8 or 12 are then selected by age, giving 200 objects, whose similarity is calculated. When selecting by age, objects aged 8 to 12 could be selected instead; the screening attributes and their values can be changed to suit the purpose. Selecting a suitable set of objects before calculating similarity reduces the complexity of the calculation.
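Attribute-by-attribute screening can be sketched as a chain of filters. The record layout, field names and sample values below are hypothetical; only the three thresholds come from the example above.

```python
def screen_by_attributes(objects, conditions):
    """Apply each (attribute, predicate) condition in turn, narrowing the candidates."""
    remaining = list(objects)
    for attribute, predicate in conditions:
        remaining = [obj for obj in remaining if predicate(obj[attribute])]
    return remaining


people = [
    {"id": 1, "height_m": 1.30, "weight_kg": 35, "age": 8},
    {"id": 2, "height_m": 1.10, "weight_kg": 32, "age": 8},   # fails the height condition
    {"id": 3, "height_m": 1.40, "weight_kg": 28, "age": 12},  # fails the weight condition
    {"id": 4, "height_m": 1.35, "weight_kg": 40, "age": 10},  # fails the age condition
    {"id": 5, "height_m": 1.50, "weight_kg": 45, "age": 12},
]
conditions = [
    ("height_m", lambda h: h > 1.2),
    ("weight_kg", lambda w: w > 30),
    ("age", lambda a: a in (8, 12)),  # single-point screening; a range works the same way
]
selected = screen_by_attributes(people, conditions)
print([p["id"] for p in selected])  # [1, 5]
```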
As an optional embodiment, selecting i objects from n objects according to the values of one or more attributes corresponding to each of the n objects includes: selecting from the n objects, as the i objects, those whose values for the one or more attributes fall within a preset range, where the preset range is determined according to the value of i.
Take calculating the similarity between Taobao shops as an example. Suppose there are 100,000 shops in total but the computational complexity is too high: the existing computing capability can only handle similarity for 50,000 shops. The aim is therefore to make a preliminary selection of 50,000 shops with close attribute values and higher similarity, and to calculate similarity for them. The 100,000 shops have multiple attributes, for example the number of goods sold per day and the daily sales amount. If 50,000 shops can be selected from the 100,000 using the number of goods sold alone, the sales amount need not be used. If the number of goods alone does not yield 50,000 shops, for example because it selects 70,000 shops, these can be screened further by sales amount, and if that is still not enough, by other attributes. The number of screening attributes is increased step by step until the number of objects meets the computing capability. The preset range is determined according to the value of i and can be adjusted to suit it. For example, when screening shops by number of goods, if limiting the number to 100-200 selects 60,000 shops, the limit is adjusted to 100-150 to select 50,000 shops.
As another example, 200 objects are to be selected from 2000 objects by height, weight and age. Objects taller than 1.2 meters are first selected by height, giving 1000; objects heavier than 30,000 grams (30 kilograms) are then selected by weight, giving 600; and objects aged 8 to 12 are then selected by age, giving 300. This is more than 200, so the age range can be narrowed to 9 to 11, which gives 200 objects, and the similarity of these 200 objects is calculated.
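Adjusting the preset range to fit i can be sketched as a loop that tightens the interval until few enough objects remain. Shrinking both ends by a fixed step is an assumption made for this illustration (it mirrors the change from ages 8 to 12 to ages 9 to 11); the original does not prescribe how the range is adjusted, and the function name is hypothetical.

```python
def narrow_range(values, low, high, target, step=1):
    """Shrink [low, high] from both ends until at most `target` values fall inside it."""
    while low <= high:
        inside = [v for v in values if low <= v <= high]
        if len(inside) <= target:
            return low, high, inside
        low += step
        high -= step
    return low, high, []


ages = [8, 9, 9, 10, 10, 11, 12, 12]
print(narrow_range(ages, 8, 12, target=5))  # (9, 11, [9, 9, 10, 10, 11])
```

The loop may return fewer than `target` objects when a step removes several at once, so a finer step or a one-sided adjustment may be preferable in practice.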
Selecting as the i objects those of the n objects whose values for one or more attributes fall within a preset range yields a suitable number of objects for similarity calculation, which meets the requirement of the computing capability while keeping the results accurate. Computing capability is used to reduce, as far as possible, the number of objects whose pairwise similarity is calculated, so that the objects to be calculated are screened reasonably according to computing capability. This solves the problems caused by the large scale of similarity calculation in the related art.
In another approach, a clustering factor is calculated for each object from the values of one or more attributes that the objects have in common, and i objects are selected from the n objects according to the clustering factor. Fig. 4 is a flow chart of the similarity processing method according to the second embodiment of the present application. As shown in Fig. 4, the flow can include the following steps:
Step S301: calculate the clustering factor corresponding to each object from the values of the one or more common attributes of each object.
Step S302: select i objects from the n objects according to the clustering factor.
Through the above steps, the clustering factor of each object is obtained and i objects are then selected from the n objects according to it. The clustering factor can be adjusted to the user's needs so that the selected objects are more similar to one another, which in turn reduces the scale of the similarity calculation.
In an optional embodiment, calculating the clustering factor of each object from the values of its one or more common attributes can be: calculating the clustering factor of each object as a weighted sum of the values of multiple common attributes. A suitable number of objects is then selected from the objects according to the clustering factors so that their pairwise similarity can be calculated; that is, i objects are selected from the n objects according to the clustering factor. Each object may have all of the common attributes or may lack some of them; if an attribute is missing, its value can be taken as zero when the clustering factor is calculated.
For example, suppose there are 100,000 shops in total but the computational complexity is too high: the existing computing capability can only handle similarity for 50,000 shops. The aim is therefore to select 50,000 shops with close attribute values and higher similarity and to calculate similarity for them. The 100,000 shops all have multiple attributes, for example the number of goods sold per day and the daily sales amount. To select for similarity calculation the shops with more features in common, a weighted sum of the number of goods sold and the sales amount can be calculated as the clustering factor, and i objects are then selected from the n objects according to it. For the weighted sum, the weighting parameters can be determined according to the specific object, or the user can enter the weighting parameters or a calculation model in advance; the clustering factor of each shop is calculated, and i objects are then selected from the n objects according to the calculated clustering factors.
As another example, to select 200 objects from 2000 objects by height, weight and age, an importance coefficient can be set for each of the three attributes; that is, the clustering factor of each object is calculated from the values of its common attributes, for example as height × 0.5 + weight × 0.2 + age × 0.3. The 2000 objects are ranked by clustering factor and 200 of them are selected for similarity calculation; the 200 selected objects can be contiguous in the ranking or taken from separate segments.
Fig. 5 is a flowchart of the similarity processing method according to the third embodiment of the present application. As shown in Fig. 5, the flow may include the following steps:
Step S401: sort the objects by the magnitude of the clustering factor corresponding to each object.
Sorting the objects by the magnitude of their clustering factors may be done as follows: once the clustering factor of each object has been calculated, all objects are sorted by clustering factor in ascending order, or in descending order.
Step S402: select i consecutive objects from the n sorted objects.
As an optional embodiment, filtering out i objects from the n objects according to the clustering factor includes: sorting the objects by the magnitude of the clustering factor corresponding to each object; and selecting i consecutive objects from the n sorted objects.
For example, a weighted sum of the number of goods sold and the sales volume is calculated as the clustering factor, and the shops are then sorted by the magnitude of each shop's clustering factor, for example in ascending order. i consecutive shops are then selected from the n sorted shops, and similarity is calculated pairwise for these i consecutive shops. Because the i consecutive shops, chosen to fit the computing capability, are adjacent in the sorted order, they have the highest similarity to one another, so the technical solution of this embodiment can reduce the complexity of the similarity calculation.
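Steps S401 and S402 can be sketched as a sort followed by a slice. This is a minimal sketch under stated assumptions: the `start` parameter (where the consecutive window begins) is hypothetical, since the text does not say which window of i objects is chosen, and the sample factors are invented.

```python
from typing import Dict, List


def select_consecutive(factors: Dict[str, float], i: int, start: int = 0) -> List[str]:
    """Sort object ids by clustering factor (ascending) and return i consecutive ids."""
    ordered = sorted(factors, key=factors.get)
    if i > len(ordered):
        raise ValueError("i must not exceed n")
    # Clamp the window so that it always contains exactly i objects.
    start = max(0, min(start, len(ordered) - i))
    return ordered[start:start + i]


# Hypothetical clustering factors for five shops.
factors = {"s1": 5.0, "s2": 1.0, "s3": 3.0, "s4": 9.0, "s5": 4.0}
picked = select_consecutive(factors, i=3, start=1)
```

Here the sorted order is s2, s3, s5, s1, s4, so the window starting at position 1 contains s3, s5 and s1, whose factors (3.0, 4.0, 5.0) are adjacent.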
As an optional embodiment, filtering out i objects from the n objects according to the values of the one or more attributes corresponding to each of the n objects includes: calculating the clustering factor of each object from the values of its one or more same attributes; and filtering out i objects from the n objects according to the clustering factor and one or more attributes.
For example, there are 100,000 shops in total, but the computational complexity is too high and the existing computing capability can only calculate the similarity of 50,000 shops, so 50,000 shops with close attribute values and higher similarity must be filtered out for similarity calculation. These 100,000 shops all have multiple attributes, for example the number of goods sold per day and the daily sales volume. A weighted sum of the number of goods sold and the sales volume can be calculated as the clustering factor, and i objects are then filtered out from the n objects according to the clustering factor and the sales volume. For example, the weighted sum is calculated as: number of goods sold × 0.9 + sales volume × 0.1. During screening, the number of attributes and the way the clustering factor is calculated can be adjusted according to the value of i, so as to filter out objects with higher similarity.
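One possible reading of screening by both the clustering factor and an attribute is sketched here: first keep only the shops whose sales volume lies in a range, then sort the remainder by clustering factor and take i of them. The text does not specify how the two criteria are combined, so the order of operations, the range bounds and the field names are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Shop = Dict[str, float]


def screen_by_factor_and_attribute(shops: Dict[str, Shop], i: int,
                                   sales_range: Tuple[float, float],
                                   w_items: float = 0.9,
                                   w_sales: float = 0.1) -> List[str]:
    """Filter by sales volume, then take the i shops with the smallest factor."""
    lo, hi = sales_range
    in_range = {sid: s for sid, s in shops.items() if lo <= s["sales_volume"] <= hi}

    def factor(sid: str) -> float:
        s = in_range[sid]
        return w_items * s["items_sold"] + w_sales * s["sales_volume"]

    return sorted(in_range, key=factor)[:i]


# Hypothetical sample data.
shops = {
    "a": {"items_sold": 10, "sales_volume": 100},
    "b": {"items_sold": 50, "sales_volume": 200},
    "c": {"items_sold": 20, "sales_volume": 5000},  # outside the sales range
    "d": {"items_sold": 30, "sales_volume": 150},
}
selected = screen_by_factor_and_attribute(shops, i=2, sales_range=(50, 1000))
```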
The technical solution of the present application is outlined below with reference to an optional embodiment:
Fig. 6 is a schematic diagram of the similarity processing procedure according to an embodiment of the present application. As shown in Fig. 6, assume the total number of individuals is n, denoted K1 to Kn. When calculating similarity, each individual must be compared pairwise with K1 to Kn. A collaborative filtering algorithm needs to calculate the similarity between every two individuals; even when it is executed under the distributed framework of the MapReduce programming model, the same individual Ki is assigned to the same reduce task (Reduce), so the maximum computational complexity of a Reduce reaches n². The technical solution of this embodiment can be divided into two kinds: numeric clustering and enumeration clustering. Numeric clustering first sorts all individuals by a clustering factor (here, a clustering factor can be understood as a weighted sum of different attributes, such as a weighted sum of two attributes; in the extreme case the clustering factor is a single attribute and no weighted sum is needed). For example, if the business needs to find 2m individuals whose clustering factor Q is similar, all individuals are sorted by the factor Q, and from the sorted objects the m individuals immediately above and the m individuals immediately below individual Ki are chosen. As another example, 2m individuals are still needed; enumeration clustering then groups individuals that share the same t specific enumerated values (which can also be understood as attribute values) into one class (aggregation by enumerated value). t and m usually constrain each other: a larger t means that fewer individuals are likely to share the same attribute values, so the number of individuals m grouped into one class is smaller. When the business places a strict limit on the value of m, the value of t can be adjusted accordingly. After clustering, the computational complexity is reduced to (2m)², where m << n.
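The two pre-clustering variants can be sketched as follows. This is an illustrative sketch, not the application's implementation: the function names `neighbours` and `enum_clusters`, the factor values and the enumerated fields (`industry`, `tier`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def neighbours(q: Dict[str, float], target: str, m: int) -> List[str]:
    """Numeric clustering: the m individuals just above and just below the target
    once all individuals are sorted by clustering factor Q."""
    ordered = sorted(q, key=q.get)
    pos = ordered.index(target)
    upper = ordered[max(0, pos - m):pos]
    lower = ordered[pos + 1:pos + 1 + m]
    return upper + lower


def enum_clusters(records: Dict[str, Dict[str, Hashable]],
                  keys: Sequence[str]) -> Dict[Tuple, List[str]]:
    """Enumeration clustering: group individuals sharing the same t enumerated values."""
    clusters: Dict[Tuple, List[str]] = defaultdict(list)
    for rid, attrs in records.items():
        clusters[tuple(attrs[k] for k in keys)].append(rid)
    return dict(clusters)


# Hypothetical sample data.
q = {"k1": 0.1, "k2": 0.4, "k3": 0.5, "k4": 0.55, "k5": 0.9, "k6": 2.0}
near_k3 = neighbours(q, "k3", m=2)

records = {
    "k1": {"industry": "apparel", "tier": "A"},
    "k2": {"industry": "apparel", "tier": "A"},
    "k3": {"industry": "food", "tier": "A"},
}
groups = enum_clusters(records, keys=("industry", "tier"))  # t = 2
```

With at most 2m candidates per individual, the pairwise work within one group is bounded by (2m)² rather than n². Adding more enumerated keys (a larger t) splits the data into smaller groups, which is the trade-off between t and m described above.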
With the above technical solution, the pre-clustering technique improves computational efficiency by reducing the number of individuals to be computed. At the same time, because it retains individuals with adjacent features for comparison and discards individuals that differ greatly, it does not affect the accuracy or reasonableness of the data result, and the result can coincide with the highly similar (Top-similar) individuals that a full calculation over all individuals would produce.
An optional application of the embodiment of the present application: taking the calculation of shop similarity based on category composition as an example, a table T_tezheng is first built with three fields, seller_id, cate_id and wgt, which represent the shop ID, the category ID, and the shop's feature component under that category (after normalization), respectively.
The similarity between two shops can then be obtained by cosine similarity. Suppose there are 1,000,000 (100w) shops under a certain category identifier cate_id; the computational complexity is then at least 10^12 (10000w²). Of these candidate similar individuals, however, only about the 1,000 with the largest similarity are truly similar to a given individual, and similar individuals beyond the top 1,000 are usually filtered out. According to the specific shop attributes, many clustering factors can be found. Continuing the above example of calculating similarity from shop category composition: in the business, two shops are comparable only if their main first-level industry is the same and their business scale is close. The shops can therefore be clustered using two dimensions, main category and monthly transaction value, as clustering factors. Two fields are added to the table T_tezheng, and the calculation is performed on the fields of the extended table; the amount of data assigned to each reduce is then much smaller, and the similar individuals that are the final mining target usually need these feature-similarity constraints anyway. In practical applications, different clustering factors can be specified for different application scenarios to set up the pre-clustering scheme, ensuring that the performance requirement is met while the data result remains accurate.
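As a rough illustration of the cosine similarity step, the sketch below builds one sparse vector per shop from (seller_id, cate_id, wgt) rows and compares two shops using the standard cosine definition. The sample rows are invented, and the in-memory representation is an assumption; the application itself describes the data as a table.

```python
import math
from collections import defaultdict
from typing import Dict

# Hypothetical rows of T_tezheng: (seller_id, cate_id, wgt).
rows = [
    ("s1", "c1", 0.6), ("s1", "c2", 0.8),
    ("s2", "c1", 0.8), ("s2", "c2", 0.6),
    ("s3", "c3", 1.0),
]

# One sparse category vector per shop: {cate_id: wgt}.
vectors: Dict[str, Dict[str, float]] = defaultdict(dict)
for seller_id, cate_id, wgt in rows:
    vectors[seller_id][cate_id] = wgt


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


sim_12 = cosine(vectors["s1"], vectors["s2"])  # shops sharing both categories
sim_13 = cosine(vectors["s1"], vectors["s3"])  # shops with no category in common
```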
The embodiment of the present application can be implemented in Structured Query Language (SQL) and deployed on the Open Data Processing Service (ODPS) platform.
Building on an in-depth understanding of how MapReduce processes a join (Join), the embodiment of the present application further clusters the objects under study in light of the business scenario, so that each individual is compared only with the other individuals in the same cluster, which effectively reduces the number of join keys (Key) assigned to the same Reduce. Assuming the maximum computational complexity that the computing capability can bear is K², it is only necessary to ensure that the number of individuals in each cluster does not exceed K. With this screening approach, the number of objects whose pairwise similarity must be calculated can be reduced as far as possible according to the computing capability, so that the objects to be computed are screened reasonably according to computing capability, which solves the problems caused by the large scale of similarity calculation in the related art.
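The effect of capping each cluster at K individuals can be illustrated with a small count of pairwise comparisons. In this sketch the clusters are formed by cutting an already sorted list into fixed-size chunks, which is a simplifying assumption; the application forms clusters from business attributes such as main category and monthly transaction value. The helper names are hypothetical.

```python
from itertools import combinations
from typing import Iterator, List, Sequence, Tuple


def capped_clusters(ordered_ids: Sequence[str], k: int) -> List[Sequence[str]]:
    """Cut a sorted sequence of ids into clusters of at most k individuals."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [ordered_ids[s:s + k] for s in range(0, len(ordered_ids), k)]


def pairs_within(clusters: Sequence[Sequence[str]]) -> Iterator[Tuple[str, str]]:
    """Yield only the pairs whose two members fall in the same cluster."""
    for cluster in clusters:
        yield from combinations(cluster, 2)


ids = [f"obj{n}" for n in range(10)]  # assumed to be sorted by clustering factor
clusters = capped_clusters(ids, k=4)
pair_count = sum(1 for _ in pairs_within(clusters))
full_count = sum(1 for _ in combinations(ids, 2))
```

For ten objects and K = 4 the clusters have sizes 4, 4 and 2, so 13 pairs are compared instead of the 45 pairs of the full calculation.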
In the above embodiments, the complexity of the similarity calculation is reduced by screening objects. This screening method can also be used on its own, wherever screening is needed. Fig. 7 is a flowchart of the object screening method according to an embodiment of the present application. As shown in Fig. 7, the flow may include the following steps:
Step S701: obtain the values of the one or more attributes corresponding to each of the n objects.
For example, there are n objects in total, each of which has values for one or more attributes. Obtaining the values of the one or more attributes corresponding to each of the n objects may mean obtaining the value of one attribute of each object, or obtaining the values of multiple attributes of each object; the number of attribute values obtained can be determined according to the screening conditions.
Step S702: filter out i similar objects from the n objects according to the values of the one or more attributes of each object.
After the values of the one or more attributes corresponding to each of the n objects are obtained, i similar objects are filtered out from the n objects according to those values.
Filtering out i similar objects from the n objects according to the values of the one or more attributes of each object may be: selecting from the n objects, as the i objects, those whose values of the one or more attributes fall within a predetermined range, where the predetermined range is determined according to the value of i.
For example, to select 200 objects from 2,000 objects by three attributes (height, weight and age), objects taller than 1.2 meters can first be picked out by height, giving 1,000 objects; objects heavier than 30 kilograms are then picked out by weight, giving 600 objects; and objects aged 8 or 12 are then picked out by age, giving 200 objects, whose similarity is then calculated. When selecting by age, objects aged 8 to 12 may be selected instead. The screening attributes and their values can be changed according to the specific purpose so as to filter out an appropriate number of objects.
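The attribute-by-attribute screening in this example can be sketched as a chain of predicates applied in order. The thresholds follow the example above (height above 1.2 m, weight above 30 kg, age from 8 to 12); the records, field names and the `screen` helper are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Person = Dict[str, float]

# Hypothetical sample records.
people: List[Person] = [
    {"id": 1, "height_m": 1.30, "weight_kg": 35, "age": 8},
    {"id": 2, "height_m": 1.10, "weight_kg": 32, "age": 9},   # fails the height test
    {"id": 3, "height_m": 1.40, "weight_kg": 28, "age": 10},  # fails the weight test
    {"id": 4, "height_m": 1.50, "weight_kg": 40, "age": 12},
    {"id": 5, "height_m": 1.60, "weight_kg": 50, "age": 15},  # fails the age test
]


def screen(objects: Iterable[Person],
           predicates: Iterable[Callable[[Person], bool]]) -> List[Person]:
    """Apply each predicate in turn, keeping only the objects that pass it."""
    remaining = list(objects)
    for keep in predicates:
        remaining = [o for o in remaining if keep(o)]
    return remaining


selected = screen(people, [
    lambda o: o["height_m"] > 1.2,
    lambda o: o["weight_kg"] > 30,
    lambda o: 8 <= o["age"] <= 12,
])
selected_ids = [o["id"] for o in selected]
```

If the result is larger or smaller than the target i, the thresholds can be tightened or relaxed and the chain rerun, which corresponds to determining the predetermined range from the value of i.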
Filtering out i similar objects from the n objects according to the values of the one or more attributes of each object may also be: calculating the clustering factor of each object from the values of its one or more same attributes; and filtering out i objects from the n objects according to the clustering factor.
Filtering out i similar objects from the n objects according to the values of the one or more attributes of each object may also be: calculating the clustering factor of each object from the values of its one or more same attributes; and filtering out i objects from the n objects according to the clustering factor and one or more attributes.
The object screening method in this embodiment is consistent with the object screening described in the embodiments of the similarity processing method above, and is not described again here.
In the embodiment of the object screening method, the result of object screening need not be used to calculate similarity; it may simply be stored as data or used as preprocessing, and similarity calculation or sample analysis can be carried out later when needed. The object screening method can be applied in a variety of scenarios, for example in sampling survey analysis.
This embodiment also provides a similarity processing method. In this method, the similarity calculation need not obtain the calculation conditions. For example, the calculation conditions may be sufficient for calculating similarity over all objects, but the actual calculation does not need the similarity between all objects; in that case the similarity calculation does not obtain the calculation conditions, and instead screens directly according to the attribute values of the objects and then calculates similarity. Fig. 8 is a flowchart of the similarity processing method according to the fourth embodiment of the present application. As shown in Fig. 8, the flow may include the following steps:
Step S801: filter out i objects from the n objects according to the values of the one or more attributes corresponding to each of the n objects.
Filtering out i objects from the n objects according to the values of the one or more attributes corresponding to each object may mean filtering out i similar objects, filtering out i objects with low similarity, or filtering out i objects arbitrarily. Screening according to these values may mean selecting the objects whose values of the one or more attributes lie in a certain interval.
Step S802: calculate similarity pairwise for the i objects.
After the i objects are filtered out, similarity is calculated pairwise for the i objects.
In the similarity processing method of this embodiment, the object screening process is consistent with the object screening process in the embodiments of the similarity processing method above, and is not described again here.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions, but those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions, modules and units involved are not necessarily required by the present application.
From the above description of the embodiments, those skilled in the art can clearly understand that the similarity processing method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device or the like) to perform the methods described in the embodiments of the present application.
Embodiment 2
According to an embodiment of the present application, a similarity processing device for implementing the above similarity processing method is also provided. Fig. 9 is a schematic diagram of the similarity processing device according to the first embodiment of the present application. As shown in Fig. 9, the device includes: a first acquisition module 10, a first screening module 20 and a first computing module 30.
The first acquisition module 10 is configured to obtain calculation conditions, where, when the calculation conditions are satisfied, the maximum number of objects for which pairwise similarity can be calculated is k.
The first screening module 20 is configured to filter out i objects from n objects according to the calculation conditions, where i is less than or equal to n and i is less than or equal to k.
The first computing module 30 is configured to calculate similarity pairwise for the i objects.
In the similarity processing device of this embodiment, the first acquisition module 10 can be used to perform step S202 of the embodiments of the present application, the first screening module 20 can be used to perform step S204, and the first computing module 30 can be used to perform step S206.
In this embodiment, the first acquisition module 10 obtains the calculation conditions, where, when the calculation conditions are satisfied, the maximum number of objects for which pairwise similarity can be calculated is k; the first screening module 20 filters out i objects from n objects according to the calculation conditions, where i is less than or equal to n and i is less than or equal to k; and the first computing module 30 calculates similarity pairwise for the i objects. This achieves the technical effect of reducing the complexity of the similarity calculation and thereby solves the technical problem caused by the large scale of similarity calculation.
In an optional embodiment, the calculation conditions include at least one of the following: the resources used to calculate similarity, the time for calculating similarity, and the scale of the similarity calculation.
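How a calculation condition might translate into the bound k is sketched below. The application does not give a formula, so this sketch makes an assumption: the condition is expressed as a budget of unordered pairwise comparisons, and k is the largest count whose k(k-1)/2 pairs fit in that budget. The function name `max_objects` is hypothetical.

```python
import math


def max_objects(pair_budget: int) -> int:
    """Largest k such that k * (k - 1) / 2 pairwise comparisons fit in the budget.

    Assumption: the calculation condition has already been converted into a
    number of affordable pairwise comparisons (for example, time available
    divided by the cost of one comparison).
    """
    if pair_budget < 0:
        raise ValueError("pair_budget must be non-negative")
    # Solve k * (k - 1) / 2 <= pair_budget for the largest integer k.
    return (1 + math.isqrt(1 + 8 * pair_budget)) // 2


n = 100_000                     # total number of objects
k = max_objects(1_249_975_000)  # hypothetical budget of pairwise comparisons
i = min(n, k)                   # number of objects to screen out: i <= n and i <= k
```

With this budget k works out to 50,000, matching the scale of the shop example in which only half of 100,000 shops can be compared.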
In an optional embodiment, the first screening module 20 is configured to filter out i objects from the n objects according to the values of the one or more attributes corresponding to each of the n objects. Each of the n objects has one or more attributes, and the first screening module 20 screens according to the values of the one or more attributes of each object.
The embodiments of the above similarity processing method also apply to the similarity processing device. For example, the n objects all have a first attribute, and the first screening module 20 filters out i objects from the n objects according to the first attribute; the screening may be done within a certain interval, or by segments or single points as actually needed. As another example, the n objects all have a first attribute and a second attribute, and i objects are filtered out from the n objects according to both: the first screening module 20 first filters out from the n objects the a objects that satisfy the first-attribute screening condition, and then filters out from those a objects the i objects that satisfy the second-attribute screening condition, where i ≤ a. Likewise, if i objects need to be filtered out from the n objects according to more attributes, screening can be performed on each attribute in turn. Multiple attributes can also be constrained at once to filter out i objects from the n objects.
In an optional embodiment, the first screening module 20 is configured to select from the n objects, as the i objects, those whose values of the one or more attributes fall within a predetermined range, where the predetermined range is determined according to the value of i.
In an optional embodiment, Fig. 10 is a schematic diagram of the similarity processing device according to the second embodiment of the present application. As shown in Fig. 10, the similarity processing device includes: a first acquisition module 10, a first screening module 20 and a first computing module 30. The first screening module 20 includes: a first computing unit 201 and a first screening unit 202.
The functions of the first acquisition module 10, the first screening module 20 and the first computing module 30 in this embodiment are the same as in the similarity processing device of the first embodiment of the present application, and are not described again here.
The first computing unit 201 is configured to calculate the clustering factor of each object from the values of its one or more same attributes.
The first screening unit 202 is configured to filter out i objects from the n objects according to the clustering factor.
In an optional embodiment, Fig. 11 is a schematic diagram of the similarity processing device according to the third embodiment of the present application. As shown in Fig. 11, the similarity processing device includes: a first acquisition module 10, a first screening module 20 and a first computing module 30. The first screening module 20 includes: a first computing unit 201 and a first screening unit 202, and the first screening unit 202 includes a sorting unit 2021 and a selecting unit 2022.
The sorting unit 2021 is configured to sort the objects by the magnitude of the clustering factor corresponding to each object.
The selecting unit 2022 is configured to select i consecutive objects from the n sorted objects.
In an optional embodiment, Fig. 12 is a schematic diagram of the similarity processing device according to the fourth embodiment of the present application. As shown in Fig. 12, the similarity processing device includes: a first acquisition module 10, a first screening module 20 and a first computing module 30. The first screening module 20 includes: a second computing unit 203 and a second screening unit 204.
The second computing unit 203 is configured to calculate the clustering factor of each object from the values of its one or more same attributes.
The second screening unit 204 is configured to filter out i objects from the n objects according to the clustering factor and one or more attributes.
The second computing unit 203 in this embodiment may be the same as or different from the first computing unit 201 in the above embodiment, and the second screening unit 204 may be the same as or different from the first screening unit 202 in the above embodiment.
According to an embodiment of the present application, an object screening device for implementing the above object screening method is also provided. Fig. 13 is a schematic diagram of the object screening device according to an embodiment of the present application. As shown in Fig. 13, the device includes: a second acquisition module 40 and a second screening module 50.
The second acquisition module 40 is configured to obtain the values of the one or more attributes corresponding to each of the n objects.
The second screening module 50 is configured to filter out i similar objects from the n objects according to the values of the one or more attributes of each object.
In the object screening device of this embodiment, the second acquisition module 40 can be used to perform step S701 of the embodiments of the present application, and the second screening module 50 can be used to perform step S702.
According to an embodiment of the present application, a similarity processing device for implementing the above similarity processing method is also provided. Fig. 14 is a schematic diagram of the similarity processing device according to the fifth embodiment of the present application. As shown in Fig. 14, the device includes: a third screening module 60 and a second computing module 70.
The third screening module 60 is configured to filter out i objects from the n objects according to the values of the one or more attributes corresponding to each of the n objects. The third screening module 60 may have the same function as the first screening module 20 in the above embodiments.
The second computing module 70 is configured to calculate similarity pairwise for the i objects.
In the similarity processing device of this embodiment, the third screening module 60 can be used to perform step S801 of the embodiments of the present application, and the second computing module 70 can be used to perform step S802.
Embodiment 3
Embodiments of the present application also provide a computer device, which may be any computer device in a group of computer devices. Optionally, in this embodiment, the computer device may be a device that constitutes a server, a device that constitutes a server cluster, or a device that constitutes a cloud computing system. In other words, a cloud server may also be referred to as a computer device; users simply do not need to be concerned with the specific hardware composition of the devices in such cloud computing servers. Of course, as the computing capability of terminals develops, terminals may also be added to cloud computing; in that case the computer device may also be any mobile terminal in a group of mobile terminals.
Optionally, in this embodiment, the computer device may be located in at least one of multiple network devices of a computer network.
Optionally, Fig. 15 is a structural block diagram of a computer device according to an embodiment of the present application. As shown in Fig. 15, computer device A may include: one or more processors 101 (only one is shown in the figure), a memory 103, and a transmission device 105.
The memory 103 can be used to store software programs and modules, such as the program instructions/modules corresponding to the similarity processing method and device in the embodiments of the present application. By running the software programs and modules stored in the memory 103, the processor 101 performs various functional applications and data processing, that is, implements the above similarity processing method. The memory 103 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 103 may further include memory located remotely from the processor 101, and such remote memory can be connected to computer device A through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 105 is used to receive or send data via a network. Specific examples of the network may include wired networks and wireless networks. In one example, the transmission device 105 includes a network interface controller (NIC), which can be connected to a router and other network devices by a network cable so as to communicate with the Internet or a local area network. In one example, the transmission device 105 is a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.
In this embodiment, the computer device can execute program code for the following steps of the similarity processing method of an application program:
Obtaining calculation conditions, where, when the calculation conditions are satisfied, the maximum number of objects for which pairwise similarity can be calculated is k; filtering out i objects from n objects according to the calculation conditions, where i is less than or equal to n and i is less than or equal to k; and calculating similarity pairwise for the i objects.
Optionally, the computer device can also execute program code in which the calculation conditions include at least one of the following: the resources used to calculate similarity, the time for calculating similarity, and the scale of the similarity calculation.
Optionally, the computer device can also execute program code for the following steps: filtering out i objects from the n objects includes: filtering out i objects from the n objects according to the values of the one or more attributes corresponding to each of the n objects.
Optionally, the computer device can also execute program code for the following steps: filtering out i objects from the n objects according to the values of the one or more attributes corresponding to each of the n objects includes: selecting from the n objects, as the i objects, those whose values of the one or more attributes fall within a predetermined range, where the predetermined range is determined according to the value of i.
Optionally, the computer device can also execute program code for the following steps: filtering out i objects from the n objects according to the one or more attribute values corresponding to each of the n objects includes: calculating the clustering factor of each object from the values of its one or more same attributes; and filtering out i objects from the n objects according to the clustering factor.
Optionally, the computer device can also execute program code for the following steps: filtering out i objects from the n objects according to the clustering factor includes: sorting the objects by the magnitude of the clustering factor corresponding to each object; and selecting i consecutive objects from the n sorted objects.
Optionally, the computer device can also execute program code for the following steps: filtering out i objects from the n objects according to the one or more attribute values corresponding to each of the n objects includes: calculating the clustering factor of each object from the values of its one or more same attributes; and filtering out i objects from the n objects according to the clustering factor and one or more attributes.
Above computer equipment can perform the program code of following steps in the similarity processing method of application program:
Obtain the value that each object in n object distinguishes corresponding one or more attributes;According to the one of each object Or the value of multiple attributes filters out i similar objects from n object.
Optionally, above computer equipment can also carry out the program code of following steps:According to the one of each object Or the value of multiple attributes filters out i similar objects from n object and included:The value of one or more attributes is fallen The object entered to preset range is screened from n object as i object, wherein, preset range is according to i Value determine.
Optionally, the above computer device may also execute program code for the following steps: screening i similar objects out of the n objects according to the values of the one or more attributes of each object includes: calculating the clustering factor corresponding to each object from the values of one or more attributes that the objects have in common; and screening the i objects out of the n objects according to the clustering factor.
Optionally, the above computer device may also execute program code for the following steps: screening i similar objects out of the n objects according to the values of the one or more attributes of each object includes: calculating the clustering factor corresponding to each object from the values of one or more attributes that the objects have in common; and screening the i objects out of the n objects according to the clustering factor and the one or more attributes.
The above computer device may execute program code for the following steps of the similarity processing method of an application program:
Screening i objects out of n objects according to the values of the one or more attributes corresponding to each of the n objects; and calculating the pairwise similarity of the i objects.
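The final step, computing similarity for every pair among the i screened objects, might look like the sketch below. This passage does not name a similarity measure, so cosine similarity over numeric attribute vectors is used purely as an assumed stand-in; the function names are likewise hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarity(
    vectors: Dict[str, Sequence[float]]
) -> Dict[Tuple[str, str], float]:
    """Compute the similarity of every unordered pair among the screened objects."""
    return {
        (p, q): cosine(vectors[p], vectors[q])
        for p, q in combinations(sorted(vectors), 2)
    }
```

For i objects this performs i(i-1)/2 similarity computations, which is why screening n objects down to i before this step reduces the amount of work.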
Those of ordinary skill in the art will understand that the structure shown in Figure 15 is only illustrative. Computer device A may also be a terminal device such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Devices, MID), or a PAD. Figure 15 does not limit the structure of the above electronic device. For example, computer device A may include more or fewer components than shown in Figure 15 (such as a network interface or a display device), or may have a configuration different from that shown in Figure 15.
Those of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments may be completed by a program that instructs the hardware associated with a terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, and the like.
Embodiment 4
The embodiments of the present application further provide a storage medium. Optionally, in this embodiment, the storage medium may be used to store the program code executed by the similarity processing method provided in Embodiment 1 above.
Optionally, in this embodiment, the storage medium may be located in any computer device of a computer device cluster in a computer network. The computer device may be a device that constitutes a server, a device that constitutes a server cluster, or a device that constitutes a cloud computing system. In other words, a cloud server can also be regarded as a group of computer devices, except that the user does not need to be concerned with the specific hardware composition of the devices in these cloud computing servers. Of course, as the computing capability of terminals develops, terminals may also join the cloud computing system, in which case the storage medium may also be located in any mobile terminal of a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: obtaining a computing condition, wherein, when the computing condition is satisfied, the maximum number of objects for which pairwise similarity can be calculated is k; screening i objects out of n objects according to the computing condition, wherein i is less than or equal to n and i is less than or equal to k; and calculating the pairwise similarity of the i objects.
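To make the relationship between the computing condition and k concrete, here is a hedged sketch. It assumes the condition can be expressed as a budget on the number of pairwise comparisons (for instance, a time limit divided by a per-pair cost); that formulation and the names `max_objects`, `k_from_time`, and `target_count` are illustrative assumptions. The only fixed fact used is that k objects yield k(k-1)/2 unordered pairs.

```python
import math


def max_objects(pair_budget: int) -> int:
    """Largest k such that k * (k - 1) / 2 <= pair_budget."""
    if pair_budget < 0:
        raise ValueError("pair_budget must be non-negative")
    # Solve k^2 - k - 2B <= 0 for the largest integer k.
    return (1 + math.isqrt(1 + 8 * pair_budget)) // 2


def k_from_time(time_budget: int, cost_per_pair: int) -> int:
    """Derive k from a time budget, assuming every pair costs the same time."""
    if cost_per_pair <= 0:
        raise ValueError("cost_per_pair must be positive")
    return max_objects(time_budget // cost_per_pair)


def target_count(n: int, k: int) -> int:
    """Upper bound on i: i must not exceed n or k."""
    return min(n, k)
```

For example, a budget of 10 comparisons admits at most 5 objects, because 5 objects form exactly 10 pairs while 6 objects would form 15.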
Optionally, in this embodiment, the storage medium is configured to store program code under which the computing condition includes at least one of the following: the resources used for calculating the similarity, the time for calculating the similarity, and the scale of the similarity calculation.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following step: screening i objects out of the n objects according to the values of the one or more attributes corresponding to each of the n objects.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: screening i objects out of the n objects according to the values of the one or more attributes corresponding to each of the n objects includes: screening out of the n objects, as the i objects, the objects whose values of the one or more attributes fall within a preset range, wherein the preset range is determined according to the value of i.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: screening i objects out of the n objects according to the values of the one or more attributes corresponding to each of the n objects includes: calculating the clustering factor corresponding to each object from the values of one or more attributes that the objects have in common; and screening the i objects out of the n objects according to the clustering factor.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: screening i objects out of the n objects according to the clustering factor includes: sorting the objects by the size of the clustering factor corresponding to each object; and selecting i consecutive objects from the n sorted objects.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: screening i objects out of the n objects according to the values of the one or more attributes corresponding to each of the n objects includes: calculating the clustering factor corresponding to each object from the values of one or more attributes that the objects have in common; and screening the i objects out of the n objects according to the clustering factor and the one or more attributes.
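One way to read "screening according to the clustering factor and the one or more attributes" is as a two-stage filter, sketched below. Everything specific here is an assumption for illustration: the clustering factor is taken to be a weighted sum of numeric attribute values, the attribute filter is an exact match on a hypothetical `category` field, and the i objects with the smallest factors are kept.

```python
from typing import Dict, List, Sequence


def clustering_factor(attrs: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed form of the clustering factor: a weighted sum of attribute values."""
    return sum(a * w for a, w in zip(attrs, weights))


def screen(objects: List[Dict], category: str, weights: Sequence[float], i: int) -> List[str]:
    """Filter by an attribute first, then rank the survivors by clustering factor.

    Each object is a dict with hypothetical keys "id", "category", and "attrs".
    """
    # Stage 1: keep only objects whose attribute matches directly.
    candidates = [o for o in objects if o["category"] == category]
    # Stage 2: order the remaining objects by clustering factor and keep i of them.
    candidates.sort(key=lambda o: clustering_factor(o["attrs"], weights))
    return [o["id"] for o in candidates[:i]]
```

If fewer than i objects pass the attribute filter, this sketch simply returns all of them.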
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: obtaining the values of the one or more attributes corresponding to each of n objects; and screening i similar objects out of the n objects according to the values of the one or more attributes of each object.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: screening i similar objects out of the n objects according to the values of the one or more attributes of each object includes: screening out of the n objects, as the i objects, the objects whose values of the one or more attributes fall within a preset range, wherein the preset range is determined according to the value of i.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: screening i similar objects out of the n objects according to the values of the one or more attributes of each object includes: calculating the clustering factor corresponding to each object from the values of one or more attributes that the objects have in common; and screening the i objects out of the n objects according to the clustering factor.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: screening i similar objects out of the n objects according to the values of the one or more attributes of each object includes: calculating the clustering factor corresponding to each object from the values of one or more attributes that the objects have in common; and screening the i objects out of the n objects according to the clustering factor and the one or more attributes.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: screening i objects out of n objects according to the values of the one or more attributes corresponding to each of the n objects; and calculating the pairwise similarity of the i objects.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present application, the description of each embodiment has its own emphasis. For parts that are not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units or modules is only a division by logical function, and other divisions are possible in an actual implementation: multiple units, modules, or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, modules, or units, and may be electrical or take other forms.
The units or modules described as separate components may or may not be physically separate, and the components shown as units or modules may or may not be physical units or modules; that is, they may be located in one place or distributed across multiple network units or modules. Some or all of the units or modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units or modules in the embodiments of the present application may be integrated into one processing unit or module, each unit or module may exist physically on its own, or two or more units or modules may be integrated into one unit or module. The integrated unit or module may be implemented in the form of hardware or in the form of a software functional unit or module.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present application. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present application, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present application.

Claims (21)

1. A similarity processing method, characterized by comprising:
obtaining a computing condition, wherein, when the computing condition is satisfied, the maximum number of objects for which pairwise similarity can be calculated is k;
screening i objects out of n objects according to the computing condition, wherein i is less than or equal to n and i is less than or equal to k; and
calculating the pairwise similarity of the i objects.
2. The method according to claim 1, characterized in that the computing condition comprises at least one of the following: resources used for calculating the similarity, time for calculating the similarity, and the scale of the similarity calculation.
3. The method according to claim 1 or 2, characterized in that screening the i objects out of the n objects comprises:
screening the i objects out of the n objects according to the values of the one or more attributes corresponding to each of the n objects.
4. The method according to claim 3, characterized in that screening the i objects out of the n objects according to the values of the one or more attributes corresponding to each of the n objects comprises:
screening out of the n objects, as the i objects, the objects whose values of the one or more attributes fall within a preset range, wherein the preset range is determined according to the value of i.
5. The method according to claim 3, characterized in that screening the i objects out of the n objects according to the values of the one or more attributes corresponding to each of the n objects comprises:
calculating the clustering factor corresponding to each object from the values of one or more attributes that the objects have in common; and
screening the i objects out of the n objects according to the clustering factor.
6. The method according to claim 5, characterized in that screening the i objects out of the n objects according to the clustering factor comprises:
sorting the objects by the size of the clustering factor corresponding to each object; and
selecting i consecutive objects from the n sorted objects.
7. The method according to claim 3, characterized in that screening the i objects out of the n objects according to the values of the one or more attributes corresponding to each of the n objects comprises:
calculating the clustering factor corresponding to each object from the values of one or more attributes that the objects have in common; and
screening the i objects out of the n objects according to the clustering factor and the one or more attributes.
8. A similarity processing device, characterized by comprising:
a first obtaining module, configured to obtain a computing condition, wherein, when the computing condition is satisfied, the maximum number of objects for which pairwise similarity can be calculated is k;
a first screening module, configured to screen i objects out of n objects according to the computing condition, wherein i is less than or equal to n and i is less than or equal to k; and
a first calculation module, configured to calculate the pairwise similarity of the i objects.
9. The device according to claim 8, characterized in that the computing condition comprises at least one of the following: resources used for calculating the similarity, time for calculating the similarity, and the scale of the similarity calculation.
10. The device according to claim 8 or 9, characterized in that the first screening module is configured to screen the i objects out of the n objects according to the values of the one or more attributes corresponding to each of the n objects.
11. The device according to claim 10, characterized in that the first screening module is configured to screen out of the n objects, as the i objects, the objects whose values of the one or more attributes fall within a preset range, wherein the preset range is determined according to the value of i.
12. The device according to claim 10, characterized in that the first screening module comprises:
a first calculation unit, configured to calculate the clustering factor corresponding to each object from the values of one or more attributes that the objects have in common; and
a first screening unit, configured to screen the i objects out of the n objects according to the clustering factor.
13. The device according to claim 12, characterized in that the first screening unit comprises:
a sorting unit, configured to sort the objects by the size of the clustering factor corresponding to each object; and
a selection unit, configured to select i consecutive objects from the n sorted objects.
14. The device according to claim 10, characterized in that the first screening module comprises:
a second calculation unit, configured to calculate the clustering factor corresponding to each object from the values of one or more attributes that the objects have in common; and
a second screening unit, configured to screen the i objects out of the n objects according to the clustering factor and the one or more attributes.
15. An object screening method, characterized by comprising:
obtaining the values of the one or more attributes corresponding to each of n objects; and
screening i similar objects out of the n objects according to the values of the one or more attributes of each object.
16. The method according to claim 15, characterized in that screening the i similar objects out of the n objects according to the values of the one or more attributes of each object comprises:
screening out of the n objects, as the i objects, the objects whose values of the one or more attributes fall within a preset range, wherein the preset range is determined according to the value of i.
17. The method according to claim 15, characterized in that screening the i similar objects out of the n objects according to the values of the one or more attributes of each object comprises:
calculating the clustering factor corresponding to each object from the values of one or more attributes that the objects have in common; and
screening the i objects out of the n objects according to the clustering factor.
18. The method according to claim 15, characterized in that screening the i similar objects out of the n objects according to the values of the one or more attributes of each object comprises:
calculating the clustering factor corresponding to each object from the values of one or more attributes that the objects have in common; and
screening the i objects out of the n objects according to the clustering factor and the one or more attributes.
19. A similarity processing method, characterized by comprising:
screening i objects out of n objects according to the values of the one or more attributes corresponding to each of the n objects; and
calculating the pairwise similarity of the i objects.
20. An object screening device, characterized by comprising:
a second obtaining module, configured to obtain the values of the one or more attributes corresponding to each of n objects; and
a second screening module, configured to screen i similar objects out of the n objects according to the values of the one or more attributes of each object.
21. A similarity processing device, characterized by comprising:
a third screening module, configured to screen i objects out of n objects according to the values of the one or more attributes corresponding to each of the n objects; and
a second calculation module, configured to calculate the pairwise similarity of the i objects.
CN201610174122.1A 2016-03-24 2016-03-24 Similarity processing method, object screening technique and device Pending CN107229640A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201610174122.1A CN107229640A (en) 2016-03-24 2016-03-24 Similarity processing method, object screening technique and device
TW106106682A TW201800966A (en) 2016-03-24 2017-03-01 Similarity processing method and object screening method and device
PCT/CN2017/076424 WO2017162063A1 (en) 2016-03-24 2017-03-13 Similarity processing method and object screening method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610174122.1A CN107229640A (en) 2016-03-24 2016-03-24 Similarity processing method, object screening technique and device

Publications (1)

Publication Number Publication Date
CN107229640A true CN107229640A (en) 2017-10-03

Family

ID=59899348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610174122.1A Pending CN107229640A (en) 2016-03-24 2016-03-24 Similarity processing method, object screening technique and device

Country Status (3)

Country Link
CN (1) CN107229640A (en)
TW (1) TW201800966A (en)
WO (1) WO2017162063A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102722567A (en) * 2012-05-30 2012-10-10 杭州遥指科技有限公司 Method and device for screening in-station information
CN103559262A (en) * 2013-11-04 2014-02-05 北京邮电大学 Community-based author and academic paper recommending system and recommending method
CN103617192A (en) * 2013-11-07 2014-03-05 北京奇虎科技有限公司 Method and device for clustering data objects
CN105074664A (en) * 2013-02-11 2015-11-18 亚马逊科技公司 Cost-minimizing task scheduler
CN105184307A (en) * 2015-07-27 2015-12-23 蚌埠医学院 Medical field image semantic similarity matrix generation method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103488789B (en) * 2013-10-08 2017-08-18 百度在线网络技术(北京)有限公司 Recommendation method, device and search engine
CN104699725B (en) * 2013-12-10 2018-10-09 阿里巴巴集团控股有限公司 data search processing method and system
CN104978553B (en) * 2014-04-08 2019-05-28 腾讯科技(深圳)有限公司 The method and device of image analysis

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190007A (en) * 2018-07-20 2019-01-11 阿里巴巴集团控股有限公司 Data analysing method and device
CN109190007B (en) * 2018-07-20 2022-10-04 创新先进技术有限公司 Data analysis method and device
CN111414949A (en) * 2020-03-13 2020-07-14 杭州海康威视***技术有限公司 Picture clustering method and device, electronic equipment and storage medium
CN111414949B (en) * 2020-03-13 2023-06-27 杭州海康威视***技术有限公司 Picture clustering method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2017162063A1 (en) 2017-09-28
TW201800966A (en) 2018-01-01

Similar Documents

Publication Publication Date Title
CN106326248B (en) The storage method and device of database data
CN109902708A (en) A kind of recommended models training method and relevant apparatus
CN106228386A (en) A kind of information-pushing method and device
CN106503006A (en) The sort method and device of application App neutron applications
CN107545315A (en) Order processing method and device
CN107895038A (en) A kind of link prediction relation recommends method and device
CN106874355A The collaborative filtering method of social networks and user's similarity is incorporated simultaneously
CN111931053A (en) Item pushing method and device based on clustering and matrix decomposition
CN107437095A (en) Classification determines method and device
CN113761359B (en) Data packet recommendation method, device, electronic equipment and storage medium
CN109784394A (en) A kind of recognition methods, system and the terminal device of reproduction image
CN109087138A (en) Data processing method and system, computer system and readable storage medium storing program for executing
CN112465533A (en) Intelligent product selection method and device and computing equipment
CN108628721A (en) Method for detecting abnormality, device, storage medium and the electronic device of user data value
CN113379530A (en) User risk determination method and device and server
CN107248023A (en) A kind of screening technique and device to mark enterprise list
CN109522919A (en) A kind of data assessment method and device
CN106657062A (en) Method and device for user identification
CN107229640A (en) Similarity processing method, object screening technique and device
CN110288465A (en) Object determines method and device, storage medium, electronic device
CN106503271A (en) The intelligent shop site selection system of subspace Skyline inquiry under mobile Internet and cloud computing environment
CN106681803A (en) Task scheduling method and server
CN110457387A (en) A kind of method and relevant apparatus determining applied to user tag in network
CN109657950A (en) Hierarchy Analysis Method, device, equipment and computer readable storage medium
CN107862412A (en) A kind of data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171003
