CN107229640A - Similarity processing method, object screening technique and device - Google Patents
- Publication number
- CN107229640A (application CN201610174122.1A)
- Authority
- CN
- China
- Prior art keywords
- value
- similarity
- attributes
- filtered out
- clustering factor
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application discloses a similarity processing method, an object screening method, and a device. The similarity processing method includes: obtaining a calculation condition, wherein, when the calculation condition is satisfied, the maximum number of objects for which pairwise similarity can be calculated is k; screening i objects out of n objects according to the calculation condition, wherein i is less than or equal to n and i is less than or equal to k; and calculating the pairwise similarity of the i objects. The application addresses the technical problem caused by the large scale of similarity calculation.
Description
Technical field
The application relates to the field of similarity calculation, and in particular to a similarity processing method, an object screening method, and a device.
Background
In the prior art, calculating cosine similarity is not difficult in itself, but in big data applications the main bottleneck of collaborative filtering is computational performance. Collaborative filtering requires a similarity to be calculated between every pair of individuals; assuming there are N objects, the computational complexity is N².
The inventors found that in practical applications the calculation scale is relatively large. Taking the Taobao recommendation scenario as an example, if item-based collaborative filtering is used and Taobao has 800 million online items, the computational complexity is the square of 800 million, a scale that is unaffordable. Calculation at this scale causes problems. For example, a calculation whose complexity is the square of 800 million requires a large number of servers; if too few servers are currently deployed, the servers will run at full load continuously and will be unable to respond to other requests, with undesirable consequences.
In addition, the inventors found that other scenarios may impose further requirements on the calculation, such as a limit on calculation time; if the calculation scale is large, the time requirement cannot be met.
No effective solution has yet been proposed for the problems caused by the large scale of similarity calculation in the related art.
Summary of the invention
The embodiments of the application provide a similarity processing method, an object screening method, and a device, to solve at least the problems caused by the large scale of similarity calculation.
According to one aspect of the embodiments of the application, a similarity processing method is provided. The method includes: obtaining a calculation condition, wherein, when the calculation condition is satisfied, the maximum number of objects for which pairwise similarity can be calculated is k; screening i objects out of n objects according to the calculation condition, wherein i is less than or equal to n and i is less than or equal to k; and calculating the pairwise similarity of the i objects.
According to another aspect of the embodiments of the application, a similarity processing device is also provided. The device includes: a first acquisition module for obtaining a calculation condition, wherein, when the calculation condition is satisfied, the maximum number of objects for which pairwise similarity can be calculated is k; a first screening module for screening i objects out of n objects according to the calculation condition, wherein i is less than or equal to n and i is less than or equal to k; and a second calculation module for calculating the pairwise similarity of the i objects.
According to one aspect of the embodiments of the application, an object screening method is also provided. The method includes: obtaining the values of one or more attributes corresponding to each of n objects; and screening i similar objects out of the n objects according to the values of the one or more attributes of each object.
According to one aspect of the embodiments of the application, a similarity processing method is also provided. The method includes: screening i objects out of n objects according to the values of one or more attributes corresponding to each of the n objects; and calculating the pairwise similarity of the i objects.
According to one aspect of the embodiments of the application, an object screening device is also provided. The device includes: a second acquisition module for obtaining the values of one or more attributes corresponding to each of n objects; and a second screening module for screening i similar objects out of the n objects according to the values of the one or more attributes of each object.
According to one aspect of the embodiments of the application, a similarity processing device is also provided. The device includes: a third screening module for screening i objects out of n objects according to the values of one or more attributes corresponding to each of the n objects; and a second calculation module for calculating the pairwise similarity of the i objects.
In the embodiments of the application, the number of objects for which pairwise similarity is calculated is reduced as far as possible according to the calculation condition, so that the objects to be calculated are screened reasonably according to that condition. This solves the problems caused by the large scale of similarity calculation in the related art.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the application and form a part of it. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a hardware block diagram of a computer device for a similarity processing method according to an embodiment of the application;
Fig. 2 is a flowchart of the similarity processing method according to the first embodiment of the application;
Fig. 3 is a schematic diagram of load balancing across multiple servers according to an embodiment of the application;
Fig. 4 is a flowchart of the similarity processing method according to the second embodiment of the application;
Fig. 5 is a flowchart of the similarity processing method according to the third embodiment of the application;
Fig. 6 is a schematic diagram of the similarity processing procedure according to an embodiment of the application;
Fig. 7 is a flowchart of the object screening method according to an embodiment of the application;
Fig. 8 is a flowchart of the similarity processing method according to the fourth embodiment of the application;
Fig. 9 is a schematic diagram of the similarity processing device according to the first embodiment of the application;
Fig. 10 is a schematic diagram of the similarity processing device according to the second embodiment of the application;
Fig. 11 is a schematic diagram of the similarity processing device according to the third embodiment of the application;
Fig. 12 is a schematic diagram of the similarity processing device according to the fourth embodiment of the application;
Fig. 13 is a schematic diagram of the object screening device according to an embodiment of the application;
Fig. 14 is a schematic diagram of the similarity processing device according to the fifth embodiment of the application; and
Fig. 15 is a structural block diagram of a computer device according to an embodiment of the application.
Detailed description
To help those skilled in the art better understand the solution of the application, the technical solutions in the embodiments of the application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the application, without creative effort, shall fall within the scope of protection of the application.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "comprise" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps, modules, or units is not necessarily limited to those expressly listed, but may include other steps, modules, or units that are not expressly listed or that are inherent to the process, method, product, or device.
Embodiment 1
According to an embodiment of the application, a method embodiment of a similarity processing method is provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
For ease of description, several terms involved in the embodiments of the application are explained here:
Collaborative filtering: finding the group of individuals similar to a given individual; calculating the similarity between two individuals is therefore the core task of collaborative filtering.
Pre-clustering: according to the business scenario, the objects under study are further clustered, and each individual undergoes similarity calculation only with the other individuals in the same cluster.
Cosine similarity algorithm: used to calculate the similarity between two individuals. The following embodiments do not improve the similarity calculation method itself but use existing similarity calculation methods; the meanings of the letters in the formulas of those methods will be understood by those skilled in the art and are not repeated in this embodiment.
The calculation scheme of this embodiment can be applied to all existing and future similarity algorithms.
Commonly used similarity calculation methods include the following:
Euclidean distance
Pearson correlation coefficient
Cosine similarity
Tanimoto coefficient
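For illustration, the four measures named above can be sketched with their standard textbook definitions. This is an assumption about which forms are intended, since the text names the measures without fixing their exact expressions; the function names are hypothetical.

```python
import math

def euclidean_distance(x, y):
    # Straight-line distance between two equal-length vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_correlation(x, y):
    # Covariance divided by the product of the standard deviations.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def cosine_similarity(x, y):
    # Dot product divided by the product of the vector norms.
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def tanimoto_coefficient(x, y):
    # Dot product divided by (|x|^2 + |y|^2 - dot product).
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sum(a * a for a in x) + sum(b * b for b in y) - dot)
```

Each function takes two numeric vectors of equal length. The sketch does not guard against zero-length or constant vectors, which would make some denominators zero.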
The method embodiments provided by the application can be run on a server. To provide a better user experience, a service for querying the calculation results can also be provided; for example, the results on the server can be viewed through a web page or a client. A server can be understood as a kind of computer. Of course, as technology develops and cloud computing is applied more and more widely, the methods provided in the embodiments of the application can also be extended to cloud computing. The computing capability of terminals may also increase as technology develops, and the calculation may be performed on a terminal when the terminal can obtain the corresponding data; the terminal may include, but is not limited to, a mobile phone, a tablet computer, and other portable devices. For now, however, deploying the following embodiments on a server is the preferable choice.
Under current technical conditions, the hardware architectures on which servers, terminals, and cloud computing rely are similar, and each can be regarded as a computer device. The embodiments of the application can be executed on such a computer device. If the hardware architecture of computer devices changes as technology develops, or computing devices with new architectures appear, the embodiments of the application can still be implemented. The following description takes the architecture of the computing device in Fig. 1 as an example.
Fig. 1 is a hardware block diagram of a computer device for a similarity processing method according to an embodiment of the application. As shown in Fig. 1, the computer device 1 may include one or more processors 102 (only one is shown in the figure; the processor 102 may include, but is not limited to, a processing unit such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication. A person of ordinary skill in the art will understand that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the above electronic device. For example, the computer device 1 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.
The memory 104 can be used to store software programs and modules of application software, such as the program instructions or modules corresponding to the similarity processing method in the embodiments of the application. By running the software programs and modules stored in the memory 104, the processor 102 performs various functional applications and data processing, that is, it implements the similarity processing method described above. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory can be connected to the computer device 1 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations of these.
The transmission device 106 is used to receive or send data via a network. A specific example of such a network is a wireless network provided by the communication provider of the computer device 1. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In another example, the transmission device 106 may be a radio frequency (RF) module, which communicates with the Internet wirelessly. In a further embodiment, the transmission device 106 may be a wired interface module, which communicates with the Internet over a wired connection.
In the above operating environment, this embodiment provides the similarity processing method shown in Fig. 2. Fig. 2 is a flowchart of the similarity processing method according to the first embodiment of the application. As shown in Fig. 2, the flow may include the following steps:
Step S202: obtain a calculation condition, wherein, when the calculation condition is satisfied, the maximum number of objects for which pairwise similarity can be calculated is k.
As an optional embodiment, the calculation condition may include at least one of the following: the resources for calculating similarity, the time for calculating similarity, and the scale of the similarity calculation. For example, the calculation condition may be that the similarity calculation completes within 3 seconds, or that the number of operations for calculating similarity stays within 1,000,000. The calculation condition may be one or more of these, or any other type of condition that limits the calculation of similarity.
This embodiment explains the calculation condition using computing capability as an example. Computing capability is generally determined by the resources available for the calculation. Here, the computing capability is expressed as k, the maximum number of objects whose pairwise similarity can be calculated.
The computing capability may be that of a single server. For example, given k individuals, the similarity calculation requires one similarity to be calculated between every two of the k objects, so the complexity of the calculation is k². If the server can perform at most k² similarity calculations, its computing capability is k.
For example, if a server can perform a calculation of complexity 10,000, the maximum value of k is 100. Alternatively, the computing capability can be expressed directly as the maximum number of objects whose pairwise similarity can be calculated; for example, the computing capability of a server is the pairwise similarity of at most 100 objects.
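The relationship between a complexity budget and k can be sketched as follows. This is a minimal illustration that assumes, as in the example above, that comparing k objects costs k² operations; the function name is hypothetical.

```python
import math

def max_objects_from_budget(max_operations):
    # Largest integer k such that k * k does not exceed the budget.
    return math.isqrt(max_operations)

k = max_objects_from_budget(10_000)  # budget taken from the example in the text
```

With a budget of 10,000 operations this yields k = 100, matching the example in the text; a budget that is not a perfect square is rounded down.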
The computing capability may also be provided by multiple servers. In that case, the value of k can be calculated from the computing capability of the servers combined, and the load can then be balanced according to the computing capability of each individual server.
Fig. 3 is a schematic diagram of load balancing across multiple servers according to an embodiment of the application. As shown in Fig. 3, a server cluster or cloud computing task can be completed jointly by multiple servers. The computing capability of server 1 is 2000, that of server 2 is 3000, that of server 3 is 1000, and that of server 4 is 4000. Viewing the four servers as a server cluster or cloud, the computing capability of the whole is therefore 10,000. The computing capability provided by multiple servers makes it easier to complete the pairwise similarity calculation for k objects. Of course, for a cloud server, developers may not need to be much concerned with how load balancing is done inside the cloud server; they only need to calculate according to the maximum computing capability the cloud server provides. If the computing capability of a server changes, the value of k can be readjusted according to the changed capability.
In another optional embodiment, the value of k may be the value obtained when a server, multiple servers, or a cloud server is devoted entirely to similarity calculation. Of course, one or more servers sometimes also need to provide other services. For example, if 20% of a server's resources must be allocated to other services, the computing capability is calculated from the remaining 80% of the server's resources. The same applies to multiple servers or a cloud server.
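The multi-server case can be sketched as below. The sketch assumes that each server's computing capability is an operation budget that can be summed, that k is then derived from the combined budget as its integer square root, and that load is shared in proportion to each server's budget; these interpretations and all names are illustrative assumptions.

```python
import math

def combined_budget(server_budgets, available_percent=100):
    # Sum the per-server budgets, keeping only the share that is
    # available for similarity calculation.
    return sum(server_budgets) * available_percent // 100

def max_objects(budget):
    # Largest k with k * k <= budget.
    return math.isqrt(budget)

def proportional_shares(server_budgets):
    # Fraction of the work each server would take under
    # capability-proportional load balancing.
    total = sum(server_budgets)
    return [budget / total for budget in server_budgets]

budgets = [2000, 3000, 1000, 4000]  # servers 1 to 4 in Fig. 3
k_full = max_objects(combined_budget(budgets))
k_reserved = max_objects(combined_budget(budgets, available_percent=80))
```

With the Fig. 3 values, the combined budget is 10,000 and k is 100; reserving 20% for other services leaves a budget of 8,000, for which k drops to 89.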
Step S204: screen i objects out of n objects according to the calculation condition, wherein i is less than or equal to n and i is less than or equal to k.
Screening i objects out of n objects according to the calculation condition may be screening i objects out of the n objects according to the computing capability, wherein i is less than or equal to n and i is less than or equal to k.
When i equals k, all available computing capability is used for the pairwise similarity calculation. In practice, a value of i smaller than k can also be chosen. On the one hand, the result is then obtained faster; on the other hand, the resource consumption of the calculation is reduced, and the resources saved can be used for other services.
As an optional embodiment, if a value of i smaller than k is chosen, the resources saved by the reduced computational complexity can be calculated and made available to other services.
Step S206: calculate the pairwise similarity of the i objects.
Through the above steps, an appropriate number of objects is screened from a large number of objects according to the computing capability, and pairwise similarity is then calculated. This solves the problems caused by the large scale of similarity calculation in the prior art and reduces the complexity of the similarity calculation.
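Step S206 can be sketched as a loop over all unordered pairs of the screened objects. The similarity function is passed in as a parameter, since the text leaves the choice of measure open; the function name and the return shape are hypothetical.

```python
from itertools import combinations

def pairwise_similarity(objects, similarity):
    # Map each unordered index pair (a, b), a < b, to its similarity score.
    return {
        (a, b): similarity(objects[a], objects[b])
        for a, b in combinations(range(len(objects)), 2)
    }

# Toy similarity for illustration: closer numbers score higher.
scores = pairwise_similarity([1.0, 2.0, 4.0, 8.0],
                             lambda x, y: 1.0 / (1.0 + abs(x - y)))
```

For i objects this evaluates i(i - 1)/2 pairs, which stays below the i² figure the text uses for budgeting.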
In step S204 above, i objects are screened out of n objects according to the computing capability. There are many ways to do the screening. For example, if the aim is simply to reduce the calculation scale, i objects can be selected from the n objects at random. The result of such a calculation is still meaningful to some extent. In one example, suppose the only goal is to learn what proportion of the 1,000,000 answers to an online survey are substantially similar. It is then unnecessary to calculate the pairwise similarity of all 1,000,000 answers; 100,000 can be selected at random for the calculation, and the result still represents the result sought.
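Random screening can be sketched with the standard library. The function name and the optional seed parameter are illustrative assumptions; the cap at k reflects the constraint that i must not exceed k.

```python
import random

def sample_objects(objects, i, k, seed=None):
    # Pick min(i, k, n) distinct objects uniformly at random.
    rng = random.Random(seed)
    count = min(i, k, len(objects))
    return rng.sample(objects, count)

answers = list(range(1000))  # stand-in for survey answers
subset = sample_objects(answers, i=100, k=500, seed=0)
```

The seed only makes the draw reproducible; omitting it gives a different subset on each run.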
In other optional embodiments, objects whose similarity is not very high can be filtered out first, and similarity is then calculated for the remaining objects, whose similarity is higher. Several examples follow.
For example, i objects are screened out of the n objects according to the values of one or more attributes corresponding to each of the n objects. Each of the n objects has one or more attributes, and the screening is based on the values of those attributes.
If the n objects all have a first attribute, i objects are screened out of the n objects according to the first attribute. The screening may select values within a certain interval, or may use segmented or single-point selection as actually needed. For example, if the n objects all have a first attribute and a second attribute, i objects are screened according to both: first, a objects satisfying the screening condition of the first attribute are selected from the n objects, and then i objects satisfying the screening condition of the second attribute are selected from the a objects, where i ≤ a. Likewise, if i objects need to be screened according to more attributes, the screening can proceed attribute by attribute in turn.
Suppose 200 objects are to be selected from 2000 objects using three attributes: height, weight, and age. Objects taller than 1.2 meters can first be selected by height, giving 1000 objects; objects weighing more than 30 kilograms are then selected by weight, giving 600; and objects aged 8 and 12 are then selected by age, giving 200 objects, for which the similarity is calculated. When selecting by age, objects aged 8 to 12 could be selected instead; the screening attributes and their values can be changed according to the specific purpose. Screening out an appropriate set of objects before calculating similarity reduces the complexity of the calculation.
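Attribute-by-attribute screening can be sketched as successive filters. The dictionary keys, the sample records, and the use of the 8-to-12 age range (the second variant mentioned in the text) are illustrative assumptions.

```python
def screen_by_attributes(objects, conditions):
    # Apply each (attribute, predicate) condition in turn, keeping only
    # the objects that pass every condition so far.
    remaining = list(objects)
    for attribute, predicate in conditions:
        remaining = [obj for obj in remaining if predicate(obj[attribute])]
    return remaining

people = [
    {"height_m": 1.3, "weight_kg": 35, "age": 9},
    {"height_m": 1.1, "weight_kg": 35, "age": 9},   # fails the height condition
    {"height_m": 1.4, "weight_kg": 25, "age": 10},  # fails the weight condition
    {"height_m": 1.5, "weight_kg": 40, "age": 14},  # fails the age condition
]
conditions = [
    ("height_m", lambda h: h > 1.2),
    ("weight_kg", lambda w: w > 30),
    ("age", lambda a: 8 <= a <= 12),
]
selected = screen_by_attributes(people, conditions)
```

Because the conditions are applied in order, each later filter only sees the objects that survived the earlier ones, mirroring the n to a to i narrowing described above.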
As an optional embodiment, screening i objects out of n objects according to the values of one or more attributes corresponding to each of the n objects includes: selecting from the n objects, as the i objects, those objects whose values for the one or more attributes fall within a preset range, wherein the preset range is determined according to the value of i.
Take calculating the similarity between Taobao shops as an example. Suppose there are 100,000 shops in total, but the computational complexity is too high and the existing computing capability can only calculate the similarity of 50,000 shops. A preliminary screening is therefore wanted that picks out 50,000 shops with close attribute values and higher similarity for the similarity calculation. The 100,000 shops have multiple attributes, for example the number of goods sold per day and the daily sales volume. If 50,000 shops can be screened out of the 100,000 using the number of goods sold alone, the sales volume need not be used. If 50,000 shops cannot be obtained using the number of goods alone, for example because 70,000 shops pass that screening, the sales volume can be used for further screening; if screening by sales volume still does not suffice, other attributes can be used. The number of screening attributes is increased gradually until the number of objects satisfies the computing capability. The preset range is determined according to the value of i and can be adjusted as i varies. For example, when screening shops by number of goods, if limiting the number of goods to 100-200 yields 60,000 shops, the range is adjusted to 100-150 so that 50,000 shops are screened out.
As another example, suppose 200 objects are to be selected from 2000 objects using three attributes: height, weight, and age. Objects taller than 1.2 meters can first be selected by height, giving 1000 objects; objects weighing more than 30 kilograms are then selected by weight, giving 600; and objects aged between 8 and 12 are then selected by age, giving 300. Since this exceeds 200, the age range can be adjusted to 9 to 11, which yields 200 objects, and the similarity of those 200 objects is calculated.
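Adjusting the preset range to reach the target count can be sketched as below. The text does not say how the range is tightened; shrinking both ends by one step per round is an assumption made only for this illustration, as are the function name and the sample data.

```python
def narrow_range_to_fit(values, low, high, target, step=1):
    # Shrink [low, high] from both ends until at most `target` values
    # fall inside it; returns the final bounds and the selected values.
    while low <= high:
        selected = [v for v in values if low <= v <= high]
        if len(selected) <= target:
            return low, high, selected
        low += step
        high -= step
    return low, high, []

# 300 ages between 8 and 12, of which 200 lie between 9 and 11.
ages = [8] * 50 + [9] * 60 + [10] * 80 + [11] * 60 + [12] * 50
low, high, chosen = narrow_range_to_fit(ages, 8, 12, target=200)
```

On this sample data the initial range of 8 to 12 selects 300 values, so the range is tightened once to 9 to 11, which selects 200.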
By selecting from the n objects, as the i objects, those whose values for one or more attributes fall within the preset range, an appropriate number of objects is screened out before similarity is calculated. This meets the requirement of the computing capability while keeping the result accurate. The number of objects for which pairwise similarity is calculated is reduced as far as possible according to the computing capability, so that the objects to be calculated are screened reasonably, which solves the problems caused by the large scale of similarity calculation in the related art.
The screening may also use a clustering factor: a clustering factor corresponding to each object is calculated from the values of one or more identical attributes of each object, and i objects are screened out of the n objects according to the clustering factor. Fig. 4 is a flowchart of the similarity processing method according to the second embodiment of the application. As shown in Fig. 4, the flow may include the following steps:
Step S301: calculate the clustering factor corresponding to each object from the values of one or more identical attributes of each object.
Step S302: screen i objects out of the n objects according to the clustering factor.
Through the above steps, the clustering factor corresponding to each object is obtained, and i objects are then screened out of the n objects according to it. The clustering factor can be adjusted to the user's needs so that the objects screened out are more similar to one another, which reduces the scale of the similarity calculation.
In an optional embodiment, calculating the clustering factor corresponding to each object from the values of one or more identical attributes of each object may be: calculating the clustering factor of each object as a weighted sum of the values of multiple identical attributes. An appropriate number of objects is then screened from the objects according to the clustering factors obtained, and the similarity between every two of them is calculated; that is, i objects are screened out of the n objects according to the clustering factor. Each object may have multiple identical attributes, or may lack a certain attribute; if an attribute is absent, its value can be taken as zero when calculating the clustering factor.
For example, suppose there are 100,000 shops in total, but the computational complexity is too high and the existing computing capability can only calculate the similarity of 50,000 shops. The aim is therefore to screen out 50,000 shops with close attribute values and higher similarity for the similarity calculation. The 100,000 shops all have multiple attributes, for example the number of goods sold per day and the daily sales volume. To calculate similarity over shops with more features in common, a weighted sum of the number of goods sold and the sales volume can be used as the clustering factor, and i objects are then screened out of the n objects according to it. For the weighted sum, the weighting parameters may be determined from the specific attributes, or the user may enter the weighting parameters or a calculation model in advance to calculate the clustering factor of each shop; i objects are then screened out of the n objects according to the calculated clustering factors.
As another example, to select 200 objects from 2,000 objects by the three attributes height,
weight and age, an importance coefficient may be set for each of the three attributes, so that
the clustering factor of each object is calculated from the values of its common attributes, for
example as height × 0.5 + weight × 0.2 + age × 0.3. The 2,000 objects are arranged according to
the clustering factor and 200 of them are chosen for the similarity calculation. The 200 chosen
objects may be contiguous in the arrangement or may form several separate segments.
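The weighted sum described above can be sketched as follows. This is a minimal illustration, not
the implementation of the application: the function name clustering_factor, the dictionary layout
and the sample values are assumptions, while the coefficients 0.5, 0.2 and 0.3 and the rule that a
missing attribute counts as zero come from the text.

```python
def clustering_factor(attrs, weights):
    """Weighted sum of attribute values; a missing attribute counts as zero."""
    return sum(w * attrs.get(name, 0.0) for name, w in weights.items())


# Coefficients from the height/weight/age example in the text.
weights = {"height": 0.5, "weight": 0.2, "age": 0.3}

# Hypothetical sample objects (height in meters, weight in kilograms, age in years).
people = [
    {"height": 1.30, "weight": 32.0, "age": 9},
    {"height": 1.25, "age": 8},  # no weight recorded, so it is treated as 0
]

factors = [clustering_factor(p, weights) for p in people]
```

Because the attributes have different units and ranges, the coefficients in practice also act as a
scaling choice; the text leaves that choice to the specific attributes or to the user.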
Fig. 5 is a flow chart of the similarity processing method according to the third embodiment of
the present application. As shown in Fig. 5, the flow may include the following steps:
Step S401: arrange the objects according to the size of the clustering factor of each object.
Arranging the objects according to the size of their clustering factors may mean that, once the
clustering factor of each object has been calculated, all objects are arranged in ascending or
descending order of the clustering factor.
Step S402: select i contiguous objects from the n arranged objects.
As an optional embodiment, filtering out i objects from the n objects according to the
clustering factor includes: arranging the objects according to the size of the clustering factor
of each object; and selecting i contiguous objects from the n arranged objects.
For example, a weighted sum of the number of items sold and the sales volume is used as the
clustering factor, and the shops are arranged according to the size of the clustering factor of
each shop, for example in ascending order. i contiguous shops are then selected from the n
arranged shops, and similarity is calculated pairwise among these i shops. The number i is
chosen to fit the available computing capacity. Because shops that are adjacent in the
arrangement have close clustering factors, the i contiguous shops selected are the ones most
likely to be highly similar to one another, so the technical solution of this embodiment can
reduce the complexity of the similarity calculation.
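Steps S401 and S402 can be sketched as a sort followed by a window selection. The text only
requires that the i objects be contiguous in the arrangement; choosing the window with the
smallest spread of clustering factors is an assumption made here for illustration, as is the
function name select_contiguous.

```python
def select_contiguous(factors, i):
    """Return the indices of i objects that are adjacent after sorting by factor.

    Assumption: among all windows of length i, pick the one whose largest and
    smallest clustering factors are closest together.
    """
    if not 0 < i <= len(factors):
        raise ValueError("i must satisfy 0 < i <= n")
    # Step S401: arrange the objects in ascending order of clustering factor.
    order = sorted(range(len(factors)), key=lambda idx: factors[idx])
    # Step S402: select one contiguous run of i objects from the arrangement.
    best_start = min(
        range(len(order) - i + 1),
        key=lambda s: factors[order[s + i - 1]] - factors[order[s]],
    )
    return order[best_start:best_start + i]


# Hypothetical clustering factors for six shops.
shop_factors = [10.0, 1.0, 1.2, 9.0, 1.1, 5.0]
chosen = select_contiguous(shop_factors, 3)
```

Any other rule for placing the window (for example, centering it on a shop of interest) fits the
same two steps.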
As an optional embodiment, filtering out i objects from the n objects according to the values of
one or more attributes of each of the n objects includes: calculating the clustering factor of
each object from the values of one or more attributes that the objects have in common; and
filtering out i objects from the n objects according to the clustering factor and one or more
attributes.
For example, suppose there are 100,000 shops in total, but calculating similarity across all of
them is too expensive and the available computing capacity can only handle 50,000 shops. In that
case 50,000 shops with close attribute values, and therefore higher similarity, are filtered out
for the similarity calculation. Each of the 100,000 shops has multiple attributes, for example
the number of items sold per day and the daily sales volume. A weighted sum of the number of
items sold and the sales volume may be used as the clustering factor, and i objects are then
filtered out of the n objects according to both the clustering factor and the sales volume. For
example, the weighted sum may be: number of items sold × 0.9 + sales volume × 0.1. During
screening, the number of attributes and the way the clustering factor is calculated can be
adjusted according to the value of i, so as to filter out objects of higher similarity.
The technical solution of the present application is outlined below with reference to an optional
embodiment:
Fig. 6 is a schematic diagram of the similarity processing procedure according to an embodiment of
the present application. As shown in Fig. 6, assume that the total number of individuals is n,
denoted K1 to Kn. When similarity is calculated, K1 has to be compared pairwise with each of K1
to Kn. A collaborative filtering algorithm needs the similarity between every two individuals;
even when it runs under the distributed MapReduce programming framework, the same individual Ki
is assigned to the same Reduce, so the maximum computational complexity of a Reduce reaches n^2.
The technical solution of the embodiments of the present application can be divided into two
kinds of clustering, numeric clustering and enumeration clustering. In numeric clustering, all
individuals are first sorted by a clustering factor. Here the clustering factor can be understood
as a weighted sum of different attributes, for example a weighted sum of two attributes; in the
extreme case the clustering factor is a single attribute and no weighted sum is needed. For
example, if the business needs to find 2m individuals that are similar in clustering factor Q,
all individuals are sorted by Q, and the m nearest neighbors above and the m nearest neighbors
below individual Ki are selected from the sorted objects. As another example, when 2m
individuals are again required, enumeration clustering groups the individuals that share the
same t specific enumerated values (which can also be understood as attribute values) into one
class (enumerated-value aggregation). Usually t and m constrain each other: the larger t is, the
fewer individuals are likely to share the same attribute values, and therefore the smaller the
number of individuals m in a class. When the business imposes a strict limit on m, the value of t
can be adjusted accordingly. After clustering, the computational complexity is reduced to (2m)^2,
where m << n.
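The two kinds of clustering can be sketched as follows. This is an illustrative reading of the
paragraph above, not the code of the application; the function names, the record layout and the
attribute names "cat" and "tier" are assumptions.

```python
from collections import defaultdict


def numeric_neighbors(factors, target, m):
    """Numeric clustering: up to m neighbors below and m above the target object.

    factors[idx] is the clustering factor Q of object idx; target is an index.
    """
    order = sorted(range(len(factors)), key=lambda idx: factors[idx])
    pos = order.index(target)
    lower = order[max(0, pos - m):pos]
    upper = order[pos + 1:pos + 1 + m]
    return lower + upper


def enumeration_clusters(records, keys):
    """Enumeration clustering: group record indices by their t enumerated values."""
    clusters = defaultdict(list)
    for idx, rec in enumerate(records):
        clusters[tuple(rec[k] for k in keys)].append(idx)
    return dict(clusters)


# Hypothetical data: five objects with a numeric factor, three with enumerated values.
q_values = [5.0, 1.0, 3.0, 2.0, 4.0]
neighbors = numeric_neighbors(q_values, target=2, m=1)

records = [
    {"cat": "a", "tier": 1},
    {"cat": "a", "tier": 1},
    {"cat": "b", "tier": 1},
]
groups = enumeration_clusters(records, ["cat", "tier"])
```

Adding more keys to the enumeration (a larger t) splits the groups further, which is the
restricting relation between t and m described in the text.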
With the above technical solution, pre-clustering reduces the number of individuals involved in
the calculation and thereby improves computational efficiency. At the same time, because each
individual is compared only with individuals that have neighboring features and individuals that
differ greatly are discarded, the accuracy and reasonableness of the results are not affected:
the results agree with the high-similarity (Top similar) individuals that a full calculation over
all individuals would produce.
An optional application of the embodiments of the present application is described below, taking
the calculation of shop similarity based on category composition as an example. First, a table
T_tezheng is built with three fields, seller_id, cate_id and wgt, which represent the ID of the
shop, the ID of the category, and the (standardized) feature component of the shop under that
category, respectively. The cosine similarity calculated from these components is the similarity
between two shops. Assume that the number of shops under some category identifier cate_id is
100w (w denotes ten thousand, so 1,000,000 shops); the computational complexity is then at least
10000w^2, that is, (100w)^2 or about 10^12. Of all these candidates, however, only roughly the
1,000 individuals most similar to a given individual are really of interest, and similar
individuals beyond the top 1,000 are usually all filtered out. Depending on the specific shop
attributes, many clustering factors can be found. Continuing the example of shop similarity
based on category composition, in business terms two shops are comparable only if their main
first-level industry is the same and their business volumes are close, so the shops can be
clustered using the two dimensions main category and monthly transaction amount as clustering
factors. Two fields are added to the table T_tezheng, and the calculation is performed on the
fields of the extended table. The amount of data assigned to each reduce is then much smaller,
and the similar individuals ultimately mined for a target individual usually need to satisfy
these feature-similarity constraints in any case. In practical applications, different
clustering factors can be specified for different application scenarios to configure the
pre-clustering scheme, so that the performance requirement is met while the data results remain
accurate.
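The shop example can be sketched in plain Python. The rows mirror the three fields of T_tezheng
(seller_id, cate_id, wgt); the mapping cluster_of stands in for the two added fields (main
category and a monthly-transaction band), whose names and values here are hypothetical. The text
describes an SQL/MapReduce deployment, so this in-memory version only illustrates the logic.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {cate_id: wgt}."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similar_within_clusters(rows, cluster_of):
    """rows: (seller_id, cate_id, wgt) tuples; cluster_of: seller_id -> cluster key.

    Similarity is calculated only between shops that share a cluster key.
    """
    vectors = defaultdict(dict)
    for seller_id, cate_id, wgt in rows:
        vectors[seller_id][cate_id] = wgt
    clusters = defaultdict(list)
    for seller_id in vectors:
        clusters[cluster_of[seller_id]].append(seller_id)
    result = {}
    for members in clusters.values():
        for a, b in combinations(sorted(members), 2):
            result[(a, b)] = cosine(vectors[a], vectors[b])
    return result


# Hypothetical rows and cluster keys (main category, monthly-transaction band).
rows = [("s1", "c1", 1.0), ("s2", "c1", 1.0), ("s3", "c2", 1.0)]
cluster_of = {"s1": ("apparel", "mid"), "s2": ("apparel", "mid"), "s3": ("food", "low")}
pairs = similar_within_clusters(rows, cluster_of)
```

Shop s3 is alone in its cluster, so no pair involving it is ever evaluated; that skipped work is
where the saving comes from.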
The embodiments of the present application may be implemented in Structured Query Language (SQL)
and deployed on the Open Data Processing Service (ODPS) platform.
Based on an understanding of how MapReduce processes a Join, and taking the business scenario
into account, the embodiments of the present application further cluster the objects under
study, so that each individual is compared for similarity only with the other individuals in the
same cluster. This effectively reduces the number of records with the same join key (Key) that
are assigned to the same Reduce. Assuming that the maximum computational complexity the available
computing capacity can accept is K^2, it is only necessary to ensure that the number of
individuals in each cluster does not exceed K. With this kind of screening, the number of objects
for which pairwise similarity is calculated can be reduced as far as the computing capacity
requires, so that the objects to be calculated are screened reasonably according to the computing
capacity, which solves the problem in the related art caused by the large scale of the similarity
calculation.
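The bound described above can be checked with a few lines of arithmetic. This sketch assumes, as
the text does, that comparing all individuals in a cluster of size s costs on the order of s^2;
the function names and the sample cluster layout are hypothetical.

```python
def pairwise_cost(cluster_sizes):
    """Total cost when each cluster of size s is compared internally at cost s * s."""
    return sum(size * size for size in cluster_sizes)


def clusters_over_budget(clusters, k):
    """Keys of clusters that hold more than k individuals and need further splitting."""
    return [key for key, members in clusters.items() if len(members) > k]


# One million individuals in a single group versus 1,000 clusters of 1,000 each.
cost_unclustered = pairwise_cost([1_000_000])
cost_clustered = pairwise_cost([1_000] * 1_000)

# Hypothetical clusters checked against a capacity of K = 2 individuals per cluster.
too_large = clusters_over_budget({"a": [1, 2, 3], "b": [4]}, 2)
```

A cluster reported by clusters_over_budget could be split by adding another clustering factor,
in line with the adjustment of t and m described for Fig. 6.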
In the above embodiments, the complexity of the similarity calculation is reduced by screening
the objects. The screening method can also be used on its own, wherever screening is needed.
Fig. 7 is a flow chart of the object screening method according to an embodiment of the present
application. As shown in Fig. 7, the flow may include the following steps:
Step S701: obtain the values of one or more attributes of each of the n objects.
For example, there are n objects in total, each of which has values for one or more attributes.
Obtaining the values of one or more attributes of each of the n objects may mean obtaining the
value of a single attribute of each object, or obtaining the values of several attributes of
each object. The number of attribute values obtained may be determined by the conditions under
which the objects are to be screened.
Step S702: filter out i similar objects from the n objects according to the values of one or
more attributes of each object.
After the values of one or more attributes of each of the n objects have been obtained, i
similar objects are filtered out of the n objects according to those values.
Filtering out i similar objects from the n objects according to the values of one or more
attributes of each object may include: selecting from the n objects, as the i objects, the
objects whose values of one or more attributes fall within a preset range, where the preset
range is determined according to the value of i.
For example, to select 200 objects from 2,000 objects by the three attributes height, weight and
age, the objects taller than 1.2 meters may be picked out first by height, leaving 1,000 objects;
the objects heavier than 30 kilograms are then picked out by weight, leaving 600; and the objects
aged 8 or 12 are then picked out by age, leaving 200 objects, for which similarity is calculated.
When selecting by age, the objects aged 8 to 12 may be selected instead. The screening attributes
and their values can be changed according to the specific purpose so as to filter out an
appropriate number of objects.
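The staged screening in this example can be sketched as a chain of filters, one per attribute.
The thresholds (taller than 1.2 meters, heavier than 30 kilograms, aged 8 to 12) follow the text;
the function name filter_in_stages and the four sample objects are assumptions.

```python
def filter_in_stages(objects, stages):
    """Apply one attribute condition per stage and record how many objects remain.

    stages is a list of (attribute_name, predicate) pairs applied in order.
    Objects that lack the attribute are dropped at that stage.
    """
    remaining = list(objects)
    counts = []
    for attr, keep in stages:
        remaining = [o for o in remaining if attr in o and keep(o[attr])]
        counts.append(len(remaining))
    return remaining, counts


stages = [
    ("height", lambda h: h > 1.2),    # meters
    ("weight", lambda w: w > 30),     # kilograms
    ("age", lambda a: 8 <= a <= 12),  # years
]

# Hypothetical sample objects.
people = [
    {"height": 1.30, "weight": 32, "age": 9},
    {"height": 1.10, "weight": 35, "age": 10},
    {"height": 1.40, "weight": 28, "age": 11},
    {"height": 1.35, "weight": 40, "age": 14},
]

selected, counts = filter_in_stages(people, stages)
```

The per-stage counts make it easy to see which condition to loosen or tighten when the final
number of objects is not the desired i.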
Filtering out i similar objects from the n objects according to the values of one or more
attributes of each object may include: calculating the clustering factor of each object from
the values of one or more attributes that the objects have in common; and filtering out i
objects from the n objects according to the clustering factor.
Filtering out i similar objects from the n objects according to the values of one or more
attributes of each object may also include: calculating the clustering factor of each object
from the values of one or more attributes that the objects have in common; and filtering out i
objects from the n objects according to the clustering factor and one or more attributes.
The object screening method in this embodiment is consistent with the object screening in the
embodiments of the similarity processing method above, and is not described again here.
In the embodiment of the object screening method, the screening result need not be used to
calculate similarity. It may simply be stored or used as preprocessing, and a similarity
calculation or sample analysis may be carried out later when needed. The object screening method
can be applied in a variety of scenarios, for example in sample survey analysis.
This embodiment further provides a similarity processing method in which the similarity
calculation need not obtain design conditions. For example, the design conditions may be
sufficient for calculating similarity among all objects, while the actual calculation does not
need the similarity between all objects. In this case the similarity calculation does not obtain
design conditions; instead, the objects are screened according to their attribute values and
similarity is then calculated. Fig. 8 is a flow chart of the similarity processing method
according to the fourth embodiment of the present application. As shown in Fig. 8, the flow may
include the following steps:
Step S801: filter out i objects from the n objects according to the values of one or more
attributes of each of the n objects.
Filtering out i objects from the n objects according to the values of one or more attributes of
each of the n objects may mean filtering out i similar objects, filtering out i objects of low
similarity, or filtering out i objects arbitrarily. Screening according to attribute values may,
for example, select the objects whose values of one or more attributes fall within a certain
interval.
Step S802: calculate similarity pairwise among the i objects.
After the i objects have been filtered out, similarity is calculated pairwise among them.
In the similarity processing method of this embodiment, the object screening process is
consistent with the object screening process in the embodiments of the similarity processing
method above, and is not described again here.
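Steps S801 and S802 can be sketched as a filter followed by a pairwise loop. The screening
condition (a sales interval) and the similarity function used below are placeholders chosen only
to make the example runnable; the text does not prescribe either, and the names are hypothetical.

```python
from itertools import combinations


def screen_then_compare(objects, keep, similarity):
    """Step S801: keep the objects that satisfy the condition.
    Step S802: calculate similarity for every pair of kept objects."""
    selected = [o for o in objects if keep(o)]
    return {
        (a["id"], b["id"]): similarity(a, b)
        for a, b in combinations(selected, 2)
    }


# Hypothetical shops with a single attribute.
shops = [
    {"id": 1, "sales": 10},
    {"id": 2, "sales": 12},
    {"id": 3, "sales": 100},
]

scores = screen_then_compare(
    shops,
    keep=lambda o: 5 <= o["sales"] <= 20,  # attribute value within an interval
    similarity=lambda a, b: 1 / (1 + abs(a["sales"] - b["sales"])),  # placeholder
)
```

No design conditions are consulted here, which is the difference from the earlier flow: the
number of kept objects follows from the interval alone.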
It should be noted that, for brevity, each of the foregoing method embodiments is described as a
series of combined actions. Those skilled in the art should understand, however, that the present
application is not limited by the described order of actions, because according to the present
application some steps may be performed in another order or simultaneously. Those skilled in the
art should also understand that the embodiments described in the specification are preferred
embodiments, and that the actions, modules and units involved are not necessarily required by the
present application.
From the above description of the embodiments, those skilled in the art can clearly understand
that the similarity processing method according to the above embodiments can be implemented by
software plus the necessary general-purpose hardware platform, or of course by hardware, although
in many cases the former is the better implementation. Based on this understanding, the technical
solution of the present application, in essence or in the part that contributes to the prior art,
can be embodied in the form of a software product. The computer software product is stored in a
storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes instructions
that cause a terminal device (which may be a mobile phone, a computer, a server, a network device
or the like) to perform the methods described in the embodiments of the present application.
Embodiment 2
According to an embodiment of the present application, a similarity processing device for
implementing the above similarity processing method is further provided. Fig. 9 is a schematic
diagram of the similarity processing device according to the first embodiment of the present
application. As shown in Fig. 9, the device includes a first acquisition module 10, a first
screening module 20 and a first computing module 30.
The first acquisition module 10 is configured to obtain design conditions, where, when the design
conditions are satisfied, the maximum number of objects for which pairwise similarity can be
calculated is k.
The first screening module 20 is configured to filter out i objects from n objects according to
the design conditions, where i is less than or equal to n and i is less than or equal to k.
The first computing module 30 is configured to calculate similarity pairwise among the i objects.
In the similarity processing device of this embodiment, the first acquisition module 10 may be
used to perform step S202 in the embodiments of the present application, the first screening
module 20 may be used to perform step S204, and the first computing module 30 may be used to
perform step S206.
In this embodiment, the first acquisition module 10 obtains the design conditions, under which
the maximum number of objects for which pairwise similarity can be calculated is k; the first
screening module 20 filters out i objects from the n objects according to the design conditions,
where i is less than or equal to n and i is less than or equal to k; and the first computing
module 30 calculates similarity pairwise among the i objects. This achieves the technical effect
of reducing the complexity of the similarity calculation and thereby solves the technical
problem caused by the large scale of the similarity calculation.
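The three modules can be sketched as three small classes. This is only an illustration of how the
responsibilities divide: the class and method names are hypothetical, deriving k as the integer
square root of a pairwise-operation budget is an assumption based on the K^2 complexity bound
mentioned earlier, and taking the first i objects of a sorted arrangement is just one of the
screening options the text allows.

```python
import math
from itertools import combinations


class FirstAcquisitionModule:
    """Obtains the design conditions and derives k from them."""

    def __init__(self, max_pairwise_ops):
        self.max_pairwise_ops = max_pairwise_ops  # assumed form of the condition

    def max_objects(self):
        # Assumption: comparing k objects costs about k * k operations.
        return math.isqrt(self.max_pairwise_ops)


class FirstScreeningModule:
    """Filters out i objects from n objects, with i <= n and i <= k."""

    def screen(self, objects, k, key):
        i = min(k, len(objects))
        return sorted(objects, key=key)[:i]


class FirstComputingModule:
    """Calculates similarity pairwise among the i screened objects."""

    def compute(self, objects, similarity):
        return [(a, b, similarity(a, b)) for a, b in combinations(objects, 2)]


# Hypothetical run: a budget of 10 operations allows k = 3 objects.
k = FirstAcquisitionModule(max_pairwise_ops=10).max_objects()
kept = FirstScreeningModule().screen([5, 1, 4, 2, 3], k, key=lambda x: x)
results = FirstComputingModule().compute(kept, lambda a, b: 1 / (1 + abs(a - b)))
```

Keeping acquisition, screening and computing separate lets the screening rule be swapped (range
filter, clustering factor, contiguous window) without touching the other two modules.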
In an optional embodiment, the design conditions include at least one of the following: the
resources available for calculating similarity, the time available for calculating similarity,
and the scale of the similarity calculation.
In an optional embodiment, the first screening module 20 is configured to filter out i objects
from the n objects according to the values of one or more attributes of each of the n objects.
Each of the n objects has one or more attributes, and the first screening module 20 screens
according to the values of the one or more attributes of each object.
The embodiments of the above similarity processing method apply equally to the similarity
processing device. For example, if the n objects all have a first attribute, the first screening
module 20 filters out i objects from the n objects according to the first attribute; the
screening may use a single interval, several segments, or individual values, as actually
required. As another example, if the n objects all have a first attribute and a second attribute,
i objects are filtered out of the n objects according to both: the first screening module 20
first filters out from the n objects a objects that satisfy the screening condition on the first
attribute, and then filters out from the a objects i objects that satisfy the screening condition
on the second attribute, where i ≤ a. Likewise, if i objects are to be filtered out of the n
objects according to more attributes, the screening may be performed attribute by attribute in
turn. Conditions on several attributes may also be imposed at once to filter out i objects from
the n objects.
In an optional embodiment, the first screening module 20 is configured to select from the n
objects, as the i objects, the objects whose values of one or more attributes fall within a
preset range, where the preset range is determined according to the value of i.
In an optional embodiment, Figure 10 is a schematic diagram of the similarity processing device
according to the second embodiment of the present application. As shown in Figure 10, the
similarity processing device includes a first acquisition module 10, a first screening module 20
and a first computing module 30. The first screening module 20 includes a first computing unit
201 and a first screening unit 202.
The first acquisition module 10, the first screening module 20 and the first computing module 30
in this embodiment function in the same way as in the similarity processing device of the first
embodiment of the present application, and are not described again here.
The first computing unit 201 is configured to calculate the clustering factor of each object
from the values of one or more attributes that the objects have in common.
The first screening unit 202 is configured to filter out i objects from the n objects according
to the clustering factor.
In an optional embodiment, Figure 11 is a schematic diagram of the similarity processing device
according to the third embodiment of the present application. As shown in Figure 11, the
similarity processing device includes a first acquisition module 10, a first screening module 20
and a first computing module 30. The first screening module 20 includes a first computing unit
201 and a first screening unit 202, and the first screening unit 202 includes an arrangement unit
2021 and a selecting unit 2022.
The arrangement unit 2021 is configured to arrange the objects according to the size of the
clustering factor of each object.
The selecting unit 2022 is configured to select i contiguous objects from the n arranged objects.
In an optional embodiment, Figure 12 is a schematic diagram of the similarity processing device
according to the fourth embodiment of the present application. As shown in Figure 12, the
similarity processing device includes a first acquisition module 10, a first screening module 20
and a first computing module 30. The first screening module 20 includes a second computing unit
203 and a second screening unit 204.
The second computing unit 203 is configured to calculate the clustering factor of each object
from the values of one or more attributes that the objects have in common.
The second screening unit 204 is configured to filter out i objects from the n objects according
to the clustering factor and one or more attributes.
The second computing unit 203 in this embodiment may be the same as or different from the first
computing unit 201 in the above embodiment, and the second screening unit 204 may be the same as
or different from the first screening unit 202 in the above embodiment.
According to an embodiment of the present application, an object screening device for
implementing the above object screening method is further provided. Figure 13 is a schematic
diagram of the object screening device according to an embodiment of the present application. As
shown in Figure 13, the device includes a second acquisition module 40 and a second screening
module 50.
The second acquisition module 40 is configured to obtain the values of one or more attributes of
each of the n objects.
The second screening module 50 is configured to filter out i similar objects from the n objects
according to the values of one or more attributes of each object.
In the object screening device of this embodiment, the second acquisition module 40 may be used
to perform step S701 in the embodiments of the present application, and the second screening
module 50 may be used to perform step S702.
According to an embodiment of the present application, a similarity processing device for
implementing the above similarity processing method is further provided. Figure 14 is a schematic
diagram of the similarity processing device according to the fifth embodiment of the present
application. As shown in Figure 14, the device includes a third screening module 60 and a second
computing module 70.
The third screening module 60 is configured to filter out i objects from the n objects according
to the values of one or more attributes of each of the n objects. The third screening module 60
may function in the same way as the first screening module 20 in the above embodiments.
The second computing module 70 is configured to calculate similarity pairwise among the i
objects.
In the similarity processing device of this embodiment, the third screening module 60 may be
used to perform step S801 in the embodiments of the present application, and the second
computing module 70 may be used to perform step S802.
Embodiment 3
An embodiment of the present application further provides a computer device, which may be any
computer device in a group of computer devices. Optionally, in this embodiment, the computer
device may be a device that constitutes a server, a device that constitutes a server cluster, or
a device that constitutes a cloud computing system. In other words, a cloud server may also be
regarded as a computer device; the user simply does not need to be concerned with the specific
hardware that makes up the devices in such cloud computing servers. As the computing capability
of terminals develops, terminals may also join the cloud computing system, in which case the
computer device may also be any mobile terminal in a group of mobile terminals.
Optionally, in this embodiment, the computer device may be located in at least one of a
plurality of network devices of a computer network.
Optionally, Figure 15 is a structural block diagram of a computer device according to an
embodiment of the present application. As shown in Figure 15, the computer device A may include
one or more processors 101 (only one is shown in the figure), a memory 103 and a transmission
device 105.
The memory 103 may be used to store software programs and modules, such as the program
instructions/modules corresponding to the application event interaction method and device in the
embodiments of the present application. By running the software programs and modules stored in
the memory 103, the processor 101 performs various functional applications and data processing,
that is, implements the above application event interaction method. The memory 103 may include
high-speed random access memory and may also include non-volatile memory, such as one or more
magnetic storage devices, flash memory or other non-volatile solid-state memory. In some
examples, the memory 103 may further include memory located remotely from the processor 101, and
such remote memory may be connected to the computer device A through a network. Examples of such
networks include, but are not limited to, the Internet, intranets, local area networks, mobile
communication networks and combinations thereof.
The transmission device 105 is used to receive or send data via a network. Specific examples of
the network may include wired networks and wireless networks. In one example, the transmission
device 105 includes a network interface controller (NIC), which can be connected by a network
cable to other network devices and to a router so as to communicate with the Internet or a local
area network. In another example, the transmission device 105 is a radio frequency module, which
is used to communicate with the Internet wirelessly.
In this embodiment, the computer device may execute program code for the following steps of the
similarity processing method of an application program: obtaining design conditions, where, when
the design conditions are satisfied, the maximum number of objects for which pairwise similarity
can be calculated is k; filtering out i objects from n objects according to the design
conditions, where i is less than or equal to n and i is less than or equal to k; and calculating
similarity pairwise among the i objects.
Optionally, the computer device may also execute program code for the following: the design
conditions include at least one of the resources available for calculating similarity, the time
available for calculating similarity, and the scale of the similarity calculation.
Optionally, the computer device may also execute program code for the following steps: filtering
out i objects from the n objects includes filtering out i objects from the n objects according
to the values of one or more attributes of each of the n objects.
Optionally, the computer device may also execute program code for the following steps: filtering
out i objects from the n objects according to the values of one or more attributes of each of
the n objects includes selecting from the n objects, as the i objects, the objects whose values
of one or more attributes fall within a preset range, where the preset range is determined
according to the value of i.
Optionally, the computer device may also execute program code for the following steps: filtering
out i objects from the n objects according to the values of one or more attributes of each of
the n objects includes calculating the clustering factor of each object from the values of one
or more attributes that the objects have in common, and filtering out i objects from the n
objects according to the clustering factor.
Optionally, the computer device may also execute program code for the following steps: filtering
out i objects from the n objects according to the clustering factor includes arranging the
objects according to the size of the clustering factor of each object, and selecting i
contiguous objects from the n arranged objects.
Optionally, the computer device may also execute program code for the following steps: filtering
out i objects from the n objects according to the values of one or more attributes of each of
the n objects includes calculating the clustering factor of each object from the values of one
or more attributes that the objects have in common, and filtering out i objects from the n
objects according to the clustering factor and one or more attributes.
Above computer equipment can perform the program code of following steps in the similarity processing method of application program:
Obtain the value that each object in n object distinguishes corresponding one or more attributes;According to the one of each object
Or the value of multiple attributes filters out i similar objects from n object.
Optionally, above computer equipment can also carry out the program code of following steps:According to the one of each object
Or the value of multiple attributes filters out i similar objects from n object and included:The value of one or more attributes is fallen
The object entered to preset range is screened from n object as i object, wherein, preset range is according to i
Value determine.
Optionally, above computer equipment can also carry out the program code of following steps:According to the one of each object
Or the value of multiple attributes filters out i similar objects from n object and included:Corresponding one according to each object
Or the value of multiple same alike results calculates and obtains the corresponding clustering factor of each object difference;It is right from n according to clustering factor
I object is filtered out as in.
Optionally, the above computer device may also execute program code for the following steps: screening i similar objects out of the n objects according to the values of the one or more attributes of each object includes: calculating the clustering factor corresponding to each object from the values of the one or more same attributes of each object; and screening i objects out of the n objects according to the clustering factor and the one or more attributes.
The above computer device may execute program code for the following steps of the similarity processing method of an application program: screening i objects out of n objects according to the values of the one or more attributes corresponding to each of the n objects; and calculating the pairwise similarity of the i objects.
Those of ordinary skill in the art will understand that the structure shown in Figure 15 is only schematic. Computer device A may also be a terminal device such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Device, MID), or a PAD. Figure 15 does not limit the structure of the above electronic device. For example, computer device A may include more or fewer components than shown in Figure 15 (such as a network interface or a display device), or may have a configuration different from that shown in Figure 15.
Those of ordinary skill in the art will understand that all or some of the steps in the methods of the above embodiments can be completed by a program that instructs hardware related to a terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, and the like.
Embodiment 4
An embodiment of the present application further provides a storage medium. Optionally, in this embodiment, the storage medium may be used to store the program code executed by the similarity processing method provided in Embodiment 1.
Optionally, in this embodiment, the storage medium may be located in any computer device of a computer device cluster in a computer network. The computer device may be a device that constitutes a server, a device that constitutes a server cluster, or a device that constitutes a cloud computing platform. In other words, a cloud server can also be regarded as a group of computer devices, except that the user does not need to be concerned with the specific hardware composition of the devices in the cloud computing servers. Of course, as the computing capability of terminals develops, terminals may also join cloud computing; in that case, the storage medium may also be located in any mobile terminal of a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: obtaining design conditions, wherein, when the design conditions are satisfied, the maximum number of objects for which pairwise similarity can be calculated is k; screening i objects out of n objects according to the design conditions, wherein i is less than or equal to n and i is less than or equal to k; and calculating the pairwise similarity of the i objects.
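The three steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: cosine similarity stands in for the similarity measure, which the text does not name, and the `screen` callable is a hypothetical placeholder for whichever screening rule is used.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, used here only as an example similarity measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def process(objects: Dict[str, Sequence[float]], k: int,
            screen: Callable[[Dict[str, Sequence[float]], int], List[str]]
            ) -> Dict[Tuple[str, str], float]:
    """Screen i <= min(n, k) objects, then compute similarity for every pair."""
    i = min(len(objects), k)      # i <= n and i <= k
    chosen = screen(objects, i)   # hypothetical screening rule
    return {(a, b): cosine(objects[a], objects[b])
            for a, b in combinations(chosen, 2)}


objects = {"a": [1, 0], "b": [2, 0], "c": [0, 1], "d": [1, 1]}
# Placeholder screening rule: keep the first i names in sorted order.
similarities = process(objects, k=3, screen=lambda objs, i: sorted(objs)[:i])
```

With n = 4 and k = 3, only 3 of the 6 possible pairs are evaluated, which is the saving the screening step is meant to deliver.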
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following step: the design conditions include at least one of: resources for calculating similarity, time for calculating similarity, and scale of the similarity calculation.
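As an illustration of how such a condition could bound k, the sketch below derives k from a time budget. It assumes that every pair costs the same fixed time and that the budget covers only the pairwise calculations; `budget_ms` and `ms_per_pair` are hypothetical parameters, not values from the application. Comparing k objects pairwise requires k(k-1)/2 calculations, so k is the largest integer for which that count fits the budget.

```python
import math


def pairs_needed(k: int) -> int:
    """Number of pairwise similarity calculations for k objects."""
    return k * (k - 1) // 2


def max_objects(budget_ms: int, ms_per_pair: int) -> int:
    """Largest k such that k * (k - 1) / 2 pairs fit in the time budget."""
    max_pairs = budget_ms // ms_per_pair
    # Solve k * (k - 1) / 2 <= max_pairs using exact integer arithmetic.
    return (1 + math.isqrt(1 + 8 * max_pairs)) // 2


k = max_objects(budget_ms=60_000, ms_per_pair=2)
```

A resource or scale condition could be handled the same way by converting it into a maximum number of pairs and taking the smallest resulting k.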
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following step: screening i objects out of the n objects according to the values of the one or more attributes corresponding to each of the n objects.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: screening i objects out of the n objects according to the values of the one or more attributes corresponding to each of the n objects includes: screening out of the n objects, as the i objects, those objects whose values of the one or more attributes fall within a preset range, wherein the preset range is determined according to the value of i.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: screening i objects out of the n objects according to the values of the one or more attributes corresponding to each of the n objects includes: calculating the clustering factor corresponding to each object from the values of the one or more same attributes of each object; and screening i objects out of the n objects according to the clustering factor.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: screening i objects out of the n objects according to the clustering factor includes: sorting the objects by the size of the clustering factor corresponding to each object; and selecting i consecutive objects from the n sorted objects.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: screening i objects out of the n objects according to the values of the one or more attributes corresponding to each of the n objects includes: calculating the clustering factor corresponding to each object from the values of the one or more same attributes of each object; and screening i objects out of the n objects according to the clustering factor and the one or more attributes.
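Screening on both the clustering factor and the attributes could look like the sketch below. The text does not say how the two are combined, so the rule used here is an assumption: first keep objects that share a given attribute value, then keep the i objects whose clustering factor is closest to a reference value. The `factor` field is assumed to be precomputed, and all names are hypothetical.

```python
from typing import Dict, List


def screen_by_factor_and_attribute(objects: List[Dict], attribute: str, wanted,
                                   target_factor: float, i: int) -> List[Dict]:
    """Filter on an attribute value, then rank by closeness of clustering factor."""
    # Step 1: attribute-based screening (assumed here to be exact equality).
    candidates = [o for o in objects if o[attribute] == wanted]
    # Step 2: clustering-factor screening, nearest to the reference value first.
    candidates.sort(key=lambda o: abs(o["factor"] - target_factor))
    return candidates[:i]


items = [
    {"id": 1, "category": "shoes", "factor": 0.20},
    {"id": 2, "category": "shoes", "factor": 0.90},
    {"id": 3, "category": "hats", "factor": 0.21},
    {"id": 4, "category": "shoes", "factor": 0.25},
    {"id": 5, "category": "shoes", "factor": 0.40},
]
picked = screen_by_factor_and_attribute(items, "category", "shoes",
                                        target_factor=0.22, i=2)
```

Object 3 has a close clustering factor but is excluded by the attribute filter, which shows why the combination can be stricter than the clustering factor alone.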
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: obtaining the values of the one or more attributes corresponding to each of n objects; and screening i similar objects out of the n objects according to the values of the one or more attributes of each object.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: screening i similar objects out of the n objects according to the values of the one or more attributes of each object includes: screening out of the n objects, as the i objects, those objects whose values of the one or more attributes fall within a preset range, wherein the preset range is determined according to the value of i.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: screening i similar objects out of the n objects according to the values of the one or more attributes of each object includes: calculating the clustering factor corresponding to each object from the values of the one or more same attributes of each object; and screening i objects out of the n objects according to the clustering factor.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: screening i similar objects out of the n objects according to the values of the one or more attributes of each object includes: calculating the clustering factor corresponding to each object from the values of the one or more same attributes of each object; and screening i objects out of the n objects according to the clustering factor and the one or more attributes.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: screening i objects out of n objects according to the values of the one or more attributes corresponding to each of the n objects; and calculating the pairwise similarity of the i objects.
The serial numbers of the above embodiments of the present application are for description only and do not represent the relative merits of the embodiments.
In the above embodiments of the present application, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units or modules is only a division by logical function, and other divisions are possible in an actual implementation: multiple units, modules, or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, modules, or units, and may be electrical or take other forms.
The units or modules described as separate components may or may not be physically separate, and the components shown as units or modules may or may not be physical units or modules; that is, they may be located in one place or distributed over multiple network units or modules. Some or all of the units or modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units or modules in the embodiments of the present application may be integrated into one processing unit or module, each unit or module may exist physically on its own, or two or more units or modules may be integrated into one unit or module. The integrated unit or module may be implemented in the form of hardware or in the form of a software functional unit or module.
If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.
The above is only the preferred embodiment of the present application. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements shall also be regarded as falling within the protection scope of the present application.
Claims (21)
1. A similarity processing method, characterized by comprising:
obtaining design conditions, wherein, when the design conditions are satisfied, the maximum number of objects for which pairwise similarity can be calculated is k;
screening i objects out of n objects according to the design conditions, wherein i is less than or equal to n and i is less than or equal to k; and
calculating the pairwise similarity of the i objects.
2. The method according to claim 1, characterized in that the design conditions include at least one of: resources for calculating similarity, time for calculating similarity, and scale of the similarity calculation.
3. The method according to claim 1 or 2, characterized in that screening the i objects out of the n objects comprises:
screening i objects out of the n objects according to the values of the one or more attributes corresponding to each of the n objects.
4. The method according to claim 3, characterized in that screening i objects out of the n objects according to the values of the one or more attributes corresponding to each of the n objects comprises:
screening out of the n objects, as the i objects, those objects whose values of the one or more attributes fall within a preset range, wherein the preset range is determined according to the value of i.
5. The method according to claim 3, characterized in that screening i objects out of the n objects according to the values of the one or more attributes corresponding to each of the n objects comprises:
calculating the clustering factor corresponding to each object from the values of the one or more same attributes of each object; and
screening the i objects out of the n objects according to the clustering factor.
6. The method according to claim 5, characterized in that screening the i objects out of the n objects according to the clustering factor comprises:
sorting the objects by the size of the clustering factor corresponding to each object; and
selecting the i consecutive objects from the n sorted objects.
7. The method according to claim 3, characterized in that screening i objects out of the n objects according to the values of the one or more attributes corresponding to each of the n objects comprises:
calculating the clustering factor corresponding to each object from the values of the one or more same attributes of each object; and
screening the i objects out of the n objects according to the clustering factor and the one or more attributes.
8. A similarity processing device, characterized by comprising:
a first acquisition module, configured to obtain design conditions, wherein, when the design conditions are satisfied, the maximum number of objects for which pairwise similarity can be calculated is k;
a first screening module, configured to screen i objects out of n objects according to the design conditions, wherein i is less than or equal to n and i is less than or equal to k; and
a first computing module, configured to calculate the pairwise similarity of the i objects.
9. The device according to claim 8, characterized in that the design conditions include at least one of: resources for calculating similarity, time for calculating similarity, and scale of the similarity calculation.
10. The device according to claim 8 or 9, characterized in that the first screening module is configured to screen i objects out of the n objects according to the values of the one or more attributes corresponding to each of the n objects.
11. The device according to claim 10, characterized in that the first screening module is configured to screen out of the n objects, as the i objects, those objects whose values of the one or more attributes fall within a preset range, wherein the preset range is determined according to the value of i.
12. The device according to claim 10, characterized in that the first screening module comprises:
a first computing unit, configured to calculate the clustering factor corresponding to each object from the values of the one or more same attributes of each object; and
a first screening unit, configured to screen the i objects out of the n objects according to the clustering factor.
13. The device according to claim 12, characterized in that the first screening unit comprises:
an arrangement unit, configured to sort the objects by the size of the clustering factor corresponding to each object; and
a selecting unit, configured to select the i consecutive objects from the n sorted objects.
14. The device according to claim 10, characterized in that the first screening module comprises:
a second computing unit, configured to calculate the clustering factor corresponding to each object from the values of the one or more same attributes of each object; and
a second screening unit, configured to screen the i objects out of the n objects according to the clustering factor and the one or more attributes.
15. An object screening method, characterized by comprising:
obtaining the values of the one or more attributes corresponding to each of n objects; and
screening i similar objects out of the n objects according to the values of the one or more attributes of each object.
16. The method according to claim 15, characterized in that screening the i similar objects out of the n objects according to the values of the one or more attributes of each object comprises:
screening out of the n objects, as the i objects, those objects whose values of the one or more attributes fall within a preset range, wherein the preset range is determined according to the value of i.
17. The method according to claim 15, characterized in that screening the i similar objects out of the n objects according to the values of the one or more attributes of each object comprises:
calculating the clustering factor corresponding to each object from the values of the one or more same attributes of each object; and
screening the i objects out of the n objects according to the clustering factor.
18. The method according to claim 15, characterized in that screening the i similar objects out of the n objects according to the values of the one or more attributes of each object comprises:
calculating the clustering factor corresponding to each object from the values of the one or more same attributes of each object; and
screening the i objects out of the n objects according to the clustering factor and the one or more attributes.
19. A similarity processing method, characterized by comprising:
screening i objects out of n objects according to the values of the one or more attributes corresponding to each of the n objects; and
calculating the pairwise similarity of the i objects.
20. An object screening device, characterized by comprising:
a second acquisition module, configured to obtain the values of the one or more attributes corresponding to each of n objects; and
a second screening module, configured to screen i similar objects out of the n objects according to the values of the one or more attributes of each object.
21. A similarity processing device, characterized by comprising:
a third screening module, configured to screen i objects out of n objects according to the values of the one or more attributes corresponding to each of the n objects; and
a second computing module, configured to calculate the pairwise similarity of the i objects.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610174122.1A CN107229640A (en) | 2016-03-24 | 2016-03-24 | Similarity processing method, object screening technique and device |
TW106106682A TW201800966A (en) | 2016-03-24 | 2017-03-01 | Similarity processing method and object screening method and device |
PCT/CN2017/076424 WO2017162063A1 (en) | 2016-03-24 | 2017-03-13 | Similarity processing method and object screening method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610174122.1A CN107229640A (en) | 2016-03-24 | 2016-03-24 | Similarity processing method, object screening technique and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107229640A (en) | 2017-10-03 |
Family
ID=59899348
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610174122.1A Pending CN107229640A (en) | 2016-03-24 | 2016-03-24 | Similarity processing method, object screening technique and device |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN107229640A (en) |
TW (1) | TW201800966A (en) |
WO (1) | WO2017162063A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109190007A (en) * | 2018-07-20 | 2019-01-11 | 阿里巴巴集团控股有限公司 | Data analysing method and device |
CN111414949A (en) * | 2020-03-13 | 2020-07-14 | 杭州海康威视***技术有限公司 | Picture clustering method and device, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102722567A (en) * | 2012-05-30 | 2012-10-10 | 杭州遥指科技有限公司 | Method and device for screening in-station information |
CN103559262A (en) * | 2013-11-04 | 2014-02-05 | 北京邮电大学 | Community-based author and academic paper recommending system and recommending method |
CN103617192A (en) * | 2013-11-07 | 2014-03-05 | 北京奇虎科技有限公司 | Method and device for clustering data objects |
CN105074664A (en) * | 2013-02-11 | 2015-11-18 | 亚马逊科技公司 | Cost-minimizing task scheduler |
CN105184307A (en) * | 2015-07-27 | 2015-12-23 | 蚌埠医学院 | Medical field image semantic similarity matrix generation method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103488789B (en) * | 2013-10-08 | 2017-08-18 | 百度在线网络技术(北京)有限公司 | Recommendation method, device and search engine |
CN104699725B (en) * | 2013-12-10 | 2018-10-09 | 阿里巴巴集团控股有限公司 | data search processing method and system |
CN104978553B (en) * | 2014-04-08 | 2019-05-28 | 腾讯科技(深圳)有限公司 | The method and device of image analysis |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102722567A (en) * | 2012-05-30 | 2012-10-10 | 杭州遥指科技有限公司 | Method and device for screening in-station information |
CN105074664A (en) * | 2013-02-11 | 2015-11-18 | 亚马逊科技公司 | Cost-minimizing task scheduler |
CN103559262A (en) * | 2013-11-04 | 2014-02-05 | 北京邮电大学 | Community-based author and academic paper recommending system and recommending method |
CN103617192A (en) * | 2013-11-07 | 2014-03-05 | 北京奇虎科技有限公司 | Method and device for clustering data objects |
CN105184307A (en) * | 2015-07-27 | 2015-12-23 | 蚌埠医学院 | Medical field image semantic similarity matrix generation method |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109190007A (en) * | 2018-07-20 | 2019-01-11 | 阿里巴巴集团控股有限公司 | Data analysing method and device |
CN109190007B (en) * | 2018-07-20 | 2022-10-04 | 创新先进技术有限公司 | Data analysis method and device |
CN111414949A (en) * | 2020-03-13 | 2020-07-14 | 杭州海康威视***技术有限公司 | Picture clustering method and device, electronic equipment and storage medium |
CN111414949B (en) * | 2020-03-13 | 2023-06-27 | 杭州海康威视***技术有限公司 | Picture clustering method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2017162063A1 (en) | 2017-09-28 |
TW201800966A (en) | 2018-01-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20171003 |