CN107729935A - Method and device for identifying similar pictures, server, and storage medium
- Publication number
- CN107729935A (application number CN201710945888.XA)
- Authority
- CN
- China
- Prior art keywords
- picture
- hash
- code block
- hash code
- similar pictures
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/51—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/758—Involving statistics of pixels or of feature values, e.g. histogram matching
Abstract
Embodiments of the invention disclose a method and device for identifying similar pictures, a server, and a storage medium. The method includes: computing a low-dimensional feature vector hash code for each picture using an average hash algorithm; uniformly dividing the low-dimensional feature vector hash code of each picture into at least two hash code blocks according to a preset rule, where each hash code block corresponds to a position index; among the at least two hash code blocks of each picture, placing pictures that have an identical hash code block at the same position index into the same cluster, yielding multiple picture clusters; and, within each picture cluster, identifying similar pictures according to the distance between the pictures' hash code blocks other than the identical hash code block at that position index. The embodiments reduce the computational complexity and the amount of computation of similar-picture identification, enabling efficient computation of picture similarity.
Description
Technical field
Embodiments of the invention relate to image processing technology, and in particular to a method and device for identifying similar pictures, a server, and a storage medium.
Background
As users interact more and more with websites of different types and upload their own pictures, the number of pictures on a website grows rapidly, and a single website ends up holding a large number of duplicate and highly similar pictures.
Quickly identifying the similar pictures within a massive picture collection not only removes redundant picture data and reduces storage overhead, but also makes it possible to offer users homogeneous or diversified picture services on demand, such as picture search. In existing similar-picture identification techniques, however, computing picture similarity is highly complex, so identifying similar pictures in a massive collection requires a very large amount of computation and has little practical applicability.
Summary of the invention
Embodiments of the invention provide a method and device for identifying similar pictures, a server, and a storage medium, to solve the problem that prior-art methods for identifying similar pictures are highly complex and computationally expensive.
In a first aspect, embodiments of the invention provide a method for identifying similar pictures, the method including:
computing a low-dimensional feature vector hash code for each picture using an average hash algorithm;
uniformly dividing the low-dimensional feature vector hash code of each picture into at least two hash code blocks according to a preset rule, where each hash code block corresponds to a position index;
among the at least two hash code blocks of each picture, placing pictures that have an identical hash code block at the same position index into the same cluster, yielding multiple picture clusters;
within each picture cluster, identifying similar pictures according to the distance between the pictures' hash code blocks other than the identical hash code block at that position index.
Further, each picture includes at least two hash code blocks whose lengths are the same or different, and the hash code blocks may or may not overlap one another.
Further, the method also includes:
computing the similarity between pictures in each picture cluster according to the distance between the pictures' hash code blocks other than the identical hash code block at the same position index;
aggregating the similarities between pictures computed across all picture clusters, and identifying pictures whose similarity exceeds a preset threshold as similar pictures.
Further, the method also includes:
in response to a similar-picture search request for a target picture, computing a target low-dimensional feature vector hash code of the target picture using the average hash algorithm;
uniformly dividing the target low-dimensional feature vector hash code into at least two hash code blocks according to the preset rule, where each hash code block corresponds to a position index;
assigning the target picture to at least one target picture cluster among the multiple picture clusters, where the pictures in a target picture cluster have an identical hash code block to the target picture at the same position index;
within the at least one target picture cluster, searching for pictures similar to the target picture according to the distance between the pictures' hash code blocks other than the identical hash code block at that position index.
In a second aspect, embodiments of the invention further provide a device for identifying similar pictures, the device including:
a hash code computation module, for computing a low-dimensional feature vector hash code for each picture using an average hash algorithm;
a block division module, for uniformly dividing the low-dimensional feature vector hash code of each picture into at least two hash code blocks according to a preset rule, where each hash code block corresponds to a position index;
a clustering module, for placing, among the at least two hash code blocks of each picture, pictures that have an identical hash code block at the same position index into the same cluster, yielding multiple picture clusters;
a first identification module, for identifying, within each picture cluster, similar pictures according to the distance between the pictures' hash code blocks other than the identical hash code block at that position index.
Further, the block division module is specifically configured to uniformly divide the low-dimensional feature vector hash code of each picture, according to the preset rule, into at least two hash code blocks whose lengths are the same or different, where the hash code blocks may or may not overlap one another.
Further, the device also includes:
a computation module, for computing the similarity between pictures in each picture cluster according to the distance between the pictures' hash code blocks other than the identical hash code block at the same position index;
a second identification module, for aggregating the similarities between pictures computed across all picture clusters and identifying pictures whose similarity exceeds a preset threshold as similar pictures.
Further, the device also includes a search module, for searching for pictures similar to a target picture;
the search module specifically includes:
a target hash code computation unit, for computing, in response to a similar-picture search request for the target picture, a target low-dimensional feature vector hash code of the target picture using the average hash algorithm;
a block division unit, for uniformly dividing the target low-dimensional feature vector hash code into at least two hash code blocks according to the preset rule, where each hash code block corresponds to a position index;
a clustering unit, for assigning the target picture to at least one target picture cluster among the multiple picture clusters, where the pictures in a target picture cluster have an identical hash code block to the target picture at the same position index;
a search unit, for searching, within the at least one target picture cluster, for pictures similar to the target picture according to the distance between the pictures' hash code blocks other than the identical hash code block at that position index.
In a third aspect, embodiments of the invention further provide a server, including:
one or more processors;
a storage device, for storing one or more programs,
where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for identifying similar pictures described above.
In a fourth aspect, embodiments of the invention further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for identifying similar pictures described above.
By computing the low-dimensional feature vector hash code of each picture with the average hash algorithm, clustering the massive picture collection according to whether the hash code blocks at the same position index after division are identical, and then identifying similar pictures by the distance between the pictures' hash code blocks other than the identical block at that position index, the embodiments solve the prior-art problem that methods for identifying similar pictures are highly complex and computationally expensive. They reduce the computational complexity and the amount of computation of similar-picture identification and enable efficient computation of picture similarity.
Brief description of the drawings
Fig. 1 is a flow chart of a method for identifying similar pictures provided by Embodiment 1 of the invention;
Fig. 2 is a flow chart of a method for identifying similar pictures provided by Embodiment 2 of the invention;
Fig. 3 is a structural diagram of a device for identifying similar pictures provided by Embodiment 3 of the invention;
Fig. 4 is a structural diagram of a server provided by Embodiment 4 of the invention.
Detailed description
The invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire structure.
Embodiment 1
Fig. 1 is a flow chart of a method for identifying similar pictures provided by Embodiment 1 of the invention. This embodiment is applicable to identifying similar pictures. The method can be performed by a device for identifying similar pictures, which can be implemented in software and/or hardware and can, for example, be configured in a server. As shown in Fig. 1, the method specifically includes:
Step S110: compute the low-dimensional feature vector hash code of each picture using the average hash algorithm.
Similar pictures are to be identified within a massive picture collection. The average hash algorithm (Average Hash, aHash) obtains the feature vector hash code of each picture by comparing each pixel of the picture's grayscale version with the average of all its pixels. For example, the original picture can first be scaled to 8 × 8 and the scaled picture converted to grayscale, which yields a 64-bit low-dimensional feature vector hash code. Each bit of the low-dimensional feature vector obtained with aHash is 0 or 1. For similar pictures, when the generated hash codes are divided into multiple hash code blocks, at least one block is identical. Note that when the low-dimensional feature vector hash code is computed with aHash, the scaled picture need not be converted to grayscale; this can be configured according to actual conditions.
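The comparison against the mean described in step S110 can be sketched as follows. This is only an illustration: it assumes the picture has already been scaled to 8 × 8 and converted to grayscale, and the function name `average_hash` and the convention "bit is 1 when the pixel is at least the mean" are assumptions, not details given in the text.

```python
def average_hash(gray):
    """Compute an aHash-style bit string from a 2-D grid of grayscale values.

    `gray` is assumed to be the already-scaled picture (for example 8 x 8),
    given as a list of rows of pixel intensities.
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    # Assumed convention: bit is 1 where the pixel is at least the mean, else 0.
    return "".join("1" if p >= mean else "0" for p in pixels)


# Synthetic 8 x 8 "picture": left half dark, right half bright.
picture = [[10] * 4 + [200] * 4 for _ in range(8)]
code = average_hash(picture)
print(len(code))  # 64
print(code[:8])   # 00001111
```

A real pipeline would obtain `gray` from an image library after resizing; that part is omitted here to keep the sketch self-contained.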
Algorithms for generating picture hash codes include, but are not limited to, aHash; other algorithms can be used as needed, such as the perceptual hash algorithm (Perceptual Hash, pHash) or color histograms.
Step S120: uniformly divide the low-dimensional feature vector hash code of each picture into at least two hash code blocks according to a preset rule, where each hash code block corresponds to a position index.
In this step, the low-dimensional feature vector hash code of every picture is divided into at least two hash code blocks according to the same preset rule. The preset rule is a block division configured in advance by the user according to their own matching requirements. For example, if the resulting picture hash code has 64 bits, it can be divided into 4 blocks of 16 bits each. If it is preset that picture similarity needs to be computed only when at most 3 hash code blocks differ between two pictures, then the scheme is effective only when each picture's hash code is divided into at least 4 blocks. When more than 3 hash code blocks differ between two pictures, there is no need to compare their similarity further, and the two pictures are determined to be dissimilar.
In this step, optionally, each picture includes at least two hash code blocks whose lengths are the same or different, and the hash code blocks may or may not overlap one another. In the example above, the 64-bit hash code is divided into 4 hash code blocks, but the division need not be even: the 4 blocks can have the same or different numbers of bits. If the 4 blocks overlap, their total number of bits is greater than 64; if they do not overlap, their total is still 64. The division rule can be configured and preset as needed, after which every picture is divided uniformly according to that rule.
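The two kinds of division rule can be illustrated with a small sketch. The representation of a rule as a list of `(start, end)` spans and the name `split_blocks` are assumptions made for illustration; the text does not prescribe how a rule is encoded.

```python
def split_blocks(code, spans):
    """Cut `code` (a bit string) into blocks given (start, end) spans.

    The list index of each span acts as the block's position index.
    """
    return [code[start:end] for start, end in spans]


code = "01" * 32  # stand-in for a 64-bit hash code

# Uniform, non-overlapping rule: four 16-bit blocks.
uniform = [(0, 16), (16, 32), (32, 48), (48, 64)]
# Hypothetical non-uniform rule: lengths differ and neighbouring blocks share bits.
overlapping = [(0, 20), (16, 32), (28, 48), (40, 64)]

blocks_u = split_blocks(code, uniform)
blocks_o = split_blocks(code, overlapping)
print(sum(len(b) for b in blocks_u))  # 64
print(sum(len(b) for b in blocks_o))  # 80
```

The totals match the observation in the text: without overlap the block lengths sum to 64, with overlap they sum to more than 64.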
As an example of the division process, the 64-bit picture hash code obtained above is divided evenly into four hash code blocks A, B, C and D with no overlap between them. Since each hash code block corresponds to a position index, the picture has four position indexes. If hash code block A is used as the index block, the remaining blocks B, C and D are treated as a whole and serve as the value for key A; likewise, if block B is used as the index block, the remaining blocks A, C and D together serve as the value for key B; and so on, giving four combinations, that is, four key-value pairs for the picture. During similar-picture identification, suppose any one 16-bit hash code block is matched exactly and the sample picture library holds 2^34 hash fingerprints in total (described in the original as approximately 1,000,000,000, although 2^34 is about 1.7 × 10^10), each fingerprint corresponding to one picture. Each index hash code block then returns 2^(34-16) = 2^18 = 262144 candidate results, which greatly reduces the amount of computation needed for the distances between the hash code blocks of different pictures.
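The four key-value pairs and the candidate-count arithmetic can be sketched as below. The helper name `key_value_pairs` is hypothetical, and the candidate count assumes fingerprints are spread evenly over all 16-bit block values, which is the implicit assumption behind the 2^18 figure.

```python
def key_value_pairs(code, block_bits=16):
    """Return one (position index, index block, remaining bits) triple per block."""
    n = len(code) // block_bits
    blocks = [code[i * block_bits:(i + 1) * block_bits] for i in range(n)]
    pairs = []
    for pos, key in enumerate(blocks):
        # The value is every other block, concatenated in order.
        value = "".join(b for j, b in enumerate(blocks) if j != pos)
        pairs.append((pos, key, value))
    return pairs


code = "0" * 16 + "1" * 16 + "01" * 8 + "10" * 8  # blocks A, B, C, D
pairs = key_value_pairs(code)
print(len(pairs))        # 4
print(len(pairs[0][2]))  # 48

# Candidates per exact 16-bit match for 2**34 evenly spread fingerprints.
print(2 ** 34 // 2 ** 16)  # 262144
```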
As another example, the 64-bit picture hash code can be divided non-uniformly, so that the resulting index blocks overlap. For instance, the 64-bit hash code is first divided into four pieces, and any one 16-bit piece is chosen as hash code block F; the remaining 48 bits are then divided into four hash code sub-blocks e, f, g and h of 12 bits each. Block F combined with sub-block e forms a 28-bit index block, with the remaining hash code bits as its value; similarly, F with f, F with g, or F with h can each form an index block, giving four combinations. Since any of the four 16-bit pieces can serve as F, this non-uniform division yields 4 × 4 combinations in total, that is, 4 × 4 key-value pairs. During similar-picture identification, the 16 index blocks are searched in parallel, so no hash code block is missed. Moreover, compared with dividing the 64-bit hash code into four blocks with 16-bit index keys, matching against the 28-bit index blocks obtained by the non-uniform division returns fewer results, so fewer distances between picture hash code blocks need to be computed.
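One way to read the 4 × 4 scheme is sketched below: each of the four 16-bit pieces takes a turn as F, and the 48 bits left over are cut into four 12-bit sub-blocks. This enumeration and the name `index_blocks_28` are an interpretation for illustration, not code from the source.

```python
def index_blocks_28(code):
    """Enumerate (f_pos, sub_pos, index_block, value) entries for a 64-bit code."""
    pieces = [code[i:i + 16] for i in range(0, 64, 16)]
    out = []
    for f_pos, f_block in enumerate(pieces):
        # The 48 bits left after removing the piece chosen as F.
        rest = "".join(p for j, p in enumerate(pieces) if j != f_pos)
        subs = [rest[i:i + 12] for i in range(0, 48, 12)]
        for s_pos, sub in enumerate(subs):
            # Value: the 36 bits not covered by the 28-bit index block.
            value = "".join(s for j, s in enumerate(subs) if j != s_pos)
            out.append((f_pos, s_pos, f_block + sub, value))
    return out


entries = index_blocks_28("01" * 32)
print(len(entries))                  # 16
print({len(e[2]) for e in entries})  # {28}
print({len(e[3]) for e in entries})  # {36}
```

Under the same even-spread assumption as the 16-bit example, a 28-bit exact match over 2^34 fingerprints would leave about 2^(34-28) = 64 candidates per index block, which is consistent with the statement that fewer results are returned.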
Step S130: among the at least two hash code blocks of each picture, place pictures that have an identical hash code block at the same position index into the same cluster, yielding multiple picture clusters.
In this step, the hash code blocks of the pictures at the same position index are compared value by value; blocks whose values are equal in every part are regarded as identical hash code blocks, and pictures having an identical hash code block are placed into the same cluster, finally yielding multiple picture clusters. After the massive picture collection has been clustered, each cluster contains relatively few pictures, which minimizes the amount of similarity computation between similar pictures.
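Step S130 amounts to bucketing pictures by the pair (position index, block value). A minimal sketch follows, assuming uniform 16-bit blocks; the function name `cluster_by_block` and the choice to drop buckets holding a single picture are assumptions made here for brevity.

```python
from collections import defaultdict


def cluster_by_block(codes, block_bits=16):
    """Group picture ids whose hash codes share a block at the same position."""
    buckets = defaultdict(list)
    for pic_id, code in codes.items():
        for pos in range(len(code) // block_bits):
            block = code[pos * block_bits:(pos + 1) * block_bits]
            buckets[(pos, block)].append(pic_id)
    # Assumed simplification: a bucket with one picture has nothing to compare.
    return {key: ids for key, ids in buckets.items() if len(ids) > 1}


codes = {
    "a": "0" * 16 + "1" * 16 + "0" * 16 + "1" * 16,
    "b": "0" * 16 + "1" * 15 + "0" + "0" * 16 + "1" * 16,  # one bit off from "a"
    "c": "1" * 64,
}
clusters = cluster_by_block(codes)
print(len(clusters))               # 4
print(clusters[(0, "0" * 16)])     # ['a', 'b']
print(clusters[(3, "1" * 16)])     # ['a', 'b', 'c']
```

Note that picture "a" appears in several clusters, which is why a later aggregation step is needed.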
Step S140: within each picture cluster, identify similar pictures according to the distance between the pictures' hash code blocks other than the identical hash code block at the same position index.
Specifically, because each picture cluster contains relatively few pictures once the massive collection has been clustered, computing performance improves. During similarity computation, the identical hash code block at the same position index is removed from each picture, the remaining hash code blocks of each picture are treated as a whole, and the distance between the pictures' corresponding remaining hash code blocks is computed to identify similar pictures, which further improves computing performance. The smaller the computed distance, the more similar the two pictures, that is, the greater their similarity; the larger the distance, the greater the difference between them, that is, the smaller their similarity.
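The text does not name the distance measure; for binary hash codes Hamming distance is a natural choice, and the sketch below uses it as an assumption. The function name `remaining_distance` is likewise hypothetical.

```python
def remaining_distance(code_a, code_b, pos, block_bits=16):
    """Hamming distance over all bits except the shared block at `pos`."""
    start, end = pos * block_bits, (pos + 1) * block_bits
    rest_a = code_a[:start] + code_a[end:]
    rest_b = code_b[:start] + code_b[end:]
    return sum(x != y for x, y in zip(rest_a, rest_b))


a = "0" * 16 + "1" * 16 + "0" * 16 + "1" * 16
b = "0" * 16 + "1" * 15 + "0" + "0" * 16 + "1" * 16
c = "1" * 64

print(remaining_distance(a, b, pos=0))  # 1
print(remaining_distance(a, c, pos=3))  # 32
```

Skipping the shared block saves comparing bits already known to be equal; only the 48 remaining bits are examined in this 4 × 16 layout.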
Optionally, the method also includes:
computing the similarity between pictures in each picture cluster according to the distance between the pictures' hash code blocks other than the identical hash code block at the same position index;
aggregating the similarities between pictures computed across all picture clusters, and identifying pictures whose similarity exceeds a preset threshold as similar pictures.
Because pictures are clustered by identical hash code blocks, the same picture may appear in different picture clusters. For example, if picture A and picture B both appear in several picture clusters, computing picture similarity within each cluster yields several similarity values for A and B; after the similarities computed across all clusters are aggregated, a single similarity value for A and B is obtained, which deduplicates and simplifies the result.
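The aggregation step can be sketched as keeping one score per unordered picture pair. The similarity formula used here (one minus Hamming distance divided by the number of compared bits), the use of the maximum when a pair appears in several clusters, and the name `similar_pairs` are all assumptions; the text only states that a smaller distance means a greater similarity.

```python
from itertools import combinations


def similar_pairs(clusters, codes, threshold, block_bits=16):
    """Score each picture pair once, even if it appears in several clusters."""
    best = {}
    for (pos, _block), members in clusters.items():
        start, end = pos * block_bits, (pos + 1) * block_bits
        for a, b in combinations(sorted(members), 2):
            rest_a = codes[a][:start] + codes[a][end:]
            rest_b = codes[b][:start] + codes[b][end:]
            dist = sum(x != y for x, y in zip(rest_a, rest_b))
            sim = 1 - dist / len(rest_a)  # assumed similarity formula
            best[(a, b)] = max(sim, best.get((a, b), 0.0))
    return {pair: s for pair, s in best.items() if s >= threshold}


codes = {
    "a": "0" * 16 + "1" * 16 + "0" * 16 + "1" * 16,
    "b": "0" * 16 + "1" * 15 + "0" + "0" * 16 + "1" * 16,
    "c": "1" * 64,
}
clusters = {
    (0, "0" * 16): ["a", "b"],
    (1, "1" * 16): ["a", "c"],
    (2, "0" * 16): ["a", "b"],
    (3, "1" * 16): ["a", "b", "c"],
}
result = similar_pairs(clusters, codes, threshold=0.95)
print(list(result))  # [('a', 'b')]
```

The pair (a, b) occurs in three clusters but is reported once.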
By computing the low-dimensional feature vector hash code of each picture with the average hash algorithm, using a similarity hashing technique to cluster the massive picture collection according to whether the hash code blocks at the same position index after division are identical, and then identifying similar pictures by the distance between the pictures' hash code blocks other than the identical block at that position index, this embodiment solves the prior-art problem that methods for identifying similar pictures are highly complex and computationally expensive. It reduces the computational complexity and the amount of computation of similar-picture identification and enables efficient computation of picture similarity.
Embodiment 2
Fig. 2 is a flow chart of a method for identifying similar pictures provided by Embodiment 2 of the invention. This embodiment is a further optimization of the embodiment above. As shown in Fig. 2, the method specifically includes:
Step S210: in response to a similar-picture search request for a target picture, compute the target low-dimensional feature vector hash code of the target picture using the average hash algorithm.
Specifically, in this step the user submits the target picture to be searched through a web page or application, and the server receives the target picture, responds to the similar-picture search request, and computes the target low-dimensional feature vector hash code of the target picture using the average hash algorithm.
Step S220: uniformly divide the target low-dimensional feature vector hash code into at least two hash code blocks according to the preset rule, where each hash code block corresponds to a position index.
The hash code blocks obtained by dividing the target low-dimensional feature vector hash code can have the same or different lengths, and the hash code blocks may or may not overlap one another.
Step S230: assign the target picture to at least one target picture cluster among the multiple picture clusters already obtained, where the pictures in a target picture cluster have an identical hash code block to the target picture at the same position index.
Step S240: within the at least one target picture cluster, search for pictures similar to the target picture according to the distance between the pictures' hash code blocks other than the identical hash code block at the same position index.
Specifically, the distance is computed between the target picture and each picture in each target picture cluster over the hash code blocks other than the identical hash code block at the same position index. The smaller the computed distance, the greater the similarity of the corresponding picture. Pictures whose similarity exceeds the threshold are identified as similar pictures; once the search completes, the server returns the pictures most similar to the target picture, and the search results are displayed on the user's web page or application. The similarity threshold is set in advance by the user. For example, if the user sets the similarity threshold to 98%, pictures whose computed similarity is greater than or equal to 98% are identified as similar pictures.
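Steps S210 to S240 can be sketched end to end as an index lookup followed by a distance check. All names here (`blocks_of`, `build_index`, `search`) are hypothetical, Hamming distance is assumed, and similarity is normalized over the full 64 bits as an assumed formula. Because the shared block contributes zero differing bits, the full-code Hamming distance equals the distance over the remaining blocks.

```python
from collections import defaultdict

BLOCK_BITS = 16  # assumed uniform 4 x 16-bit division


def blocks_of(code):
    return [code[i:i + BLOCK_BITS] for i in range(0, len(code), BLOCK_BITS)]


def build_index(codes):
    """Map (position index, block value) to the set of picture ids holding it."""
    index = defaultdict(set)
    for pic_id, code in codes.items():
        for pos, block in enumerate(blocks_of(code)):
            index[(pos, block)].add(pic_id)
    return index


def search(target_code, codes, index, threshold):
    # Candidates: every picture sharing at least one block with the target.
    candidates = set()
    for pos, block in enumerate(blocks_of(target_code)):
        candidates |= index.get((pos, block), set())
    hits = []
    for pic_id in candidates:
        dist = sum(x != y for x, y in zip(target_code, codes[pic_id]))
        sim = 1 - dist / len(target_code)  # assumed similarity formula
        if sim >= threshold:
            hits.append((pic_id, sim))
    return sorted(hits, key=lambda h: -h[1])


library = {
    "b": "0" * 16 + "1" * 15 + "0" + "0" * 16 + "1" * 16,
    "c": "1" * 64,
}
target = "0" * 16 + "1" * 16 + "0" * 16 + "1" * 16
hits = search(target, library, build_index(library), threshold=0.98)
print(hits)  # [('b', 0.984375)]
```

With the 98% threshold from the example in the text, picture "b" (1 differing bit out of 64) is returned and picture "c" (32 differing bits) is not.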
By computing the low-dimensional feature vector hash code of the target picture with the average hash algorithm, using a similarity hashing technique to divide that hash code into blocks and assign the target picture to clusters, and then identifying pictures similar to the target picture by the distance between the pictures' hash code blocks other than the identical block at the same position index within each target picture cluster, this embodiment achieves fast search for similar pictures in a massive picture collection.
Embodiment 3
Fig. 3 is a structural diagram of a device for identifying similar pictures provided by Embodiment 3 of the invention. This embodiment is applicable to identifying similar pictures. The device provided by this embodiment can perform the method for identifying similar pictures provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to that method.
As shown in Fig. 3, the device for identifying similar pictures of this embodiment includes a hash code computation module 310, a block division module 320, a clustering module 330 and a first identification module 340, where:
the hash code computation module 310 is for computing a low-dimensional feature vector hash code for each picture using the average hash algorithm.
The block division module 320 is for uniformly dividing the low-dimensional feature vector hash code of each picture into at least two hash code blocks according to a preset rule, where each hash code block corresponds to a position index.
Further, the block division module 320 is specifically configured to uniformly divide the low-dimensional feature vector hash code of each picture, according to the preset rule, into at least two hash code blocks whose lengths are the same or different, where the hash code blocks may or may not overlap one another.
The clustering module 330 is for placing, among the at least two hash code blocks of each picture, pictures that have an identical hash code block at the same position index into the same cluster, yielding multiple picture clusters.
The first identification module 340 is for identifying, within each picture cluster, similar pictures according to the distance between the pictures' hash code blocks other than the identical hash code block at the same position index.
Optionally, the device also includes computing module and the second identification module, wherein:
Computing module, in addition to the identical Hash code block according to corresponding between each picture except same position index
The distance of Hash code block, calculate the similarity between picture in each picture cluster.
Second identification module, it is for counting the similarity between the picture being calculated in whole picture clusters, this is similar
Degree is similar pictures beyond the picture recognition of predetermined threshold value.
Further, the device also includes a search module for searching for pictures similar to a target picture. The search module specifically includes:
A target hash code calculation unit, configured to calculate, in response to a similar-picture search request for the target picture, the target low-dimensional feature vector hash code of the target picture using the picture average hash algorithm;
A blocking unit, configured to uniformly divide the target low-dimensional feature vector hash code into at least two hash code blocks according to the preset rule, wherein each hash code block corresponds to one position index;
A clustering unit, configured to assign the target picture to at least one target picture cluster among the multiple picture clusters already obtained, wherein the pictures in a target picture cluster have a hash code block identical to that of the target picture at the same position index;
A search unit, configured to search the at least one target picture cluster for pictures similar to the target picture according to the distance between the pictures' hash code blocks other than the identical hash code block at the shared position index.
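The search flow can be sketched as a lookup against the cluster index followed by a distance check inside the matching clusters. As before, this is only an illustration: the function names, the four equal-length blocks, the Hamming distance, and the `max_distance` cutoff are assumptions, and the target hash code is taken as given rather than computed from an image.

```python
from collections import defaultdict


def split_blocks(code, n_blocks=4):
    """Split a hash code into equal, non-overlapping blocks (assumed rule)."""
    size = len(code) // n_blocks
    return [code[i * size:(i + 1) * size] for i in range(n_blocks)]


def hamming(a, b):
    """Number of differing bits between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def build_index(codes, n_blocks=4):
    """Map (position index, block value) to the picture ids sharing that block."""
    index = defaultdict(list)
    for pic_id, code in codes.items():
        for idx, block in enumerate(split_blocks(code, n_blocks)):
            index[(idx, block)].append(pic_id)
    return index


def search(target_code, codes, index, n_blocks=4, max_distance=3):
    """Return (picture id, distance) pairs for the target, nearest first."""
    target_blocks = split_blocks(target_code, n_blocks)
    hits = {}
    for idx, block in enumerate(target_blocks):
        # Only pictures sharing this block at this position are candidates.
        for pic_id in index.get((idx, block), []):
            cand = split_blocks(codes[pic_id], n_blocks)
            dist = sum(
                hamming(x, y)
                for i, (x, y) in enumerate(zip(target_blocks, cand))
                if i != idx
            )
            if dist <= max_distance:
                hits[pic_id] = dist
    return sorted(hits.items(), key=lambda kv: (kv[1], kv[0]))
```

Because only the clusters that share a block with the target are scanned, the number of distance computations depends on the size of those clusters rather than on the size of the whole picture library.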
In this embodiment of the present invention, the low-dimensional feature vector hash code of each picture is calculated using the picture average hash algorithm; the massive set of pictures is clustered according to whether the divided hash code blocks at the same position index are identical; and similar pictures are then identified according to the distance between the pictures' hash code blocks other than the identical hash code block at the shared position index. This addresses the high complexity and heavy computation of prior-art methods for identifying similar pictures, reduces the computational complexity and the amount of computation of similar-picture identification, and enables efficient calculation of picture similarity and fast search for similar pictures.
Embodiment four
Fig. 4 is a schematic structural diagram of a server provided by Embodiment four of the present invention. Fig. 4 shows a block diagram of an exemplary server 12 suitable for implementing embodiments of the present invention. The server 12 shown in Fig. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 4, the server 12 takes the form of a general-purpose computing device. Its components may include, but are not limited to: one or more processors 16, a storage device 28, and a bus 18 connecting the different system components (including the storage device 28 and the processors 16).
The bus 18 represents one or more of several types of bus structures, including a storage device bus or storage device controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus structures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The server 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the server 12, including volatile and non-volatile media, and removable and non-removable media.
The storage device 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 4, commonly referred to as a "hard disk drive"). Although not shown in Fig. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") may be provided, as may an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc-Read Only Memory (DVD-ROM), or other optical media). In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The storage device 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the various embodiments of the present invention.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the storage device 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.
The server 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the server 12, and/or with any device (such as a network card, a modem, etc.) that enables the server 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. In addition, the server 12 may communicate with one or more networks (such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in Fig. 4, the network adapter 20 communicates with the other modules of the server 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the server 12, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.
The processor 16 executes various functional applications and data processing by running the programs stored in the storage device 28, for example implementing the similar-picture identification method provided by the embodiments of the present invention.
Embodiment five
Embodiment five of the present invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the similar-picture identification method provided by the embodiments of the present invention.
The computer storage medium of the embodiments of the present invention may use any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them; it may include further equivalent embodiments without departing from the inventive concept, and its scope is determined by the scope of the appended claims.
Claims (10)
- 1. A method for identifying similar pictures, characterized by comprising: calculating a low-dimensional feature vector hash code of each picture using a picture average hash algorithm; uniformly dividing the low-dimensional feature vector hash code of each picture into at least two hash code blocks according to a preset rule, wherein each hash code block corresponds to one position index; among the at least two hash code blocks of each picture, placing pictures that have an identical hash code block at the same position index into the same cluster, to obtain multiple picture clusters; and in each picture cluster, identifying similar pictures according to the distance between the pictures' hash code blocks other than the identical hash code block corresponding to the same position index.
- 2. The method according to claim 1, characterized in that each picture includes at least two hash code blocks whose hash code lengths are identical or different, and the hash code blocks include overlapping or non-overlapping parts.
- 3. The method according to claim 1 or 2, characterized in that the method further comprises: calculating the similarity between pictures in each picture cluster according to the distance between the pictures' hash code blocks other than the identical hash code block corresponding to the same position index; and collecting the similarities calculated between pictures in all picture clusters, and identifying pictures whose similarity exceeds a preset threshold as similar pictures.
- 4. The method according to claim 1 or 2, characterized in that the method further comprises: in response to a similar-picture search request for a target picture, calculating a target low-dimensional feature vector hash code of the target picture using the picture average hash algorithm; uniformly dividing the target low-dimensional feature vector hash code into at least two hash code blocks according to the preset rule, wherein each hash code block corresponds to one position index; assigning the target picture to at least one target picture cluster among the multiple picture clusters, wherein the pictures in the target picture cluster have a hash code block identical to that of the target picture at the same position index; and in the at least one target picture cluster, searching for pictures similar to the target picture according to the distance between the pictures' hash code blocks other than the identical hash code block corresponding to the same position index.
- 5. A device for identifying similar pictures, characterized by comprising: a hash code calculation module, configured to calculate a low-dimensional feature vector hash code of each picture using a picture average hash algorithm; a blocking module, configured to uniformly divide the low-dimensional feature vector hash code of each picture into at least two hash code blocks according to a preset rule, wherein each hash code block corresponds to one position index; a clustering module, configured to place, among the at least two hash code blocks of each picture, pictures that have an identical hash code block at the same position index into the same cluster, to obtain multiple picture clusters; and a first identification module, configured to identify similar pictures in each picture cluster according to the distance between the pictures' hash code blocks other than the identical hash code block corresponding to the same position index.
- 6. The device according to claim 5, characterized in that the blocking module is specifically configured to divide the low-dimensional feature vector hash code of each picture, according to the preset rule, into at least two hash code blocks whose hash code lengths are identical or different, and the hash code blocks include overlapping or non-overlapping parts.
- 7. The device according to claim 5 or 6, characterized in that the device further comprises: a calculation module, configured to calculate the similarity between pictures in each picture cluster according to the distance between the pictures' hash code blocks other than the identical hash code block corresponding to the same position index; and a second identification module, configured to collect the similarities calculated between pictures in all picture clusters, and to identify pictures whose similarity exceeds a preset threshold as similar pictures.
- 8. The device according to claim 5 or 6, characterized in that the device further comprises a search module for searching for pictures similar to a target picture, the search module specifically comprising: a target hash code calculation unit, configured to calculate, in response to a similar-picture search request for the target picture, a target low-dimensional feature vector hash code of the target picture using the picture average hash algorithm; a blocking unit, configured to uniformly divide the target low-dimensional feature vector hash code into at least two hash code blocks according to the preset rule, wherein each hash code block corresponds to one position index; a clustering unit, configured to assign the target picture to at least one target picture cluster among the multiple picture clusters, wherein the pictures in the target picture cluster have a hash code block identical to that of the target picture at the same position index; and a search unit, configured to search the at least one target picture cluster for pictures similar to the target picture according to the distance between the pictures' hash code blocks other than the identical hash code block corresponding to the same position index.
- 9. A server, characterized by comprising: one or more processors; and a storage device for storing one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for identifying similar pictures according to any one of claims 1-4.
- 10. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method for identifying similar pictures according to any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710945888.XA CN107729935B (en) | 2017-10-12 | 2017-10-12 | The recognition methods of similar pictures and device, server, storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107729935A (en) | 2018-02-23 |
CN107729935B (en) | 2019-11-12 |
Family
ID=61210968
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710945888.XA Active CN107729935B (en) | 2017-10-12 | 2017-10-12 | The recognition methods of similar pictures and device, server, storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107729935B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108536769A (en) * | 2018-03-22 | 2018-09-14 | 深圳市安软慧视科技有限公司 | Image analysis method, searching method and device, computer installation and storage medium |
CN108595710A (en) * | 2018-05-11 | 2018-09-28 | 杨晓春 | A kind of quick mass picture De-weight method |
CN110399511A (en) * | 2019-07-23 | 2019-11-01 | 中南民族大学 | Image cache method, equipment, storage medium and device based on Redis |
CN110490250A (en) * | 2019-08-19 | 2019-11-22 | 广州虎牙科技有限公司 | A kind of acquisition methods and device of artificial intelligence training set |
CN111079757A (en) * | 2018-10-19 | 2020-04-28 | 北京奇虎科技有限公司 | Clothing attribute identification method and device and electronic equipment |
CN111078914A (en) * | 2019-12-18 | 2020-04-28 | 书行科技(北京)有限公司 | Method and device for detecting repeated pictures |
CN111368122A (en) * | 2020-02-14 | 2020-07-03 | 深圳壹账通智能科技有限公司 | Method and device for removing duplicate pictures |
CN111506756A (en) * | 2019-01-30 | 2020-08-07 | 北京京东尚科信息技术有限公司 | Similar picture searching method and system, electronic device and storage medium |
CN111522989A (en) * | 2020-07-06 | 2020-08-11 | 南京梦饷网络科技有限公司 | Method, computing device, and computer storage medium for image retrieval |
EP3767483A4 (en) * | 2018-03-12 | 2021-12-08 | Tencent Technology (Shenzhen) Company Limited | Method, device, system, and server for image retrieval, and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102819582A (en) * | 2012-07-26 | 2012-12-12 | 华数传媒网络有限公司 | Quick searching method for mass images |
CN103984776A (en) * | 2014-06-05 | 2014-08-13 | 北京奇虎科技有限公司 | Repeated image identification method and image search duplicate removal method and device |
CN104112284A (en) * | 2013-04-22 | 2014-10-22 | 阿里巴巴集团控股有限公司 | Method and equipment for detecting similarity of images |
2017-10-12: CN application CN201710945888.XA, patent CN107729935B, status Active
Non-Patent Citations (2)
Title |
---|
YANG B,等: "Block mean value based image perceptual Hashing", 《PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON IIH-MSP》 * |
辰辰沉沉沉: "较大规模图片 使用phash去重", 《HTTPS://WWW.JIANSHU.COM/P/C87F6F69D51F》 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11347787B2 (en) | 2018-03-12 | 2022-05-31 | Tencent Technology (Shenzhen) Company Limited | Image retrieval method and apparatus, system, server, and storage medium |
EP3767483A4 (en) * | 2018-03-12 | 2021-12-08 | Tencent Technology (Shenzhen) Company Limited | Method, device, system, and server for image retrieval, and storage medium |
CN108536769A (en) * | 2018-03-22 | 2018-09-14 | 深圳市安软慧视科技有限公司 | Image analysis method, searching method and device, computer installation and storage medium |
CN108595710B (en) * | 2018-05-11 | 2021-07-13 | 杨晓春 | Rapid massive picture de-duplication method |
CN108595710A (en) * | 2018-05-11 | 2018-09-28 | 杨晓春 | A kind of quick mass picture De-weight method |
CN111079757A (en) * | 2018-10-19 | 2020-04-28 | 北京奇虎科技有限公司 | Clothing attribute identification method and device and electronic equipment |
CN111506756B (en) * | 2019-01-30 | 2024-05-17 | 北京京东尚科信息技术有限公司 | Method and system for searching similar pictures, electronic equipment and storage medium |
CN111506756A (en) * | 2019-01-30 | 2020-08-07 | 北京京东尚科信息技术有限公司 | Similar picture searching method and system, electronic device and storage medium |
CN110399511A (en) * | 2019-07-23 | 2019-11-01 | 中南民族大学 | Image cache method, equipment, storage medium and device based on Redis |
CN110490250A (en) * | 2019-08-19 | 2019-11-22 | 广州虎牙科技有限公司 | A kind of acquisition methods and device of artificial intelligence training set |
CN111078914B (en) * | 2019-12-18 | 2023-04-18 | 书行科技(北京)有限公司 | Method and device for detecting repeated pictures |
CN111078914A (en) * | 2019-12-18 | 2020-04-28 | 书行科技(北京)有限公司 | Method and device for detecting repeated pictures |
CN111368122A (en) * | 2020-02-14 | 2020-07-03 | 深圳壹账通智能科技有限公司 | Method and device for removing duplicate pictures |
CN111368122B (en) * | 2020-02-14 | 2022-09-30 | 深圳壹账通智能科技有限公司 | Method and device for removing duplicate pictures |
CN111522989A (en) * | 2020-07-06 | 2020-08-11 | 南京梦饷网络科技有限公司 | Method, computing device, and computer storage medium for image retrieval |
Also Published As
Publication number | Publication date |
---|---|
CN107729935B (en) | 2019-11-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107729935B (en) | The recognition methods of similar pictures and device, server, storage medium | |
CN108205655B (en) | Key point prediction method and device, electronic equipment and storage medium | |
CN108229419B (en) | Method and apparatus for clustering images | |
US11727053B2 (en) | Entity recognition from an image | |
CN108280477B (en) | Method and apparatus for clustering images | |
CN108197532A (en) | The method, apparatus and computer installation of recognition of face | |
WO2013155417A2 (en) | Data coreset compression | |
CN113892113A (en) | Human body posture estimation method and device | |
WO2022041188A1 (en) | Accelerator for neural network, acceleration method and device, and computer storage medium | |
CN113722409A (en) | Method and device for determining spatial relationship, computer equipment and storage medium | |
WO2023282847A1 (en) | Detecting objects in a video using attention models | |
JP2024508867A (en) | Image clustering method, device, computer equipment and computer program | |
US9928408B2 (en) | Signal processing | |
CN112966687B (en) | Image segmentation model training method and device and communication equipment | |
CN110619253B (en) | Identity recognition method and device | |
CN110717405B (en) | Face feature point positioning method, device, medium and electronic equipment | |
CN111915689B (en) | Method, apparatus, electronic device, and computer-readable medium for generating an objective function | |
CN114639143B (en) | Portrait archiving method, device and storage medium based on artificial intelligence | |
CN114972146A (en) | Image fusion method and device based on generation countermeasure type double-channel weight distribution | |
CN113721240A (en) | Target association method and device, electronic equipment and storage medium | |
CN113051406A (en) | Character attribute prediction method, device, server and readable storage medium | |
CN110969651B (en) | 3D depth of field estimation method and device and terminal equipment | |
WO2022044104A1 (en) | Image matching apparatus, control method, and non-transitory computer-readable storage medium | |
CN112966606B (en) | Image recognition method, related device and computer program product | |
CN114253992A (en) | Data aggregation method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20231113. Address after: 311422 Room 125, 1st Floor, Building 197-2, Jiulong Avenue, Yinhu Street, Fuyang District, Hangzhou City, Zhejiang Province. Patentee after: Hangzhou Dabei Biotechnology Co.,Ltd. Address before: 310019 Room 204, building A12, No.9 Jiusheng Road, Jianggan District, Hangzhou City, Zhejiang Province. Patentee before: HANGZHOU BEIGOU TECHNOLOGY CO.,LTD. |