CN107729935A - Method and device for recognizing similar pictures, server, and storage medium - Google Patents

Method and device for recognizing similar pictures, server, and storage medium

Info

Publication number
CN107729935A
Authority
CN
China
Prior art keywords
picture
hash
code block
hash code
similar pictures
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710945888.XA
Other languages
Chinese (zh)
Other versions
CN107729935B (en)
Inventor
高增
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dabei Biotechnology Co ltd
Original Assignee
Hangzhou Buy Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Buy Technology Co Ltd
Priority to CN201710945888.XA
Publication of CN107729935A
Application granted
Publication of CN107729935B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/758 Involving statistics of pixels or of feature values, e.g. histogram matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention discloses a method and device for recognizing similar pictures, a server, and a storage medium. The method includes: computing a low-dimensional feature-vector hash code for each picture using the picture average hash algorithm; dividing the low-dimensional feature-vector hash code of each picture into at least two hash code blocks according to a uniform preset rule, where each hash code block corresponds to one position index; among the at least two hash code blocks of each picture, assigning pictures that have an identical hash code block at the same position index to the same cluster, yielding multiple picture clusters; and, within each picture cluster, identifying similar pictures according to the distance between the pictures' hash code blocks other than the identical hash code block at that position index. The embodiment reduces the computational complexity and the amount of computation of similar-picture recognition, enabling efficient calculation of picture similarity.

Description

Method and device for recognizing similar pictures, server, and storage medium
Technical field
Embodiments of the present invention relate to image processing technology, and in particular to a method and device for recognizing similar pictures, a server, and a storage medium.
Background
As users interact more and more with websites of different types, they upload the pictures they have on hand, so the number of pictures on a website grows rapidly and a single website ends up holding many duplicate and highly similar pictures.
Quickly recognizing similar pictures among a massive number of pictures not only removes redundant picture data and reduces storage overhead, but also makes it possible to offer users homogeneous or diversified picture services on demand, such as picture search. In existing similar-picture recognition techniques, however, computing picture similarity is highly complex, so recognizing similar pictures among massive pictures requires a very large amount of computation and has little practical applicability.
Summary of the invention
Embodiments of the present invention provide a method and device for recognizing similar pictures, a server, and a storage medium, to solve the problem that prior-art methods for recognizing similar pictures are highly complex and computationally expensive.
In a first aspect, an embodiment of the present invention provides a method for recognizing similar pictures, the method including:
Computing the low-dimensional feature-vector hash code of each picture using the picture average hash algorithm;
Dividing the low-dimensional feature-vector hash code of each picture into at least two hash code blocks according to a uniform preset rule, where each hash code block corresponds to one position index;
Among the at least two hash code blocks of each picture, assigning pictures that have an identical hash code block at the same position index to the same cluster, yielding multiple picture clusters;
Within each picture cluster, identifying similar pictures according to the distance between the pictures' hash code blocks other than the identical hash code block at the same position index.
Further, each picture includes at least two hash code blocks whose lengths are the same or different, and the hash code blocks may or may not overlap one another.
Further, the method also includes:
Calculating the similarity between pictures in each picture cluster according to the distance between the pictures' hash code blocks other than the identical hash code block at the same position index;
Aggregating the similarities between pictures calculated in all picture clusters, and identifying pictures whose similarity exceeds a preset threshold as similar pictures.
Further, the method also includes:
In response to a similar-picture search request for a target picture, computing the target low-dimensional feature-vector hash code of the target picture using the picture average hash algorithm;
Dividing the target low-dimensional feature-vector hash code into at least two hash code blocks according to the same preset rule, where each hash code block corresponds to one position index;
Assigning the target picture to at least one target picture cluster among the multiple picture clusters, where the pictures in a target picture cluster have a hash code block identical to that of the target picture at the same position index;
Within the at least one target picture cluster, searching for pictures similar to the target picture according to the distance between the pictures' hash code blocks other than the identical hash code block at the same position index.
In a second aspect, an embodiment of the present invention further provides a device for recognizing similar pictures, the device including:
A hash code computing module, configured to compute the low-dimensional feature-vector hash code of each picture using the picture average hash algorithm;
A blocking module, configured to divide the low-dimensional feature-vector hash code of each picture into at least two hash code blocks according to a uniform preset rule, where each hash code block corresponds to one position index;
A clustering module, configured to put, among the at least two hash code blocks of each picture, pictures that have an identical hash code block at the same position index into the same cluster, yielding multiple picture clusters;
A first identification module, configured to identify similar pictures within each picture cluster according to the distance between the pictures' hash code blocks other than the identical hash code block at the same position index.
Further, the blocking module is specifically configured to divide the low-dimensional feature-vector hash code of each picture, according to the uniform preset rule, into at least two hash code blocks whose lengths are the same or different, where the hash code blocks may or may not overlap one another.
Further, the device also includes:
A computing module, configured to calculate the similarity between pictures in each picture cluster according to the distance between the pictures' hash code blocks other than the identical hash code block at the same position index;
A second identification module, configured to aggregate the similarities between pictures calculated in all picture clusters and to identify pictures whose similarity exceeds a preset threshold as similar pictures.
Further, the device also includes a search module, configured to search for pictures similar to a target picture;
The search module specifically includes:
A target hash code computing unit, configured to compute, in response to a similar-picture search request for a target picture, the target low-dimensional feature-vector hash code of the target picture using the picture average hash algorithm;
A blocking unit, configured to divide the target low-dimensional feature-vector hash code into at least two hash code blocks according to the same preset rule, where each hash code block corresponds to one position index;
A clustering unit, configured to assign the target picture to at least one target picture cluster among the multiple picture clusters, where the pictures in a target picture cluster have a hash code block identical to that of the target picture at the same position index;
A search unit, configured to search, within the at least one target picture cluster, for pictures similar to the target picture according to the distance between the pictures' hash code blocks other than the identical hash code block at the same position index.
In a third aspect, an embodiment of the present invention further provides a server, including:
One or more processors;
A storage device, configured to store one or more programs,
Where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for recognizing similar pictures described above.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method for recognizing similar pictures described above.
Embodiments of the present invention compute the low-dimensional feature-vector hash code of each picture using the picture average hash algorithm, cluster the massive pictures according to whether the hash code blocks at the same position index after division are identical, and then identify similar pictures according to the distance between the pictures' hash code blocks other than the identical hash code block at the same position index. This solves the prior-art problem that methods for recognizing similar pictures are highly complex and computationally expensive, reduces the computational complexity and the amount of computation of similar-picture recognition, and enables efficient calculation of picture similarity.
Brief description of the drawings
Fig. 1 is a flow chart of a method for recognizing similar pictures provided by Embodiment one of the present invention;
Fig. 2 is a flow chart of a method for recognizing similar pictures provided by Embodiment two of the present invention;
Fig. 3 is a structural diagram of a device for recognizing similar pictures provided by Embodiment three of the present invention;
Fig. 4 is a structural diagram of a server provided by Embodiment four of the present invention.
Detailed description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire structure.
Embodiment one
Fig. 1 is a flow chart of a method for recognizing similar pictures provided by Embodiment one of the present invention. This embodiment is applicable to recognizing similar pictures. The method can be performed by a device for recognizing similar pictures; the device can be implemented in software and/or hardware and can, for example, be configured in a server. As shown in Fig. 1, the method specifically includes:
Step S110: compute the low-dimensional feature-vector hash code of each picture using the picture average hash algorithm.
To recognize similar pictures among massive pictures, the average hash algorithm (Average Hash, aHash) obtains the feature-vector hash code of each picture by comparing each pixel of the grayscale image converted from the picture with the average value of all pixels. For example, the original picture can first be scaled to 8 × 8 and the scaled picture converted to grayscale, which yields a 64-bit low-dimensional feature-vector hash code. Every bit of the low-dimensional feature vector obtained with aHash is 0 or 1. For similar pictures, when the generated hash codes are divided into multiple hash code blocks, at least one block is identical. Note that when computing the low-dimensional feature-vector hash code with aHash, the scaled picture need not be converted to grayscale; this can be configured according to the actual situation.
Algorithms for generating picture hash codes include but are not limited to aHash; other algorithms can be used as needed, such as the perceptual hash algorithm (Perceptual Hash, pHash) or color histograms.
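A minimal sketch of the aHash computation described in step S110 follows. It assumes the picture has already been scaled to 8 × 8 and converted to grayscale, and it takes the result as a plain list of rows. The function name `average_hash` and the choice to set a bit to 1 when the pixel is greater than or equal to the mean are illustrative assumptions; the text only says that each pixel is compared with the average.

```python
def average_hash(gray_8x8):
    """Return a 64-bit aHash for an 8x8 grid of grayscale values (0-255)."""
    pixels = [p for row in gray_8x8 for p in row]
    if len(pixels) != 64:
        raise ValueError("expected an 8x8 grid")
    mean = sum(pixels) / len(pixels)
    code = 0
    for p in pixels:
        # Assumed convention: bit is 1 when the pixel is at least as bright as the mean.
        code = (code << 1) | (1 if p >= mean else 0)
    return code


# Left half dark, right half bright: every row produces the bits 00001111.
grid = [[10, 10, 10, 10, 200, 200, 200, 200] for _ in range(8)]
print(f"{average_hash(grid):016x}")  # 0f0f0f0f0f0f0f0f
```

Packing the 64 bits into one integer keeps later steps (splitting into blocks, XOR-based distances) cheap.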
Step S120: divide the low-dimensional feature-vector hash code of each picture into at least two hash code blocks according to a uniform preset rule, where each hash code block corresponds to one position index.
In this step, the low-dimensional feature-vector hash code of every picture is divided into at least two hash code blocks according to the same preset rule. The preset rule is a blocking scheme for the hash code of each picture that the user sets in advance according to their own matching requirements. For example, if the picture hash code has 64 bits, it can be divided into 4 blocks of 16 bits each. If it is decided in advance that picture similarity needs to be calculated whenever the number of differing hash code blocks between two pictures is less than or equal to 3, then the scheme is effective only if the hash code of each picture is divided into at least 4 blocks. When more than 3 hash code blocks differ between two pictures, there is no need to compare them further; the two pictures are determined to be dissimilar.
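The 4 × 16-bit example can be sketched as below. The helper names `split_blocks` and `shared_positions` are hypothetical, and the most-significant-first block ordering is an assumption. The sketch shows that two 64-bit codes differing in at most 3 of the 4 blocks still share at least one identical block, which is what makes them candidates for a similarity calculation.

```python
def split_blocks(code, n_blocks=4, block_bits=16):
    """Split an integer hash code into equal blocks, most significant block first."""
    mask = (1 << block_bits) - 1
    return [(code >> (block_bits * (n_blocks - 1 - i))) & mask for i in range(n_blocks)]


def shared_positions(code_a, code_b):
    """Position indexes at which the two codes have an identical block."""
    a, b = split_blocks(code_a), split_blocks(code_b)
    return [i for i in range(len(a)) if a[i] == b[i]]


h1 = 0x0F0F_0F0F_0F0F_0F0F
h2 = h1 ^ 0x0001_0000_8000_0002  # flips one bit inside blocks 0, 2 and 3
h3 = h1 ^ 0x0001_0001_0001_0001  # flips one bit inside every block

print(shared_positions(h1, h2))  # [1]  -> still a candidate pair
print(shared_positions(h1, h3))  # []   -> treated as dissimilar
```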
In this step, optionally, each picture includes at least two hash code blocks whose lengths are the same or different, and the hash code blocks may or may not overlap. In the example above, the 64-bit hash code need not be divided evenly into the 4 hash code blocks; that is, the 4 blocks may have the same or different numbers of bits. Moreover, if the 4 hash code blocks overlap, their total number of bits exceeds 64; if they do not overlap, their total is still 64. The division rule for hash code blocks can be configured and preset as needed, after which every picture is divided uniformly according to the preset rule.
As an example of the specific division process, the 64-bit picture hash code obtained above is divided evenly into 4 hash code blocks A, B, C and D with no overlap. Since each hash code block corresponds to one position index, the picture corresponds to 4 position indexes. If hash code block A is used as the index block, the remaining hash code blocks B, C and D are taken as a whole as the value for key A; similarly, if hash code block B is used as the index block, the remaining blocks A, C and D as a whole are the value for key B; and so on, giving 4 combinations, i.e. the picture corresponds to 4 key-value pairs. During similar-picture recognition, if any one of the 16-bit hash code blocks is matched exactly and the sample picture library holds 2^34 (on the order of ten billion) hash fingerprints in total (each hash fingerprint corresponding to one picture), then about 2^18 (262144) candidate results are returned for each index hash code block, which greatly reduces the work of computing distances between the hash code blocks of different pictures.
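The four key-value pairs and the candidate-count arithmetic can be illustrated as follows. The function name `key_value_pairs` and the tuple layout are hypothetical, and the candidate estimate assumes that block values are spread evenly over the 2^16 possible keys.

```python
def key_value_pairs(blocks):
    """For each position index, pair the block there (key) with the remaining blocks (value)."""
    pairs = []
    for pos, key in enumerate(blocks):
        value = tuple(b for i, b in enumerate(blocks) if i != pos)
        pairs.append(((pos, key), value))
    return pairs


blocks = [0xAAAA, 0xBBBB, 0xCCCC, 0xDDDD]  # blocks A, B, C, D of one picture
pairs = key_value_pairs(blocks)
for (pos, key), value in pairs:
    print(pos, hex(key), [hex(v) for v in value])

# 2^34 fingerprints spread over 2^16 possible 16-bit keys (uniformity assumed).
expected_candidates = 2**34 // 2**16
print(expected_candidates)  # 262144
```

The position index is kept inside the key so that a block value at position 0 never matches the same value at position 1.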
As a further example, the 64-bit picture hash code can also be divided unevenly, with overlap between the resulting hash code blocks. For example, the 64-bit hash code can first be divided into 4 blocks, any one 16-bit block of which is chosen as hash code block F; the remaining 48 bits of the hash code are then divided again into 4 hash code sub-blocks e, f, g and h, each of 12 bits. Hash code block F combined with sub-block e can serve as one index block of 28 bits, with the remaining hash code as the value for this index block; by analogy, F with f, F with g, or F with h can each serve as an index block, giving four combinations. Because F can be any of the 4 blocks, this uneven division yields 4 × 4 combinations in total, i.e. 4 × 4 key-value pairs. During similar-picture recognition, the 16 index blocks are searched in parallel, so no hash code block is missed. In addition, compared with dividing the 64-bit hash code into 4 hash code blocks with a 16-bit index key each, matching against the 28-bit index blocks obtained by uneven division returns fewer results, i.e. fewer distances between picture hash code blocks need to be computed.
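One way to enumerate the 16 index blocks of this uneven division is sketched below. The text does not say how the remaining 48 bits are cut, so taking four consecutive 12-bit slices is an assumption, as are the function name `uneven_index_keys` and the use of bit strings for readability.

```python
def uneven_index_keys(code):
    """Build the 16 (F, sub-block) index keys of 28 bits each for a 64-bit hash code."""
    bits = f"{code:064b}"
    keys = {}
    for f_pos in range(4):
        f_block = bits[16 * f_pos:16 * (f_pos + 1)]
        rest = bits[:16 * f_pos] + bits[16 * (f_pos + 1):]  # remaining 48 bits
        for s_pos in range(4):
            sub = rest[12 * s_pos:12 * (s_pos + 1)]  # assumed: consecutive 12-bit slices
            value = rest[:12 * s_pos] + rest[12 * (s_pos + 1):]  # remaining 36 bits
            keys[(f_pos, s_pos)] = (f_block + sub, value)
    return keys


keys = uneven_index_keys(0x0123_4567_89AB_CDEF)
print(len(keys))  # 16

# With 2^34 fingerprints and 28-bit keys (uniformity assumed), each lookup
# returns about 2^34 / 2^28 candidates instead of 2^18.
print(2**34 // 2**28)  # 64
```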
Step S130: among the at least two hash code blocks of each picture, assign pictures that have an identical hash code block at the same position index to the same cluster, yielding multiple picture clusters.
In this step, the hash code blocks of the pictures at the same position index can be compared part by part; hash code blocks whose every part has the same value are regarded as identical, and pictures that have identical hash code blocks are assigned to the same cluster, finally yielding multiple picture clusters. After the massive pictures are clustered, each cluster contains relatively few pictures, which minimizes the computation needed for similarity between pictures.
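Step S130 can be sketched as a bucket map keyed by (position index, block value). The names `build_clusters` and `split_blocks`, the file names, and the even 4 × 16-bit division are illustrative assumptions rather than details fixed by the text.

```python
from collections import defaultdict


def split_blocks(code, n_blocks=4, block_bits=16):
    """Split an integer hash code into equal blocks, most significant block first."""
    mask = (1 << block_bits) - 1
    return [(code >> (block_bits * (n_blocks - 1 - i))) & mask for i in range(n_blocks)]


def build_clusters(hashes):
    """Map (position index, block value) -> picture ids that share that block."""
    clusters = defaultdict(list)
    for pic_id, code in hashes.items():
        for pos, block in enumerate(split_blocks(code)):
            clusters[(pos, block)].append(pic_id)
    # Only buckets holding at least two pictures can contain a similar pair.
    return {key: ids for key, ids in clusters.items() if len(ids) > 1}


hashes = {
    "a.jpg": 0x1111_2222_3333_4444,
    "b.jpg": 0x1111_2222_3333_4445,  # differs from a.jpg only in block 3
    "c.jpg": 0xFFFF_EEEE_DDDD_4444,  # shares only block 3 with a.jpg
}
clusters = build_clusters(hashes)
for (pos, block), ids in clusters.items():
    print(pos, hex(block), ids)
```

A picture can land in several clusters, one per position index, which is why a later step has to deduplicate pairs.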
Step S140: within each picture cluster, identify similar pictures according to the distance between the pictures' hash code blocks other than the identical hash code block at the same position index.
Specifically, because each picture cluster contains relatively few pictures after the massive pictures are clustered, computing performance improves. During the similarity calculation, the identical hash code block at the same position index is removed from each picture, the remaining hash code blocks of each picture are treated as a whole, and the distance between the corresponding remaining hash code blocks of the pictures is calculated to identify similar pictures, which further improves computing performance. The smaller the calculated distance, the more similar the two pictures, i.e. the greater the similarity; the larger the distance, the more the two pictures differ, i.e. the smaller the similarity.
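The distance over the remaining blocks might be computed as below. The text only speaks of a "distance"; using the Hamming distance (number of differing bits) is an assumption, as are the helper names and the even 4 × 16-bit division.

```python
def split_blocks(code, n_blocks=4, block_bits=16):
    """Split an integer hash code into equal blocks, most significant block first."""
    mask = (1 << block_bits) - 1
    return [(code >> (block_bits * (n_blocks - 1 - i))) & mask for i in range(n_blocks)]


def remaining_distance(code_a, code_b, shared_pos):
    """Hamming distance over all blocks except the identical one at shared_pos."""
    a, b = split_blocks(code_a), split_blocks(code_b)
    if a[shared_pos] != b[shared_pos]:
        raise ValueError("blocks at shared_pos are not identical")
    return sum(
        bin(x ^ y).count("1")
        for i, (x, y) in enumerate(zip(a, b))
        if i != shared_pos
    )


a = 0x1111_2222_3333_4444
b = 0x1111_2222_3333_4447  # last block differs in 2 bits (0x4444 ^ 0x4447 == 0x0003)
print(remaining_distance(a, b, shared_pos=0))  # 2
```

Skipping the shared block saves a quarter of the bit comparisons in this layout, since that block is already known to contribute zero.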
Optionally, the method also includes:
Calculating the similarity between pictures in each picture cluster according to the distance between the pictures' hash code blocks other than the identical hash code block at the same position index;
Aggregating the similarities between pictures calculated in all picture clusters, and identifying pictures whose similarity exceeds a preset threshold as similar pictures.
Because pictures are clustered according to identical hash code blocks, the same pictures may appear in different picture clusters. For example, if picture A and picture B are both in several picture clusters, computing picture similarity within each cluster yields several similarity values for A and B; after aggregating the similarities computed across all picture clusters, a single similarity value for A and B is obtained, which achieves deduplication.
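The aggregation and deduplication across clusters can be sketched as follows. The similarity formula `1 - distance / 64` is an assumption (the text does not define how distance maps to similarity), as are the names `similar_pairs` and `hamming`, the 0.9 threshold, and the sample data.

```python
from itertools import combinations


def hamming(a, b):
    """Number of differing bits between two integer hash codes."""
    return bin(a ^ b).count("1")


def similar_pairs(clusters, hashes, threshold=0.9, total_bits=64):
    """Score each unordered pair once, even if it appears in several clusters."""
    scores = {}
    for members in clusters.values():
        for p, q in combinations(sorted(members), 2):
            if (p, q) not in scores:
                # The shared block contributes zero, so the full Hamming distance
                # equals the distance over the remaining blocks.
                scores[(p, q)] = 1 - hamming(hashes[p], hashes[q]) / total_bits
    return {pair: s for pair, s in scores.items() if s >= threshold}


hashes = {
    "a.jpg": 0x1111_2222_3333_4444,
    "b.jpg": 0x1111_2222_3333_4445,
    "c.jpg": 0xFFFF_EEEE_DDDD_4444,
}
clusters = {
    (0, 0x1111): ["a.jpg", "b.jpg"],
    (1, 0x2222): ["a.jpg", "b.jpg"],
    (2, 0x3333): ["a.jpg", "b.jpg"],
    (3, 0x4444): ["a.jpg", "c.jpg"],
}
print(similar_pairs(clusters, hashes))  # {('a.jpg', 'b.jpg'): 0.984375}
```

Here a.jpg and b.jpg meet in three clusters but are scored once; a.jpg and c.jpg share a block yet differ in 32 bits, so they fall below the threshold.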
This embodiment computes the low-dimensional feature-vector hash code of each picture using the picture average hash algorithm, uses a similarity hashing technique to cluster the massive pictures according to whether the hash code blocks at the same position index after division are identical, and then identifies similar pictures according to the distance between the pictures' hash code blocks other than the identical hash code block at the same position index. This solves the prior-art problem that methods for recognizing similar pictures are highly complex and computationally expensive, reduces the computational complexity and the amount of computation of similar-picture recognition, and enables efficient calculation of picture similarity.
Embodiment two
Fig. 2 is a flow chart of a method for recognizing similar pictures provided by Embodiment two of the present invention; this embodiment further optimizes the embodiment above. As shown in Fig. 2, the method specifically includes:
Step S210: in response to a similar-picture search request for a target picture, compute the target low-dimensional feature-vector hash code of the target picture using the picture average hash algorithm.
In this step, specifically, the user submits the target picture to be searched on a web page or in an application; the server receives the target picture, responds to the similar-picture search request, and computes the target low-dimensional feature-vector hash code of the target picture using the picture average hash algorithm.
Step S220: divide the target low-dimensional feature-vector hash code into at least two hash code blocks according to the uniform preset rule, where each hash code block corresponds to one position index.
The hash code blocks obtained by dividing the target low-dimensional feature-vector hash code may have the same or different lengths, and the hash code blocks may or may not overlap.
Step S230: assign the target picture to at least one target picture cluster among the multiple picture clusters already obtained, where the pictures in a target picture cluster have a hash code block identical to that of the target picture at the same position index.
Step S240: within the at least one target picture cluster, search for pictures similar to the target picture according to the distance between the pictures' hash code blocks other than the identical hash code block at the same position index.
Specifically, the distance between the hash code blocks of the target picture and of each picture in each target picture cluster, other than the identical hash code block at the same position index, is calculated. The smaller the distance, the greater the similarity. Pictures whose similarity value exceeds the threshold are identified as similar pictures; after the search completes, the server returns the pictures most similar to the target picture, and the search results are displayed on the user's web page or application. The similarity threshold is set by the user in advance; for example, if the user sets it to 98%, pictures whose computed similarity value is greater than or equal to 98% are identified as similar pictures.
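Steps S210 to S240 can be sketched end to end as a lookup against a prebuilt index. The names `build_index` and `search_similar`, the even 4 × 16-bit division, the Hamming distance, and the similarity formula `1 - distance / 64` are all assumptions made for illustration; the 98% threshold is the example value from the text.

```python
from collections import defaultdict


def split_blocks(code, n_blocks=4, block_bits=16):
    """Split an integer hash code into equal blocks, most significant block first."""
    mask = (1 << block_bits) - 1
    return [(code >> (block_bits * (n_blocks - 1 - i))) & mask for i in range(n_blocks)]


def build_index(hashes):
    """Map (position index, block value) -> picture ids holding that block."""
    index = defaultdict(list)
    for pic_id, code in hashes.items():
        for pos, block in enumerate(split_blocks(code)):
            index[(pos, block)].append(pic_id)
    return index


def search_similar(target, index, hashes, threshold=0.98, total_bits=64):
    """Return (picture id, similarity) for library pictures similar to the target code."""
    candidates = set()
    for pos, block in enumerate(split_blocks(target)):
        candidates.update(index.get((pos, block), ()))
    results = []
    for pic_id in candidates:
        distance = bin(target ^ hashes[pic_id]).count("1")
        similarity = 1 - distance / total_bits
        if similarity >= threshold:
            results.append((pic_id, similarity))
    return sorted(results, key=lambda r: (-r[1], r[0]))


hashes = {
    "a.jpg": 0x1111_2222_3333_4444,
    "b.jpg": 0x1111_2222_3333_4445,
    "c.jpg": 0xFFFF_EEEE_DDDD_4444,
}
index = build_index(hashes)
target = 0x1111_2222_3333_4446  # hash code of the query picture
print(search_similar(target, index, hashes))  # [('a.jpg', 0.984375)]
```

In this data b.jpg is a candidate (it shares three blocks with the target) but differs in 2 bits, giving 0.96875, so it misses the 98% threshold; c.jpg shares no block and is never compared.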
This embodiment computes the low-dimensional feature-vector hash code of the target picture using the picture average hash algorithm, uses a similarity hashing technique to block and cluster that hash code, and then identifies pictures similar to the target picture according to the distance between the hash code blocks of the pictures in each target picture cluster other than the identical hash code block at the same position index, thereby enabling fast search for similar pictures among massive pictures.
Embodiment three
Fig. 3 be the embodiment of the present invention three provide a kind of similar pictures identification device structural representation, the present embodiment It is applicable to identify the situation of similar pictures.The device that the present embodiment is provided can perform what any embodiment of the present invention was provided The recognition methods of similar pictures, possess the corresponding functional module of execution method and beneficial effect.
As shown in figure 3, the identification device of the similar pictures of the present embodiment includes Hash codes computing module 310, piecemeal module 320th, the identification module 340 of sub-clustering module 330 and first.Wherein:
Hash codes computing module 310, the low dimensional for every width picture to be calculated using the average hash algorithm of picture are special Levy vectorial Hash codes.
Piecemeal module 320, for by the low dimensional characteristic vector Hash codes of every width picture, according to preset rules universal formulation For at least two Hash code blocks, wherein, the corresponding location index of each Hash code block.
Further, piecemeal module 320 is specifically used for the low dimensional characteristic vector Hash codes of every width picture, according to default Regular universal formulation is to include at least two Hash code blocks that Hash code length is identical or differs, and between Hash code block Including coincidence or misaligned part.
Sub-clustering module 330, at least two Hash code blocks of every width picture, same position index being present identical The picture of Hash code block be put into same cluster, obtain multiple picture clusters.
First identification module 340, in each picture cluster, according to removing same position index pair between each picture The distance of Hash code block outside the identical Hash code block answered identifies similar pictures.
Optionally, the device also includes computing module and the second identification module, wherein:
Computing module, in addition to the identical Hash code block according to corresponding between each picture except same position index The distance of Hash code block, calculate the similarity between picture in each picture cluster.
Second identification module, it is for counting the similarity between the picture being calculated in whole picture clusters, this is similar Degree is similar pictures beyond the picture recognition of predetermined threshold value.
Further, the device also includes a search module for searching for pictures similar to a target picture.
The search module specifically includes:
Target hash code computing unit, configured to, in response to a similar-picture search request for the target picture, calculate the target low-dimensional feature vector hash code of the target picture using the picture average hash algorithm;
Blocking unit, configured to divide the target low-dimensional feature vector hash code into at least two hash code blocks according to the preset rule, where each hash code block corresponds to one position index;
Clustering unit, configured to assign the target picture to at least one target picture cluster among the multiple picture clusters obtained, where each picture in a target picture cluster has a hash code block identical to that of the target picture at the same position index;
Search unit, configured to search the at least one target picture cluster for pictures similar to the target picture according to the distance between the pictures' hash code blocks other than the identical hash code block at the shared position index.
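The four units of the search module can be sketched end to end as below. This is an illustrative sketch under stated assumptions: equal-length non-overlapping blocks stand in for the preset rule, the index is an in-memory dictionary, and the names `build_index`, `search`, and `max_distance` are hypothetical.

```python
from collections import defaultdict

def split(code, block_len):
    """Equal-length, non-overlapping blocks (one possible preset rule)."""
    return [code[i:i + block_len] for i in range(0, len(code), block_len)]

def build_index(codes, block_len):
    """Map (position index, block) to the ids of pictures holding that block there."""
    index = defaultdict(set)
    for pic_id, code in codes.items():
        for pos, block in enumerate(split(code, block_len)):
            index[(pos, block)].add(pic_id)
    return index

def distance_excluding(blocks_a, blocks_b, shared_pos):
    """Hamming distance over all blocks except the one at shared_pos."""
    return sum(
        x != y
        for pos, (a, b) in enumerate(zip(blocks_a, blocks_b))
        if pos != shared_pos
        for x, y in zip(a, b)
    )

def search(target_code, codes, index, block_len, max_distance):
    """Return (id, distance) for indexed pictures close to the target, nearest first."""
    target_blocks = split(target_code, block_len)
    hits = {}
    for pos, block in enumerate(target_blocks):
        # Only pictures in a cluster shared with the target are ever compared.
        for pic_id in index.get((pos, block), ()):
            dist = distance_excluding(target_blocks, split(codes[pic_id], block_len), pos)
            if dist <= max_distance:
                hits[pic_id] = dist
    return sorted(hits.items(), key=lambda item: (item[1], item[0]))

# Hypothetical indexed codes and a target code that is one bit away from "a".
codes = {
    "a": "1111000011110000",
    "b": "1111000011110011",
    "c": "0000111100001111",
}
index = build_index(codes, block_len=4)
results = search("1111000011110001", codes, index, block_len=4, max_distance=2)
```

Picture "c" shares no block with the target at any position index, so it is never compared at all.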
In this embodiment of the present invention, the low-dimensional feature vector hash code of each picture is calculated using the picture average hash algorithm; a massive set of pictures is clustered according to whether the divided hash code blocks at the same position index are identical; and similar pictures are then identified according to the distance between the pictures' hash code blocks other than the identical hash code block at the shared position index. This solves the problem that prior-art methods for identifying similar pictures are highly complex and computationally intensive: it reduces the computational complexity and the amount of computation of similar-picture identification, and achieves efficient calculation of picture similarity and fast search for similar pictures.
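For reference, the average hash step can be sketched as follows. This shows the commonly used form of average hashing (one bit per pixel, set when the pixel is at least as bright as the mean); the exact variant, thumbnail size, and preprocessing used by the embodiment are not restated here, so they are assumptions. The input is assumed to be a grayscale thumbnail that an earlier step has already reduced to a small fixed size.

```python
def average_hash(gray):
    """Return the average-hash bit string of a small grayscale matrix.

    `gray` is a list of rows of pixel intensities, assumed to be already
    reduced to a small fixed size (for example 8x8) by an earlier resize step.
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    # One bit per pixel: 1 if the pixel is at least as bright as the mean.
    return "".join("1" if p >= mean else "0" for p in pixels)

# Hypothetical 4x4 thumbnail, kept tiny so the result is easy to check by hand.
thumb = [
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
code = average_hash(thumb)

# A uniform brightness shift moves the mean with the pixels, so the code is unchanged.
brighter = [[p + 20 for p in row] for row in thumb]
same_code = average_hash(brighter)
```

Because each bit depends only on whether a pixel lies above or below the mean, small global changes in brightness leave the code unchanged, which is what makes the code usable for near-duplicate detection.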
Embodiment four
Fig. 4 is a schematic structural diagram of a server provided by Embodiment four of the present invention. Fig. 4 shows a block diagram of an exemplary server 12 suitable for implementing embodiments of the present invention. The server 12 shown in Fig. 4 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.
As shown in Fig. 4, the server 12 takes the form of a general-purpose computing device. The components of the server 12 may include, but are not limited to: one or more processors 16, a storage device 28, and a bus 18 connecting the different system components (including the storage device 28 and the processors 16).
The bus 18 represents one or more of several types of bus structures, including a storage device bus or storage device controller, a peripheral bus, an accelerated graphics port, a processor bus, or a local bus using any of a variety of bus architectures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The server 12 typically includes a variety of computer-system-readable media. These media can be any available media that can be accessed by the server 12, including volatile and non-volatile media, and removable and non-removable media.
The storage device 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 4, commonly called a "hard disk drive"). Although not shown in Fig. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") may be provided, as may an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM), or other optical media). In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The storage device 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 40 having a set of (at least one) program modules 42 may be stored in, for example, the storage device 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.
The server 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the server 12, and/or with any device (such as a network card, a modem, etc.) that enables the server 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. The server 12 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in Fig. 4, the network adapter 20 communicates with the other modules of the server 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the server 12, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.
The processor 16 runs programs stored in the storage device 28 to perform various functional applications and data processing, for example to implement the method for identifying similar pictures provided by the embodiments of the present invention.
Embodiment five
Embodiment five of the present invention further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the method for identifying similar pictures provided by the embodiments of the present invention.
The computer storage medium of the embodiments of the present invention may use any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM, or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them; it may include other equivalent embodiments without departing from the inventive concept, and its scope is determined by the scope of the appended claims.

Claims (10)

  1. A method for identifying similar pictures, characterized by comprising:
    calculating a low-dimensional feature vector hash code of each picture using a picture average hash algorithm;
    dividing the low-dimensional feature vector hash code of each picture into at least two hash code blocks according to a preset rule, wherein each hash code block corresponds to one position index;
    among the at least two hash code blocks of each picture, placing pictures that have an identical hash code block at the same position index into the same cluster, to obtain multiple picture clusters;
    in each picture cluster, identifying similar pictures according to the distance between the pictures' hash code blocks other than the identical hash code block at the shared position index.
  2. The method according to claim 1, characterized in that each picture includes at least two hash code blocks whose lengths are the same or different, and the hash code blocks include overlapping or non-overlapping portions.
  3. The method according to claim 1 or 2, characterized in that the method further comprises:
    calculating the similarity between pictures in each picture cluster according to the distance between the pictures' hash code blocks other than the identical hash code block at the shared position index;
    collecting the similarities calculated between pictures across all picture clusters, and identifying pictures whose similarity exceeds a preset threshold as similar pictures.
  4. The method according to claim 1 or 2, characterized in that the method further comprises:
    in response to a similar-picture search request for a target picture, calculating a target low-dimensional feature vector hash code of the target picture using the picture average hash algorithm;
    dividing the target low-dimensional feature vector hash code into at least two hash code blocks according to the preset rule, wherein each hash code block corresponds to one position index;
    assigning the target picture to at least one target picture cluster among the multiple picture clusters, wherein each picture in a target picture cluster has a hash code block identical to that of the target picture at the same position index;
    in the at least one target picture cluster, searching for pictures similar to the target picture according to the distance between the pictures' hash code blocks other than the identical hash code block at the shared position index.
  5. A device for identifying similar pictures, characterized by comprising:
    a hash code computing module, configured to calculate a low-dimensional feature vector hash code of each picture using a picture average hash algorithm;
    a blocking module, configured to divide the low-dimensional feature vector hash code of each picture into at least two hash code blocks according to a preset rule, wherein each hash code block corresponds to one position index;
    a clustering module, configured to place pictures that have an identical hash code block at the same position index, among the at least two hash code blocks of each picture, into the same cluster, to obtain multiple picture clusters;
    a first identification module, configured to identify similar pictures in each picture cluster according to the distance between the pictures' hash code blocks other than the identical hash code block at the shared position index.
  6. The device according to claim 5, characterized in that the blocking module is specifically configured to divide the low-dimensional feature vector hash code of each picture, according to the preset rule, into at least two hash code blocks whose lengths are the same or different, and the hash code blocks include overlapping or non-overlapping portions.
  7. The device according to claim 5 or 6, characterized in that the device further comprises:
    a computing module, configured to calculate the similarity between pictures in each picture cluster according to the distance between the pictures' hash code blocks other than the identical hash code block at the shared position index;
    a second identification module, configured to collect the similarities calculated between pictures across all picture clusters and to identify pictures whose similarity exceeds a preset threshold as similar pictures.
  8. The device according to claim 5 or 6, characterized in that the device further comprises a search module for searching for pictures similar to a target picture;
    the search module specifically comprises:
    a target hash code computing unit, configured to, in response to a similar-picture search request for the target picture, calculate a target low-dimensional feature vector hash code of the target picture using the picture average hash algorithm;
    a blocking unit, configured to divide the target low-dimensional feature vector hash code into at least two hash code blocks according to the preset rule, wherein each hash code block corresponds to one position index;
    a clustering unit, configured to assign the target picture to at least one target picture cluster among the multiple picture clusters, wherein each picture in a target picture cluster has a hash code block identical to that of the target picture at the same position index;
    a search unit, configured to search the at least one target picture cluster for pictures similar to the target picture according to the distance between the pictures' hash code blocks other than the identical hash code block at the shared position index.
  9. A server, characterized by comprising:
    one or more processors;
    a storage device, configured to store one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method for identifying similar pictures according to any one of claims 1-4.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method for identifying similar pictures according to any one of claims 1-4 is implemented.
CN201710945888.XA 2017-10-12 2017-10-12 The recognition methods of similar pictures and device, server, storage medium Active CN107729935B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710945888.XA CN107729935B (en) 2017-10-12 2017-10-12 The recognition methods of similar pictures and device, server, storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710945888.XA CN107729935B (en) 2017-10-12 2017-10-12 The recognition methods of similar pictures and device, server, storage medium

Publications (2)

Publication Number Publication Date
CN107729935A true CN107729935A (en) 2018-02-23
CN107729935B CN107729935B (en) 2019-11-12

Family

ID=61210968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710945888.XA Active CN107729935B (en) 2017-10-12 2017-10-12 The recognition methods of similar pictures and device, server, storage medium

Country Status (1)

Country Link
CN (1) CN107729935B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108536769A (en) * 2018-03-22 2018-09-14 深圳市安软慧视科技有限公司 Image analysis method, searching method and device, computer installation and storage medium
CN108595710A (en) * 2018-05-11 2018-09-28 杨晓春 A kind of quick mass picture De-weight method
CN110399511A (en) * 2019-07-23 2019-11-01 中南民族大学 Image cache method, equipment, storage medium and device based on Redis
CN110490250A (en) * 2019-08-19 2019-11-22 广州虎牙科技有限公司 A kind of acquisition methods and device of artificial intelligence training set
CN111079757A (en) * 2018-10-19 2020-04-28 北京奇虎科技有限公司 Clothing attribute identification method and device and electronic equipment
CN111078914A (en) * 2019-12-18 2020-04-28 书行科技(北京)有限公司 Method and device for detecting repeated pictures
CN111368122A (en) * 2020-02-14 2020-07-03 深圳壹账通智能科技有限公司 Method and device for removing duplicate pictures
CN111506756A (en) * 2019-01-30 2020-08-07 北京京东尚科信息技术有限公司 Similar picture searching method and system, electronic device and storage medium
CN111522989A (en) * 2020-07-06 2020-08-11 南京梦饷网络科技有限公司 Method, computing device, and computer storage medium for image retrieval
EP3767483A4 (en) * 2018-03-12 2021-12-08 Tencent Technology (Shenzhen) Company Limited Method, device, system, and server for image retrieval, and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102819582A (en) * 2012-07-26 2012-12-12 华数传媒网络有限公司 Quick searching method for mass images
CN103984776A (en) * 2014-06-05 2014-08-13 北京奇虎科技有限公司 Repeated image identification method and image search duplicate removal method and device
CN104112284A (en) * 2013-04-22 2014-10-22 阿里巴巴集团控股有限公司 Method and equipment for detecting similarity of images

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102819582A (en) * 2012-07-26 2012-12-12 华数传媒网络有限公司 Quick searching method for mass images
CN104112284A (en) * 2013-04-22 2014-10-22 阿里巴巴集团控股有限公司 Method and equipment for detecting similarity of images
CN103984776A (en) * 2014-06-05 2014-08-13 北京奇虎科技有限公司 Repeated image identification method and image search duplicate removal method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YANG B, et al.: "Block mean value based image perceptual Hashing", 《PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON IIH-MSP》 *
辰辰沉沉沉: "Deduplicating a fairly large set of pictures using phash", 《HTTPS://WWW.JIANSHU.COM/P/C87F6F69D51F》 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11347787B2 (en) 2018-03-12 2022-05-31 Tencent Technology (Shenzhen) Company Limited Image retrieval method and apparatus, system, server, and storage medium
EP3767483A4 (en) * 2018-03-12 2021-12-08 Tencent Technology (Shenzhen) Company Limited Method, device, system, and server for image retrieval, and storage medium
CN108536769A (en) * 2018-03-22 2018-09-14 深圳市安软慧视科技有限公司 Image analysis method, searching method and device, computer installation and storage medium
CN108595710B (en) * 2018-05-11 2021-07-13 杨晓春 Rapid massive picture de-duplication method
CN108595710A (en) * 2018-05-11 2018-09-28 杨晓春 A kind of quick mass picture De-weight method
CN111079757A (en) * 2018-10-19 2020-04-28 北京奇虎科技有限公司 Clothing attribute identification method and device and electronic equipment
CN111506756B (en) * 2019-01-30 2024-05-17 北京京东尚科信息技术有限公司 Method and system for searching similar pictures, electronic equipment and storage medium
CN111506756A (en) * 2019-01-30 2020-08-07 北京京东尚科信息技术有限公司 Similar picture searching method and system, electronic device and storage medium
CN110399511A (en) * 2019-07-23 2019-11-01 中南民族大学 Image cache method, equipment, storage medium and device based on Redis
CN110490250A (en) * 2019-08-19 2019-11-22 广州虎牙科技有限公司 A kind of acquisition methods and device of artificial intelligence training set
CN111078914B (en) * 2019-12-18 2023-04-18 书行科技(北京)有限公司 Method and device for detecting repeated pictures
CN111078914A (en) * 2019-12-18 2020-04-28 书行科技(北京)有限公司 Method and device for detecting repeated pictures
CN111368122A (en) * 2020-02-14 2020-07-03 深圳壹账通智能科技有限公司 Method and device for removing duplicate pictures
CN111368122B (en) * 2020-02-14 2022-09-30 深圳壹账通智能科技有限公司 Method and device for removing duplicate pictures
CN111522989A (en) * 2020-07-06 2020-08-11 南京梦饷网络科技有限公司 Method, computing device, and computer storage medium for image retrieval

Also Published As

Publication number Publication date
CN107729935B (en) 2019-11-12

Similar Documents

Publication Publication Date Title
CN107729935B (en) The recognition methods of similar pictures and device, server, storage medium
CN108205655B (en) Key point prediction method and device, electronic equipment and storage medium
CN108229419B (en) Method and apparatus for clustering images
US11727053B2 (en) Entity recognition from an image
CN108280477B (en) Method and apparatus for clustering images
CN108197532A (en) The method, apparatus and computer installation of recognition of face
WO2013155417A2 (en) Data coreset compression
CN113892113A (en) Human body posture estimation method and device
WO2022041188A1 (en) Accelerator for neural network, acceleration method and device, and computer storage medium
CN113722409A (en) Method and device for determining spatial relationship, computer equipment and storage medium
WO2023282847A1 (en) Detecting objects in a video using attention models
JP2024508867A (en) Image clustering method, device, computer equipment and computer program
US9928408B2 (en) Signal processing
CN112966687B (en) Image segmentation model training method and device and communication equipment
CN110619253B (en) Identity recognition method and device
CN110717405B (en) Face feature point positioning method, device, medium and electronic equipment
CN111915689B (en) Method, apparatus, electronic device, and computer-readable medium for generating an objective function
CN114639143B (en) Portrait archiving method, device and storage medium based on artificial intelligence
CN114972146A (en) Image fusion method and device based on generation countermeasure type double-channel weight distribution
CN113721240A (en) Target association method and device, electronic equipment and storage medium
CN113051406A (en) Character attribute prediction method, device, server and readable storage medium
CN110969651B (en) 3D depth of field estimation method and device and terminal equipment
WO2022044104A1 (en) Image matching apparatus, control method, and non-transitory computer-readable storage medium
CN112966606B (en) Image recognition method, related device and computer program product
CN114253992A (en) Data aggregation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231113

Address after: 311422 Room 125, 1st Floor, Building 197-2, Jiulong Avenue, Yinhu Street, Fuyang District, Hangzhou City, Zhejiang Province

Patentee after: Hangzhou Dabei Biotechnology Co.,Ltd.

Address before: 310019 Room 204, building A12, No.9 Jiusheng Road, Jianggan District, Hangzhou City, Zhejiang Province

Patentee before: HANGZHOU BEIGOU TECHNOLOGY CO.,LTD.

TR01 Transfer of patent right