WO2019128254A1 - Image analysis method and apparatus, electronic device, and readable storage medium - Google Patents

Image analysis method and apparatus, electronic device, and readable storage medium Download PDF

Info

Publication number
WO2019128254A1
WO2019128254A1 (PCT/CN2018/100249)
Authority
WO
WIPO (PCT)
Prior art keywords
image
target object
feature
analyzed
features
Prior art date
Application number
PCT/CN2018/100249
Other languages
English (en)
French (fr)
Inventor
张雷
Original Assignee
浙江宇视科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江宇视科技有限公司
Priority to EP18894025.8A (EP3734496A4)
Priority to US16/770,433 (US20200402242A1)
Publication of WO2019128254A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/187 Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/752 Contour matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present application relates to the field of image analysis technologies, and in particular, to an image analysis method, apparatus, electronic device, and readable storage medium.
  • one of the purposes of the present application is to provide an image analysis method, apparatus, electronic device, and readable storage medium, which can effectively remove environmental interference and obtain more accurate image retrieval results, thereby providing clues for quickly locating and finding target objects.
  • the embodiment of the present application provides an image analysis method, which is applied to an electronic device, and the method includes:
  • in an embodiment, before obtaining the image to be analyzed, the method further includes: configuring the full convolution network;
  • the manner of configuring the full convolution network includes:
  • receiving an image sample set, the image sample set including a plurality of image samples;
  • each region of the target object in each image sample is calibrated, and the calibrated image samples are input into the full convolution network for training to obtain a trained full convolution network.
  • in an embodiment, before obtaining the image to be analyzed, the method further includes: configuring the convolutional neural network;
  • the manner of configuring the convolutional neural network includes:
  • receiving an image sample set, the image sample set including a plurality of image samples;
  • Each image sample is input into a convolutional neural network and trained using a Softmax regression function to obtain a trained convolutional neural network.
  • the step of extracting features of the region corresponding to each minimum circumscribed geometric frame based on the pre-configured convolutional neural network includes: inputting the image data in each minimum circumscribed geometric frame into the trained convolutional neural network model for processing, and taking the plurality of features obtained from the last layer of the convolutional neural network model as the features of the regions corresponding to the respective minimum circumscribed geometric frames.
  • the method further includes:
  • processing each image in the pre-stored image library through the full convolution network and the convolutional neural network to obtain image features corresponding to each image in the pre-stored image library.
  • the manner of obtaining the minimum circumscribed geometric frame of each area includes:
  • comparing the target object feature with the image feature of each image in the pre-stored image library, and outputting the image analysis result of the image to be analyzed according to the comparison result, includes:
  • the calculation formula for the cosine distance between the target object feature and the image feature of each image in the pre-stored image library is:
  • $d(f_i, f_j) = \frac{f_i \cdot f_j}{\|f_i\|_2 \, \|f_j\|_2}$
  • where $f_i$ and $f_j$ denote the features extracted from images $i$ and $j$, $\|\cdot\|_2$ denotes the two-norm, and $d(\cdot)$ denotes the cosine distance between the target object feature and the image feature of each image in the pre-stored image library.
  • the sorting formula for sorting each image in the pre-stored image library based on the cosine distance and generating a sorting result is:
  • $R = \operatorname{top}_n \{\, d(f, f_j) : j \in \Omega \,\}$
  • where $n$ is the number of images in the sorting result, $\Omega$ denotes the pre-stored image library, and $f$ is the target object feature.
  • the embodiment of the present application further provides an image analyzing apparatus, which is applied to an electronic device, and the device includes:
  • an obtaining module, configured to obtain an image to be analyzed, where the target object is included in the image to be analyzed.
  • a segmentation module configured to segment each region of the target object from the image to be analyzed based on a pre-configured full convolution network.
  • an acquiring module, configured to acquire a minimum circumscribed geometric frame of each region.
  • an extracting module configured to extract features of corresponding regions of each of the minimum circumscribed geometric frames based on the pre-configured convolutional neural network, and connect features of each of the minimum circumscribed geometrical frames to obtain target object features of the target object.
  • a comparison module configured to compare the target object feature with an image feature of each image in the pre-stored image library, and output an image analysis result of the image to be analyzed according to the comparison result.
  • the device further includes:
  • a second training module configured to configure the convolutional neural network;
  • the second training module is specifically configured to receive an image sample set, the image sample set including a plurality of image samples, and to input each image sample into the convolutional neural network for training using the Softmax regression function to obtain a trained convolutional neural network.
  • the extraction module is specifically configured to input the image data in each minimum circumscribed geometric frame into the trained convolutional neural network model for processing, and to take the plurality of features obtained from the last layer of the convolutional neural network model as the features of the regions corresponding to the respective minimum circumscribed geometric frames.
  • the device further includes:
  • the image library feature processing module is configured to process each image in the pre-stored image library through the full convolution network and the convolutional neural network to obtain image features corresponding to each image in the pre-stored image library.
  • the acquiring module is specifically configured to acquire a minimum circumscribed rectangle of the respective regions, or obtain a minimum circumscribed circle of the respective regions.
  • the comparison module is specifically configured to separately calculate a cosine distance between the target object feature and the image feature of each image in the pre-stored image library, sort each image in the pre-stored image library based on the cosine distance, and generate a sorting result, the sorting result being the image analysis result of the image to be analyzed.
  • the calculation formula for the cosine distance between the target object feature and the image feature of each image in the pre-stored image library is:
  • $d(f_i, f_j) = \frac{f_i \cdot f_j}{\|f_i\|_2 \, \|f_j\|_2}$
  • where $f_i$ and $f_j$ denote the features extracted from images $i$ and $j$, $\|\cdot\|_2$ denotes the two-norm, and $d(\cdot)$ denotes the cosine distance between the target object feature and the image feature of each image in the pre-stored image library.
  • the sorting formula for sorting each image in the pre-stored image library based on the cosine distance and generating a sorting result is:
  • $R = \operatorname{top}_n \{\, d(f, f_j) : j \in \Omega \,\}$
  • where $n$ is the number of images in the sorting result, $\Omega$ denotes the pre-stored image library, and $f$ is the target object feature.
  • An embodiment of the present application further provides an electronic device, where the electronic device includes:
  • a storage medium; a processor; and an image analysis device, the device being stored in the storage medium and comprising software function modules executed by the processor, the device comprising:
  • an obtaining module, configured to obtain an image to be analyzed, where the target object is included in the image to be analyzed.
  • a segmentation module configured to segment each region of the target object from the image to be analyzed based on a pre-configured full convolution network.
  • an acquiring module, configured to acquire a minimum circumscribed geometric frame of each region.
  • an extracting module configured to extract features of corresponding regions of each of the minimum circumscribed geometric frames based on the pre-configured convolutional neural network, and connect features of each of the minimum circumscribed geometrical frames to obtain target object features of the target object.
  • a comparison module configured to compare the target object feature with an image feature of each image in the pre-stored image library, and output an image analysis result of the image to be analyzed according to the comparison result.
  • the embodiment of the present application further provides a readable storage medium, where the readable storage medium stores a computer program, and when the computer program is executed, the image analysis method described above is implemented.
  • An embodiment of the present application provides an image analysis method, apparatus, electronic device, and readable storage medium.
  • first, an image to be analyzed is obtained, where the image to be analyzed includes a target object; then, each region of the target object is segmented from the image to be analyzed based on a pre-configured full convolution network, and a minimum circumscribed geometric frame of each region is acquired; next, features of the region corresponding to each minimum circumscribed geometric frame are extracted based on a pre-configured convolutional neural network, and the features of the region corresponding to each minimum circumscribed geometric frame are connected to obtain the target object features of the target object; finally, the target object features are compared with the image features of each image in the pre-stored image library, and the image analysis result of the image to be analyzed is output according to the comparison result.
  • Thereby, environmental interference can be effectively removed and a more accurate image retrieval result can be obtained, providing clues for quickly locating and finding the target object.
  • FIG. 1 is a schematic flowchart of an image analysis method according to an embodiment of the present application;
  • FIG. 2 is a schematic diagram of image segmentation according to an embodiment of the present application;
  • FIG. 3 is a schematic diagram of region division in the prior art;
  • FIG. 4 is a schematic diagram of region division according to an embodiment of the present application;
  • FIG. 5 is a schematic flowchart of the sub-steps included in step S250 shown in FIG. 1;
  • FIG. 6 is a schematic block diagram of an electronic device for implementing the above image analysis method according to an embodiment of the present application.
  • Reference numerals: 100 - electronic device; 110 - storage medium; 120 - processor; 200 - image analysis device; 210 - obtaining module; 220 - segmentation module; 230 - acquiring module; 240 - extraction module; 250 - comparison module.
  • the inventor of the present application finds that the image retrieval methods of the prior art mainly include the following:
  • first, the image to be retrieved is divided into sub-images to obtain a plurality of sub-images; image feature extraction is performed on each designated sub-image among the plurality of sub-images to obtain a feature vector of each designated sub-image; and, for each image in the image library, the similarity between that image and the image to be retrieved is determined based on the feature vector of each sub-image in each sub-image group to be matched of that image and the feature vectors of the designated sub-images.
  • however, the inventors found through careful study that this scheme simply divides the image into multiple regions and is susceptible to interference from occlusion, image misalignment, and other factors, so the selected image features cannot be aligned, which affects retrieval accuracy.
  • second, image category features and self-encoding features are calculated to ensure that image search results are similar in image category, and an automatic encoding algorithm is used to generate low-level image encoding features to ensure that the images are similar in content; a mixed self-encoding feature method then further fuses the classification features and the image self-encoding features and reduces the dimensionality, making the search faster and more stable.
  • however, the inventors found through careful study that although this scheme performs image retrieval by combining category features with encoding features and reduces the feature dimensionality, it needs to extract two different kinds of features, which lowers its operability and limits its application prospects.
  • third, a visual vocabulary dictionary is established, a visual saliency map is obtained using a visual saliency feature fusion algorithm of the image, a foreground target map and a background region map of the image are then obtained by segmentation according to the saliency map, and color features and texture features are extracted from the foreground target map and the background region map respectively for image retrieval.
  • however, the inventors found through careful study that in this scheme both the foreground and background obtained from the saliency map and the extracted color and texture features are easily interfered with by the background and complex environments, and establishing the visual vocabulary dictionary is highly complex, which limits the application prospects of the scheme.
  • the inventor of the present application proposes the following technical solutions, which can effectively remove environmental interference, obtain more accurate image retrieval results, and provide clues for quickly locating and finding target objects.
  • FIG. 1 is a schematic flowchart of an image analysis method provided by an embodiment of the present application. It should be noted that the image analysis method provided by the embodiment of the present application is not limited to the specific order described in FIG. 1 and the following. The method can be implemented by the following steps:
  • Step S210: obtaining an image to be analyzed.
  • the manner of obtaining the image to be analyzed is not limited, for example, it may be acquired from the current shooting scene in real time by the monitoring device, or may be imported from an external terminal, or downloaded from a server.
  • the image to be analyzed includes a target object, and the target object is an object whose features need to be analyzed.
  • the target object may be a pedestrian or a specific object in an image.
  • Step S220: segmenting the respective regions of the target object from the image to be analyzed based on a pre-configured full convolution network.
  • the configuration process of the full convolution network is first described.
  • the configuration manner of the full convolution network can be implemented as follows:
  • an image sample set is received, wherein the image sample set includes a plurality of image samples, each of which includes a target object.
  • each region of the target object in each image sample is calibrated, and the calibrated image samples are input into the full convolution network for training to obtain a trained full convolution network.
  • taking the target object being a pedestrian as an example, each part of the pedestrian's body, such as a head region, an upper-body region, and a lower-body region (or a finer division such as a head region, a left-arm region, a right-arm region, an upper-body region, a right-leg region, a left-leg region, and the like), is marked with a different pixel value, and regions with different pixel values belong to different regions.
  • the left image in each group of images is the original image sample, and the middle image is the labeled various regions of the body. Then, the labeled image samples are used to train the Fully Convolutional Network (FCN) to obtain a full convolutional network with better network parameters after training.
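The patent itself gives no code; the following minimal Python sketch illustrates one plausible way to encode the calibrated regions as a training label mask in which each body region carries a distinct pixel value, as described above. The region names and values are assumptions for illustration only.

```python
import numpy as np

# Hypothetical region-to-pixel-value assignment; 0 denotes background.
REGION_LABELS = {"head": 1, "upper_body": 2, "lower_body": 3}

def build_label_mask(region_masks, height, width):
    """region_masks: dict mapping region name -> boolean (H, W) mask."""
    label_mask = np.zeros((height, width), dtype=np.uint8)
    for name, mask in region_masks.items():
        # Pixels of each calibrated region receive that region's value.
        label_mask[mask] = REGION_LABELS[name]
    return label_mask
```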
  • the image to be analyzed is input into the full convolution network, and each region of the pedestrian can be segmented.
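As an illustration of this inference step, the hedged sketch below runs a trained segmentation network on the image to be analyzed and splits the predicted label map into per-region masks. It assumes a torchvision-style model whose forward pass returns a dict with an "out" entry; the patent does not prescribe a specific framework.

```python
import torch

def segment_regions(fcn, image_tensor):
    """image_tensor: normalized float tensor of shape (1, 3, H, W)."""
    fcn.eval()
    with torch.no_grad():
        logits = fcn(image_tensor)["out"]      # (1, num_classes, H, W)
    label_map = logits.argmax(dim=1)[0]        # (H, W) region id per pixel
    # One boolean mask per predicted non-background region id.
    region_ids = [int(c) for c in label_map.unique() if int(c) != 0]
    return {c: (label_map == c).numpy() for c in region_ids}
```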
  • Step S230: acquiring a minimum circumscribed geometric frame of each region.
  • the pedestrian is divided into different regions by the above-described full convolution network segmentation.
  • it is necessary to remove the influence of the background image in the image data of each region as much as possible.
  • the minimum circumscribed geometric frame of each region can be used as the data extraction range.
  • the manner of obtaining the minimum circumscribed geometrical frame of each of the regions may be obtaining a minimum circumscribed rectangular frame of the respective regions, or acquiring a minimum circumscribed circle of the respective regions, and the like.
  • each area of the pedestrian in the rightmost image of each picture is marked by a minimum circumscribed rectangular frame.
  • obtaining the minimum circumscribed rectangular frame of each region in this way can effectively remove background interference and the like, and provides accurate partial regions of the pedestrian.
  • an orthogonal coordinate system including an x-axis and a y-axis may be established for the image to be recognized. After the regions on the target are identified, for each region, the coordinate values in the orthogonal coordinate system of the pixels covered by the region are obtained, and the minimum value x_min on the x-axis, the maximum value x_max on the x-axis, the minimum value y_min on the y-axis, and the maximum value y_max on the y-axis among the pixel coordinate values are determined.
  • the rectangle formed by the four points (x_min, y_min), (x_min, y_max), (x_max, y_min), and (x_max, y_max) is then taken as the minimum circumscribed rectangle of the region.
  • the minimum circumscribed geometric frame can also be implemented using any other regular geometric shape; in this embodiment, the minimum circumscribed rectangular frame is preferably used.
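The coordinate-extreme construction described above maps directly to a few lines of NumPy; the sketch below is an illustrative reading of step S230, not code from the patent.

```python
import numpy as np

def min_bounding_rect(mask):
    """mask: boolean (H, W) array of one region.
    Returns (x_min, y_min, x_max, y_max), or None if the region is absent."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def crop_region(image, mask):
    """Cut the minimum circumscribed rectangle out of the image array."""
    rect = min_bounding_rect(mask)
    if rect is None:
        return None
    x_min, y_min, x_max, y_max = rect
    return image[y_min:y_max + 1, x_min:x_max + 1]
```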
  • Step S240: extracting features of the region corresponding to each minimum circumscribed geometric frame based on the pre-configured convolutional neural network, and connecting the features of the region corresponding to each minimum circumscribed geometric frame to obtain the target object features of the target object.
  • the configuration process of the convolutional neural network is first described.
  • the configuration manner of the convolutional neural network can be implemented as follows:
  • an image sample set is received, the image sample set including a plurality of image samples.
  • each image sample is input into the convolutional neural network and trained using the Softmax regression function to obtain the trained convolutional neural network.
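The text's "Softmax regression function" is read here as the usual softmax classifier trained with cross-entropy loss; the following PyTorch sketch shows such a training loop under that assumption, with `cnn` and `loader` as placeholder names not taken from the patent.

```python
import torch
import torch.nn as nn

def train_cnn(cnn, loader, epochs=10, lr=1e-3):
    """loader yields (images, integer class labels) batches."""
    criterion = nn.CrossEntropyLoss()   # cross-entropy over softmax outputs
    optimizer = torch.optim.SGD(cnn.parameters(), lr=lr, momentum=0.9)
    cnn.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(cnn(images), labels)
            loss.backward()
            optimizer.step()
    return cnn
```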
  • the image data in each minimum circumscribed geometric frame is input into the trained convolutional neural network model for processing, and the plurality of features obtained from the last layer of the convolutional neural network model are used as the features of the regions corresponding to the respective minimum circumscribed geometric frames.
  • a 300-dimensional feature of the last layer of neural networks in the convolutional neural network can be extracted as an image feature of the image sample.
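A hedged sketch of that extraction step: the crop inside one minimum circumscribed frame is passed through the trained network and the last-layer activations are flattened into the region feature. It assumes the network's final module emits the feature vector (e.g. 300-dimensional, per the example above).

```python
import torch

def extract_feature(feature_net, region_tensor):
    """region_tensor: (1, 3, h, w) crop from one minimum circumscribed frame.
    feature_net is assumed to end at the feature layer."""
    feature_net.eval()
    with torch.no_grad():
        feat = feature_net(region_tensor)
    return feat.flatten().numpy()
```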
  • the inventors of the present invention have carefully studied and found that in the prior art, the image is divided according to a fixed size ratio, and then the features of each region are extracted, and finally the regions are connected to perform image retrieval.
  • as shown in FIG. 3, the horizontal lines are fixed-ratio dividing lines that split each picture from top to bottom into a first region, a second region, and a third region. In the first and third images the first region mainly captures the pedestrian's head features, but in the second image it does not, which makes that image hard to retrieve from the image library and severely degrades the retrieval metrics when the features are subsequently compared.
  • in view of this, the positions of the regions of the target object are located by the segmentation method described above. In detail, as shown by the rectangular boxes in FIG. 4, after the convolutional neural network training is completed, taking the target object being a pedestrian as an example, the convolutional neural network is used to extract features separately from the pedestrian's head region, upper-body region, and lower-body region on the basis of the segmented regions, and the features of the three regions are finally connected together to obtain a multi-dimensional feature.
  • for example, if the extracted head region feature is 100-dimensional, the upper-body region feature is 100-dimensional, and the lower-body region feature is 100-dimensional, connecting the three regions yields a 300-dimensional feature.
  • the 300-dimensional feature serves as the image feature of the target object (pedestrian).
  • if a region does not exist, for example the lower-body region is absent in FIG. 2(c), the feature of that region is set to zero. Thereby, feature alignment is achieved, and image retrieval accuracy can be effectively improved.
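The alignment rule above (fixed region order, zero vector for an absent region) can be illustrated with the following sketch; the region names and the 100-dimensional size come from the example in the text, everything else is an assumption of this sketch.

```python
import numpy as np

REGION_ORDER = ("head", "upper_body", "lower_body")
DIM_PER_REGION = 100   # per the 100-dimensional example above

def connect_region_features(features):
    """features: dict mapping region name -> 1-D feature array;
    a missing key means the region was not found in the image."""
    parts = []
    for name in REGION_ORDER:
        f = features.get(name)
        parts.append(np.zeros(DIM_PER_REGION) if f is None else f)
    return np.concatenate(parts)   # 300-dimensional target object feature
```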
  • Step S250: comparing the target object feature with the image feature of each image in the pre-stored image library, and outputting the image analysis result of the image to be analyzed according to the comparison result.
  • each image in the pre-stored image library may be processed in advance through the full convolution network and the convolutional neural network to obtain image features corresponding to each image in the pre-stored image library.
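As an illustration, precomputing the library features amounts to running the whole pipeline once per stored image; the sketch below assumes `extract_target_feature` wraps the segment, crop, extract, and connect steps from the earlier sketches.

```python
def build_library_features(images, extract_target_feature):
    """images: dict mapping image id -> image.
    extract_target_feature: the full one-image pipeline (segment,
    crop, extract, connect) returning a feature vector."""
    return {img_id: extract_target_feature(img)
            for img_id, img in images.items()}
```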
  • step S250 may be implemented by the following sub-steps:
  • Sub-step S251: respectively calculating a cosine distance between the target object feature and the image feature of each image in the pre-stored image library.
  • the pre-stored image library includes a plurality of images.
  • after the target object feature is obtained, it is compared with the image feature of each image in the pre-stored image library.
  • specifically, the comparison may be performed by calculating the cosine distance (also called cosine similarity) between the target object feature and the image feature of each image in the pre-stored image library;
  • the calculation formula is:
  • $d(f_i, f_j) = \frac{f_i \cdot f_j}{\|f_i\|_2 \, \|f_j\|_2}$
  • where $f_i$ and $f_j$ denote the features extracted from images $i$ and $j$, $\|\cdot\|_2$ denotes the two-norm, and $d(\cdot)$ denotes the cosine distance between the target object feature and the image feature of each image in the pre-stored image library. The value of the cosine distance lies in $[-1, 1]$: the closer the value is to 1, the more similar the two features are; the closer it is to -1, the more opposite they are; a value close to 0 indicates that the two features are weakly correlated and not comparable.
  • the cosine distance between the feature of the target object and the image feature of each image in the pre-stored image library can be calculated.
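The formula above translates directly into NumPy; the small epsilon guard against zero-norm vectors is an addition of this sketch, not of the patent.

```python
import numpy as np

def cosine_distance(f_i, f_j):
    """d(f_i, f_j) = (f_i . f_j) / (||f_i||_2 * ||f_j||_2)."""
    denom = np.linalg.norm(f_i) * np.linalg.norm(f_j) + 1e-12  # zero-norm guard
    return float(np.dot(f_i, f_j) / denom)
```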
  • Sub-step S252: sorting each image in the pre-stored image library based on the cosine distance, and generating a sorting result, the sorting result being the image analysis result of the image to be analyzed.
  • each image in the pre-stored image library may be sorted by the following formula:
  • $R = \operatorname{top}_n \{\, d(f, f_j) : j \in \Omega \,\}$
  • where $n$ is the number of images in the sorting result, $\Omega$ denotes the pre-stored image library, and $f$ is the target object feature.
  • n can be set according to actual needs. For example, if n is 3, the final sorting result includes the three images in the pre-stored image library whose image features have the largest cosine distance to the target object feature. Therefore, after the corresponding features are extracted from the target object, a more accurate image retrieval result can be obtained, providing effective clues for quickly locating and finding the target object.
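Sub-step S252 can be illustrated as a descending sort by cosine distance with the top n entries kept; the sketch below reuses `cosine_distance` from the earlier sketch and assumes the library is a dict from image id to feature vector.

```python
def top_n_matches(target_feature, library, n=3):
    """library: dict mapping image id -> precomputed feature vector."""
    scored = [(img_id, cosine_distance(target_feature, f))
              for img_id, f in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # closest to 1 first
    return scored[:n]
```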
  • FIG. 6 is a schematic diagram of an electronic device 100 for implementing the image analysis method provided by an embodiment of the present application.
  • the electronic device 100 may be, but not limited to, a computer device having image analysis and processing capabilities, such as a personal computer (PC), a notebook computer, a monitoring device, and a server.
  • the electronic device 100 further includes an image analysis device 200, a storage medium 110, and a processor 120.
  • the image analysis apparatus 200 includes at least one software functional module that can be stored in the storage medium 110 in the form of software or firmware, or solidified in the operating system (OS) of the electronic device 100.
  • the processor 120 is configured to execute an executable software module stored in the storage medium 110, for example, a software function module, a computer program, and the like included in the image analysis device 200.
  • the image analysis apparatus 200 may also be integrated in the operating system as part of the operating system.
  • the image analysis device 200 includes:
  • the obtaining module 210 is configured to obtain an image to be analyzed, where the target object is included in the image to be analyzed.
  • the segmentation module 220 is configured to segment the regions of the target object from the image to be analyzed based on a pre-configured full convolution network.
  • the obtaining module 230 is configured to acquire a minimum circumscribed geometric frame of the respective regions.
  • the extracting module 240 is configured to extract features of corresponding regions of each of the minimum circumscribed geometrical frames based on the pre-configured convolutional neural network, and connect features of each of the minimum circumscribed geometrical frame corresponding regions to obtain target object features of the target object .
  • the comparison module 250 is configured to compare the target object feature with the image feature of each image in the pre-stored image library, and output an image analysis result of the image to be analyzed according to the comparison result.
  • the image analysis apparatus 200 may further include a first training module.
  • the first training module is configured to configure the full convolution network; the first training module is specifically configured to receive an image sample set, the image sample set includes a plurality of image samples; and a target in each image sample Each area of the object is calibrated, and the calibrated image samples are input to a full convolution network for training, and a trained full convolution network is obtained.
  • the image analysis apparatus 200 may further include a second training module.
  • the second training module is configured to configure the convolutional neural network; the second training module is specifically configured to receive an image sample set, the image sample set includes a plurality of image samples; and input each image sample into a volume In the neural network, the Softmax regression function is used for training, and the trained convolutional neural network is obtained.
  • the extraction module is specifically configured to input the image data in each minimum circumscribed geometric frame into the trained convolutional neural network model for processing, and to take the plurality of features obtained from the last layer of the convolutional neural network model as the features of the regions corresponding to the respective minimum circumscribed geometric frames.
  • the image analysis apparatus 200 may further include an image library feature processing module.
  • the image library feature processing module is configured to process each image in the pre-stored image library through the full convolution network and the convolutional neural network to obtain image features corresponding to each image in the pre-stored image library .
  • the obtaining module 230 is configured to acquire a minimum circumscribed rectangle of the respective regions, or obtain a minimum circumscribed circle of the respective regions.
  • the comparison module 250 is specifically configured to separately calculate the cosine distance between the target object feature and the image feature of each image in the pre-stored image library, sort each image in the pre-stored image library based on the cosine distance, and generate a sorting result, which is the image analysis result of the image to be analyzed.
  • the calculation formula for the cosine distance between the target object feature and the image feature of each image in the pre-stored image library is:
  • $d(f_i, f_j) = \frac{f_i \cdot f_j}{\|f_i\|_2 \, \|f_j\|_2}$
  • where $f_i$ and $f_j$ denote the features extracted from images $i$ and $j$, $\|\cdot\|_2$ denotes the two-norm, and $d(\cdot)$ denotes the cosine distance between the target object feature and the image feature of each image in the pre-stored image library.
  • the sorting formula for sorting each image in the pre-stored image library based on the cosine distance and generating a sorting result is:
  • $R = \operatorname{top}_n \{\, d(f, f_j) : j \in \Omega \,\}$
  • where $n$ is the number of images in the sorting result, $\Omega$ denotes the pre-stored image library, and $f$ is the target object feature.
  • the embodiments of the present application provide an image analysis method, apparatus, electronic device, and readable storage medium.
  • first, an image to be analyzed is obtained, where the image to be analyzed includes a target object; then, each region of the target object is segmented from the image to be analyzed based on a pre-configured full convolution network, and a minimum circumscribed geometric frame of each region is acquired; next, features of the region corresponding to each minimum circumscribed geometric frame are extracted based on a pre-configured convolutional neural network, and the features of the region corresponding to each minimum circumscribed geometric frame are connected to obtain the target object features of the target object; finally, the target object features are compared with the image features of each image in the pre-stored image library, and the image analysis result of the image to be analyzed is output according to the comparison result, whereby environmental interference can be effectively removed, a more accurate image retrieval result can be obtained, and clues are provided for quickly locating and finding the target object.
  • each block of the flowchart or block diagram can represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may also occur in an order different from that noted in the drawings.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • each functional module in each embodiment of the present application may be integrated to form a separate part, or each module may exist separately, or two or more modules may be integrated to form a separate part.
  • An embodiment of the present application provides an image analysis method, apparatus, electronic device, and readable storage medium.
  • first, an image to be analyzed is obtained, where the image to be analyzed includes a target object; then, each region of the target object is segmented from the image to be analyzed based on a pre-configured full convolution network, and a minimum circumscribed geometric frame of each region is acquired; next, features of the region corresponding to each minimum circumscribed geometric frame are extracted based on a pre-configured convolutional neural network, and the features of the region corresponding to each minimum circumscribed geometric frame are connected to obtain the target object features of the target object; finally, the target object features are compared with the image features of each image in the pre-stored image library, and the image analysis result of the image to be analyzed is output according to the comparison result.
  • Thereby, environmental interference can be effectively removed and a more accurate image retrieval result can be obtained, providing clues for quickly locating and finding the target object.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide an image analysis method and apparatus, an electronic device, and a readable storage medium. The method includes: obtaining an image to be analyzed, wherein the image to be analyzed includes a target object; segmenting each region of the target object from the image to be analyzed based on a pre-configured full convolution network; acquiring a minimum circumscribed geometric frame of each region; extracting features of the region corresponding to each minimum circumscribed geometric frame based on a pre-configured convolutional neural network, and connecting the features of the region corresponding to each minimum circumscribed geometric frame to obtain target object features of the target object; and comparing the target object features with image features of each image in a pre-stored image library, and outputting an image analysis result of the image to be analyzed according to the comparison result. Thereby, environmental interference can be effectively removed and a more accurate image retrieval result obtained, providing clues for quickly locating and finding the target object.

Description

Image analysis method and apparatus, electronic device, and readable storage medium
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201711428999X, entitled "Image analysis method and apparatus, electronic device, and readable storage medium", filed with the Chinese Patent Office on December 26, 2017, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of image analysis technologies, and in particular to an image analysis method and apparatus, an electronic device, and a readable storage medium.
Background
In some image analysis application scenarios, it is often necessary to quickly retrieve the location, time, and other information of a specific target based on an image provided by a user or from the scene, so as to track the specific target. In surveillance scenarios, however, environmental factors such as poor lighting, occlusion, or other detection inaccuracies tend to interfere, resulting in low retrieval accuracy and difficulty in determining the specific target.
Summary
To overcome at least one deficiency in the prior art, one of the purposes of the present application is to provide an image analysis method and apparatus, an electronic device, and a readable storage medium, which can effectively remove environmental interference and obtain more accurate image retrieval results, thereby providing clues for quickly locating and finding a target object.
An embodiment of the present application provides an image analysis method, applied to an electronic device, the method including:
obtaining an image to be analyzed, wherein the image to be analyzed includes a target object;
segmenting each region of the target object from the image to be analyzed based on a pre-configured full convolution network;
acquiring a minimum circumscribed geometric frame of each region;
extracting features of the region corresponding to each minimum circumscribed geometric frame based on a pre-configured convolutional neural network, and connecting the features of the region corresponding to each minimum circumscribed geometric frame to obtain target object features of the target object;
comparing the target object features with image features of each image in a pre-stored image library, and outputting an image analysis result of the image to be analyzed according to the comparison result.
In an embodiment of the present application, before obtaining the image to be analyzed, the method further includes:
configuring the full convolution network;
the manner of configuring the full convolution network includes:
receiving an image sample set, the image sample set including a plurality of image samples;
calibrating each region of the target object in each image sample, and inputting the calibrated image samples into the full convolution network for training to obtain a trained full convolution network.
In an embodiment of the present application, before obtaining the image to be analyzed, the method further includes:
configuring the convolutional neural network;
the manner of configuring the convolutional neural network includes:
receiving an image sample set, the image sample set including a plurality of image samples;
inputting each image sample into the convolutional neural network and training it using a Softmax regression function to obtain a trained convolutional neural network.
In an embodiment of the present application, the step of extracting features of the region corresponding to each minimum circumscribed geometric frame based on the pre-configured convolutional neural network includes:
inputting the image data in each minimum circumscribed geometric frame into the trained convolutional neural network model for processing, and taking a plurality of features obtained from the last layer of the convolutional neural network model as the features of the region corresponding to each minimum circumscribed geometric frame.
In an embodiment of the present application, the method further includes:
processing each image in the pre-stored image library through the full convolution network and the convolutional neural network to obtain the image features corresponding to each image in the pre-stored image library.
In an embodiment of the present application, the manner of acquiring the minimum circumscribed geometric frame of each region includes:
acquiring a minimum circumscribed rectangular frame of each region; or
acquiring a minimum circumscribed circle of each region.
In an embodiment of the present application, comparing the target object features with the image features of each image in the pre-stored image library and outputting the image analysis result of the image to be analyzed according to the comparison result includes:
separately calculating a cosine distance between the target object features and the image features of each image in the pre-stored image library;
sorting each image in the pre-stored image library based on the cosine distance and generating a sorting result, the sorting result being the image analysis result of the image to be analyzed.
In an embodiment of the present application, the calculation formula for the cosine distance between the target object features and the image features of each image in the pre-stored image library is:
$d(f_i, f_j) = \frac{f_i \cdot f_j}{\|f_i\|_2 \, \|f_j\|_2}$
where $f_i$ and $f_j$ denote the features extracted from images $i$ and $j$, $\|\cdot\|_2$ denotes the two-norm, and $d(\cdot)$ denotes the cosine distance between the target object features and the image features of each image in the pre-stored image library.
In an embodiment of the present application, the sorting formula for sorting each image in the pre-stored image library based on the cosine distance and generating the sorting result is:
$R = \operatorname{top}_n \{\, d(f, f_j) : j \in \Omega \,\}$
where $n$ is the number of images in the sorting result, $\Omega$ denotes the pre-stored image library, and $f$ is the target object feature.
An embodiment of the present application further provides an image analysis apparatus, applied to an electronic device, the apparatus including:
an obtaining module, configured to obtain an image to be analyzed, wherein the image to be analyzed includes a target object;
a segmentation module, configured to segment each region of the target object from the image to be analyzed based on a pre-configured full convolution network;
an acquiring module, configured to acquire a minimum circumscribed geometric frame of each region;
an extraction module, configured to extract features of the region corresponding to each minimum circumscribed geometric frame based on a pre-configured convolutional neural network, and connect the features of the region corresponding to each minimum circumscribed geometric frame to obtain target object features of the target object;
a comparison module, configured to compare the target object features with image features of each image in a pre-stored image library, and output an image analysis result of the image to be analyzed according to the comparison result.
In an embodiment of the present application, the apparatus further includes:
a second training module, configured to configure the convolutional neural network; the second training module is specifically configured to receive an image sample set, the image sample set including a plurality of image samples, and to input each image sample into the convolutional neural network for training using the Softmax regression function to obtain a trained convolutional neural network.
In an embodiment of the present application, the extraction module is specifically configured to input the image data in each minimum circumscribed geometric frame into the trained convolutional neural network model for processing, and to take a plurality of features obtained from the last layer of the convolutional neural network model as the features of the region corresponding to each minimum circumscribed geometric frame.
In an embodiment of the present application, the apparatus further includes:
an image library feature processing module, configured to process each image in the pre-stored image library through the full convolution network and the convolutional neural network to obtain the image features corresponding to each image in the pre-stored image library.
In an embodiment of the present application, the acquiring module is specifically configured to acquire a minimum circumscribed rectangular frame of each region, or acquire a minimum circumscribed circle of each region.
In an embodiment of the present application, the comparison module is specifically configured to separately calculate a cosine distance between the target object features and the image features of each image in the pre-stored image library, sort each image in the pre-stored image library based on the cosine distance, and generate a sorting result, the sorting result being the image analysis result of the image to be analyzed.
In an embodiment of the present application, the calculation formula for the cosine distance between the target object features and the image features of each image in the pre-stored image library is:
$d(f_i, f_j) = \frac{f_i \cdot f_j}{\|f_i\|_2 \, \|f_j\|_2}$
where $f_i$ and $f_j$ denote the features extracted from images $i$ and $j$, $\|\cdot\|_2$ denotes the two-norm, and $d(\cdot)$ denotes the cosine distance between the target object features and the image features of each image in the pre-stored image library.
In an embodiment of the present application, the sorting formula for sorting each image in the pre-stored image library based on the cosine distance and generating the sorting result is:
$R = \operatorname{top}_n \{\, d(f, f_j) : j \in \Omega \,\}$
where $n$ is the number of images in the sorting result, $\Omega$ denotes the pre-stored image library, and $f$ is the target object feature.
An embodiment of the present application further provides an electronic device, the electronic device including:
a storage medium;
a processor; and
an image analysis apparatus, the apparatus being stored in the storage medium and including software function modules executed by the processor, the apparatus including:
an obtaining module, configured to obtain an image to be analyzed, wherein the image to be analyzed includes a target object;
a segmentation module, configured to segment each region of the target object from the image to be analyzed based on a pre-configured full convolution network;
an acquiring module, configured to acquire a minimum circumscribed geometric frame of each region;
an extraction module, configured to extract features of the region corresponding to each minimum circumscribed geometric frame based on a pre-configured convolutional neural network, and connect the features of the region corresponding to each minimum circumscribed geometric frame to obtain target object features of the target object;
a comparison module, configured to compare the target object features with image features of each image in a pre-stored image library, and output an image analysis result of the image to be analyzed according to the comparison result.
An embodiment of the present application further provides a readable storage medium, the readable storage medium storing a computer program, where the computer program, when executed, implements the image analysis method described above.
Compared with the prior art, the present application has the following beneficial effects:
Embodiments of the present application provide an image analysis method and apparatus, an electronic device, and a readable storage medium. First, an image to be analyzed is obtained, wherein the image to be analyzed includes a target object; then, each region of the target object is segmented from the image to be analyzed based on a pre-configured full convolution network, and a minimum circumscribed geometric frame of each region is acquired; next, features of the region corresponding to each minimum circumscribed geometric frame are extracted based on a pre-configured convolutional neural network, and the features of the region corresponding to each minimum circumscribed geometric frame are connected to obtain the target object features of the target object; finally, the target object features are compared with the image features of each image in a pre-stored image library, and the image analysis result of the image to be analyzed is output according to the comparison result. Thereby, environmental interference can be effectively removed and a more accurate image retrieval result obtained, providing clues for quickly locating and finding the target object.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present application more clearly, the accompanying drawings required in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and therefore should not be regarded as limiting the scope; a person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
FIG. 1 is a schematic flowchart of an image analysis method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of image segmentation according to an embodiment of the present application;
FIG. 3 is a schematic diagram of region division in the prior art;
FIG. 4 is a schematic diagram of region division according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of the sub-steps included in step S250 shown in FIG. 1;
FIG. 6 is a schematic block diagram of an electronic device for implementing the above image analysis method according to an embodiment of the present application.
Reference numerals: 100 - electronic device; 110 - storage medium; 120 - processor; 200 - image analysis apparatus; 210 - obtaining module; 220 - segmentation module; 230 - acquiring module; 240 - extraction module; 250 - comparison module.
Detailed description of the embodiments
In the course of implementing the technical solutions provided in the embodiments of the present application, the inventor of the present application found that the image retrieval methods in the prior art mainly include the following:
First, the image to be retrieved is divided into sub-images to obtain a plurality of sub-images; image feature extraction is performed on each designated sub-image among the plurality of sub-images to obtain a feature vector of each designated sub-image; and, for each image in an image library, the similarity between that image and the image to be retrieved is determined based on the feature vector of each sub-image in each sub-image group to be matched of that image and the feature vectors of the designated sub-images. However, through careful study, the inventor found that this scheme simply divides the image into multiple regions and is susceptible to interference from occlusion, image misalignment, and other factors, so the selected image features cannot be aligned, which affects retrieval accuracy.
Second, image category features and self-encoding features are calculated to ensure that image-based search results are similar in image category, and an automatic encoding algorithm is used to generate low-level image encoding features to ensure that the images are similar in content; a mixed self-encoding feature method then further fuses the classification features and the image self-encoding features and reduces the dimensionality, making the search faster and more stable. However, through careful study, the inventor found that although this scheme performs image retrieval by combining category features with encoding features and reduces the feature dimensionality, it needs to extract two different kinds of features, which lowers its operability and limits its application prospects.
Third, a visual vocabulary dictionary is established, a visual saliency map is obtained using a visual saliency feature fusion algorithm of the image, a foreground target map and a background region map of the image are then obtained by segmentation according to the saliency map, and color features and texture features are extracted from the foreground target map and the background region map respectively for image retrieval. However, through careful study, the inventor found that in this scheme both the foreground and background obtained from the saliency map and the extracted color and texture features are easily interfered with by the background and complex environments, and establishing the visual vocabulary dictionary is highly complex, which limits the application prospects of the scheme.
The defects of the above prior-art schemes are all results obtained by the inventor through practice and careful study; therefore, the discovery process of the above problems and the solutions proposed below in the embodiments of the present application for these problems should all be regarded as contributions made by the inventor to the present application in the course of this application.
In view of the above problems, the inventor of the present application proposes the following technical solutions, which can effectively remove environmental interference and obtain more accurate image retrieval results, thereby providing clues for quickly locating and finding a target object.
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only a part, rather than all, of the embodiments of the present application. The components of the embodiments of the present application generally described and illustrated in the drawings herein may be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments of the present application provided in the accompanying drawings is not intended to limit the claimed scope of the present application, but merely represents selected embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings.
Referring to FIG. 1, which is a schematic flowchart of an image analysis method provided by an embodiment of the present application. It should be noted that the image analysis method provided by the embodiment of the present application is not limited to the specific order shown in FIG. 1 and described below. The method may be implemented through the following steps:
Step S210: obtaining an image to be analyzed.
In this embodiment, the manner of obtaining the image to be analyzed is not limited; for example, it may be acquired in real time from the current shooting scene by a monitoring device, imported from an external terminal, or downloaded from a server. The image to be analyzed includes a target object, the target object being an object whose features need to be analyzed; for example, the target object may be a pedestrian or a specific item in the image.
Step S220: segmenting each region of the target object from the image to be analyzed based on a pre-configured full convolution network.
Specifically, before step S220 is further elaborated, the configuration process of the full convolution network is first described. As an implementation, the configuration of the full convolution network may be carried out as follows:
First, an image sample set is received, wherein the image sample set includes a plurality of image samples, each of which includes a target object.
Then, each region of the target object in each image sample is calibrated, and the calibrated image samples are input into the full convolution network for training to obtain a trained full convolution network. In detail, taking the target object being a pedestrian as an example, the regions of the pedestrian's body, such as a head region, an upper-body region, and a lower-body region (or a finer division such as a head region, a left-arm region, a right-arm region, an upper-body region, a right-leg region, a left-leg region, and the like), are all marked with different pixel values, and regions with different pixel values belong to different regions. As shown in FIG. 2(a) to FIG. 2(c), the left image in each group is the original image sample, and the middle image shows the labeled body regions. The labeled image samples are then used to train a Fully Convolutional Network (FCN), yielding a full convolution network with good network parameters after training.
On the basis of the trained full convolution network, the image to be analyzed is input into the full convolution network, and the regions of the pedestrian can be segmented.
Step S230: acquiring the minimum circumscribed geometric frame of each region.
Specifically, the pedestrian is divided into different regions by the above full convolution network segmentation. To improve recognition accuracy, the influence of the background image in the image data of each region needs to be removed as much as possible.
Through careful study, the inventor found that since the shapes of the regions are irregular, in this embodiment the minimum circumscribed geometric frame of each region can be used as the data extraction range. The manner of acquiring the minimum circumscribed geometric frame of each region may be acquiring a minimum circumscribed rectangular frame of each region, or acquiring a minimum circumscribed circle of each region, and so on. In one implementation, taking the minimum circumscribed rectangular frame as an example, referring to FIG. 2(a) to FIG. 2(c), each region of the pedestrian in the rightmost image of each picture is marked by a minimum circumscribed rectangular frame; obtaining the minimum circumscribed rectangular frame of each region in this way can effectively remove background interference and the like and provide accurate partial regions of the pedestrian.
Specifically, in this embodiment, an orthogonal coordinate system including an x-axis and a y-axis may be established for the image to be recognized. After the regions on the target are identified, for each region, the coordinate values in the orthogonal coordinate system of the pixels covered by the region are obtained, and the minimum value x_min on the x-axis, the maximum value x_max on the x-axis, the minimum value y_min on the y-axis, and the maximum value y_max on the y-axis among the pixel coordinate values are determined. The rectangle formed by the four points (x_min, y_min), (x_min, y_max), (x_max, y_min), and (x_max, y_max) is then taken as the minimum circumscribed rectangle of the region.
It should be noted that the above minimum circumscribed geometric frame may also be implemented using any other regular geometric shape; preferably, the minimum circumscribed rectangular frame is used in this embodiment.
Step S240: extracting features of the region corresponding to each minimum circumscribed geometric frame based on a pre-configured convolutional neural network, and connecting the features of the region corresponding to each minimum circumscribed geometric frame to obtain target object features of the target object.
Specifically, before step S240 is further elaborated, the configuration process of the convolutional neural network is first described. As an implementation, the configuration of the convolutional neural network may be carried out as follows:
First, an image sample set is received, the image sample set including a plurality of image samples.
Then, each image sample is input into the convolutional neural network and trained using the Softmax regression function to obtain a trained convolutional neural network.
When features of the region corresponding to each minimum circumscribed geometric frame are extracted based on the pre-configured convolutional neural network, the image data in each minimum circumscribed geometric frame is input into the trained convolutional neural network model for processing, and a plurality of features obtained from the last layer of the convolutional neural network model are taken as the features of the region corresponding to each minimum circumscribed geometric frame. For example, the 300-dimensional features of the last layer of the convolutional neural network can be extracted as the image feature of the image sample.
Through careful study, the inventor of the present application found that the prior art mainly divides the image at a fixed size ratio, then extracts the features of each region, and finally connects the region features for image retrieval. However, due to detection algorithms and other reasons, the target object (for example, a pedestrian) varies in position within the image. As shown in FIG. 3, the horizontal lines are fixed-ratio dividing lines that split each picture from top to bottom into a first region, a second region, and a third region; in the first and third images the first region mainly captures the pedestrian's head features, but in the second image it does not, which makes that image hard to retrieve from the image library and severely degrades the retrieval metrics when the features are subsequently compared.
In view of the above problem, after long-term study, the inventor proposes locating the positions of the regions of the target object through the segmentation method described above. In detail, as shown by the rectangular boxes in FIG. 4, after the convolutional neural network training is completed, taking the target object being a pedestrian as an example, features are extracted separately from the pedestrian's head region, upper-body region, and lower-body region using the convolutional neural network on the basis of the segmented regions, and the features of the three regions are finally connected together to obtain a multi-dimensional feature. For example, if the extracted head region feature is 100-dimensional, the upper-body region feature is 100-dimensional, and the lower-body region feature is 100-dimensional, connecting the three regions yields a 300-dimensional feature, which serves as the image feature of the target object (pedestrian). In addition, if a region does not exist, for example the lower-body region is absent in FIG. 2(c), the feature of that region is set to zero. Feature alignment is thereby achieved, and image retrieval accuracy can be effectively improved.
Step S250: comparing the target object features with the image features of each image in the pre-stored image library, and outputting the image analysis result of the image to be analyzed according to the comparison result.
In this embodiment, each image in the pre-stored image library may be processed in advance through the full convolution network and the convolutional neural network to obtain the image features corresponding to each image in the pre-stored image library.
In detail, as an implementation, referring to FIG. 5, step S250 may be implemented through the following sub-steps:
Sub-step S251: separately calculating the cosine distance between the target object features and the image features of each image in the pre-stored image library.
In this embodiment, the pre-stored image library includes a plurality of images. After the target object features are obtained, they are compared with the image features of each image in the pre-stored image library. Specifically, the comparison may be performed by calculating the cosine distance (also called cosine similarity) between the target object features and the image features of each image in the pre-stored image library; the calculation formula is:
$d(f_i, f_j) = \frac{f_i \cdot f_j}{\|f_i\|_2 \, \|f_j\|_2}$
where $f_i$ and $f_j$ denote the features extracted from images $i$ and $j$, $\|\cdot\|_2$ denotes the two-norm, and $d(\cdot)$ denotes the cosine distance between the target object features and the image features of each image in the pre-stored image library. The value of the cosine distance lies in $[-1, 1]$: the closer the value is to 1, the more similar the two features are; the closer it is to -1, the more opposite they are; a value close to 0 indicates that the two features are weakly correlated and not comparable.
Through the above formula, the cosine distance between the target object features and the image features of each image in the pre-stored image library can be calculated.
Sub-step S252: sorting each image in the pre-stored image library based on the cosine distance and generating a sorting result, the sorting result being the image analysis result of the image to be analyzed.
In this embodiment, after the cosine distance between the target object features and the image features of each image in the pre-stored image library is calculated, each image in the pre-stored image library may be sorted by the following formula:
$R = \operatorname{top}_n \{\, d(f, f_j) : j \in \Omega \,\}$
where $n$ is the number of images in the sorting result and $\Omega$ denotes the pre-stored image library. Here, $n$ can be set according to actual needs; for example, if $n$ is 3, the final sorting result includes the three images in the pre-stored image library whose image features have the largest cosine distance to the target object features. Thus, after the corresponding features are extracted from the target object, a better image retrieval result can be obtained, providing effective clues for quickly locating and finding the target object.
Further, as shown in FIG. 6, which is a schematic diagram of an electronic device 100 for implementing the image analysis method provided by an embodiment of the present application. In this embodiment, the electronic device 100 may be, but is not limited to, a computer device with image analysis and processing capabilities, such as a personal computer (PC), a notebook computer, a monitoring device, or a server.
The electronic device 100 further includes an image analysis apparatus 200, a storage medium 110, and a processor 120. In this embodiment of the present application, the image analysis apparatus 200 includes at least one software functional module that can be stored in the storage medium 110 in the form of software or firmware, or solidified in the operating system (OS) of the electronic device 100. The processor 120 is configured to execute the executable software modules stored in the storage medium 110, for example, the software functional modules and computer programs included in the image analysis apparatus 200. In this embodiment, the image analysis apparatus 200 may also be integrated into the operating system as a part of the operating system. Specifically, the image analysis apparatus 200 includes:
an obtaining module 210, configured to obtain an image to be analyzed, wherein the image to be analyzed includes a target object;
a segmentation module 220, configured to segment each region of the target object from the image to be analyzed based on a pre-configured full convolution network;
an acquiring module 230, configured to acquire a minimum circumscribed geometric frame of each region;
an extraction module 240, configured to extract features of the region corresponding to each minimum circumscribed geometric frame based on a pre-configured convolutional neural network, and connect the features of the region corresponding to each minimum circumscribed geometric frame to obtain target object features of the target object;
a comparison module 250, configured to compare the target object features with the image features of each image in a pre-stored image library, and output an image analysis result of the image to be analyzed according to the comparison result.
Optionally, in this embodiment, the image analysis apparatus 200 may further include a first training module.
The first training module is configured to configure the full convolution network; the first training module is specifically configured to receive an image sample set, the image sample set including a plurality of image samples, calibrate each region of the target object in each image sample, and input the calibrated image samples into the full convolution network for training to obtain a trained full convolution network.
Optionally, in this embodiment, the image analysis apparatus 200 may further include a second training module.
The second training module is configured to configure the convolutional neural network; the second training module is specifically configured to receive an image sample set, the image sample set including a plurality of image samples, and input each image sample into the convolutional neural network for training using the Softmax regression function to obtain a trained convolutional neural network.
Optionally, in this embodiment, the extraction module is specifically configured to input the image data in each minimum circumscribed geometric frame into the trained convolutional neural network model for processing, and take a plurality of features obtained from the last layer of the convolutional neural network model as the features of the region corresponding to each minimum circumscribed geometric frame.
Optionally, in this embodiment, the image analysis apparatus 200 may further include an image library feature processing module.
The image library feature processing module is configured to process each image in the pre-stored image library through the full convolution network and the convolutional neural network to obtain the image features corresponding to each image in the pre-stored image library.
Optionally, in this embodiment, the acquiring module 230 is specifically configured to acquire a minimum circumscribed rectangular frame of each region, or acquire a minimum circumscribed circle of each region.
Optionally, in this embodiment, the comparison module 250 is specifically configured to separately calculate the cosine distance between the target object features and the image features of each image in the pre-stored image library, sort each image in the pre-stored image library based on the cosine distance, and generate a sorting result, the sorting result being the image analysis result of the image to be analyzed.
Optionally, in this embodiment, the calculation formula for the cosine distance between the target object features and the image features of each image in the pre-stored image library is:
$d(f_i, f_j) = \frac{f_i \cdot f_j}{\|f_i\|_2 \, \|f_j\|_2}$
where $f_i$ and $f_j$ denote the features extracted from images $i$ and $j$, $\|\cdot\|_2$ denotes the two-norm, and $d(\cdot)$ denotes the cosine distance between the target object features and the image features of each image in the pre-stored image library.
Optionally, in this embodiment, the sorting formula for sorting each image in the pre-stored image library based on the cosine distance and generating the sorting result is:
$R = \operatorname{top}_n \{\, d(f, f_j) : j \in \Omega \,\}$
where $n$ is the number of images in the sorting result, $\Omega$ denotes the pre-stored image library, and $f$ is the target object feature.
It can be understood that, for the specific operation of each functional module in this embodiment, reference may be made to the detailed description of the corresponding steps in the above method embodiment, which is not repeated here.
In summary, embodiments of the present application provide an image analysis method and apparatus, an electronic device, and a readable storage medium. First, an image to be analyzed is obtained, wherein the image to be analyzed includes a target object; then, each region of the target object is segmented from the image to be analyzed based on a pre-configured full convolution network, and a minimum circumscribed geometric frame of each region is acquired; next, features of the region corresponding to each minimum circumscribed geometric frame are extracted based on a pre-configured convolutional neural network, and the features of the region corresponding to each minimum circumscribed geometric frame are connected to obtain the target object features of the target object; finally, the target object features are compared with the image features of each image in a pre-stored image library, and the image analysis result of the image to be analyzed is output according to the comparison result. Thereby, environmental interference can be effectively removed and a more accurate image retrieval result obtained, providing clues for quickly locating and finding the target object.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may also be implemented in other manners. The apparatus and method embodiments described above are merely illustrative; for example, the flowcharts and block diagrams in the accompanying drawings show the possible architectures, functions, and operations of systems, methods, and computer program products according to multiple embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings; for example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
It should be noted that, herein, the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
It is apparent to those skilled in the art that the present application is not limited to the details of the above exemplary embodiments, and that the present application can be implemented in other specific forms without departing from the spirit or essential characteristics of the present application. Therefore, the embodiments should be regarded as exemplary and non-limiting in every respect. The scope of the present application is defined by the appended claims rather than by the above description, and all changes falling within the meaning and range of equivalents of the claims are therefore intended to be embraced in the present application. Any reference signs in the claims should not be regarded as limiting the claims involved.
Industrial applicability
Embodiments of the present application provide an image analysis method and apparatus, an electronic device, and a readable storage medium. First, an image to be analyzed is obtained, wherein the image to be analyzed includes a target object; then, each region of the target object is segmented from the image to be analyzed based on a pre-configured full convolution network, and a minimum circumscribed geometric frame of each region is acquired; next, features of the region corresponding to each minimum circumscribed geometric frame are extracted based on a pre-configured convolutional neural network, and the features of the region corresponding to each minimum circumscribed geometric frame are connected to obtain the target object features of the target object; finally, the target object features are compared with the image features of each image in a pre-stored image library, and the image analysis result of the image to be analyzed is output according to the comparison result. Thereby, environmental interference can be effectively removed and a more accurate image retrieval result obtained, providing clues for quickly locating and finding the target object.

Claims (20)

  1. An image analysis method, applied to an electronic device, wherein the method comprises:
    obtaining an image to be analyzed, wherein the image to be analyzed comprises a target object;
    segmenting each region of the target object from the image to be analyzed based on a pre-configured full convolution network;
    acquiring a minimum circumscribed geometric frame of each region;
    extracting features of the region corresponding to each minimum circumscribed geometric frame based on a pre-configured convolutional neural network, and connecting the features of the region corresponding to each minimum circumscribed geometric frame to obtain target object features of the target object; and
    comparing the target object features with image features of each image in a pre-stored image library, and outputting an image analysis result of the image to be analyzed according to the comparison result.
  2. The image analysis method according to claim 1, wherein before obtaining the image to be analyzed, the method further comprises:
    configuring the full convolution network;
    the manner of configuring the full convolution network comprising:
    receiving an image sample set, the image sample set comprising a plurality of image samples; and
    calibrating each region of the target object in each image sample, and inputting the calibrated image samples into the full convolution network for training to obtain a trained full convolution network.
  3. The image analysis method according to claim 1, wherein before obtaining the image to be analyzed, the method further comprises:
    configuring the convolutional neural network;
    the manner of configuring the convolutional neural network comprising:
    receiving an image sample set, the image sample set comprising a plurality of image samples; and
    inputting each image sample into the convolutional neural network and training it using a Softmax regression function to obtain a trained convolutional neural network.
  4. The image analysis method according to claim 3, wherein the step of extracting features of the region corresponding to each minimum circumscribed geometric frame based on the pre-configured convolutional neural network comprises:
    inputting the image data in each minimum circumscribed geometric frame into the trained convolutional neural network model for processing, and taking a plurality of features obtained from the last layer of the convolutional neural network model as the features of the region corresponding to each minimum circumscribed geometric frame.
  5. The image analysis method according to claim 1, wherein the method further comprises:
    processing each image in the pre-stored image library through the full convolution network and the convolutional neural network to obtain the image features corresponding to each image in the pre-stored image library.
  6. The image analysis method according to claim 1, wherein the manner of acquiring the minimum circumscribed geometric frame of each region comprises:
    acquiring a minimum circumscribed rectangular frame of each region; or
    acquiring a minimum circumscribed circle of each region.
  7. The image analysis method according to any one of claims 1 to 6, wherein comparing the target object features with the image features of each image in the pre-stored image library and outputting the image analysis result of the image to be analyzed according to the comparison result comprises:
    separately calculating a cosine distance between the target object features and the image features of each image in the pre-stored image library; and
    sorting each image in the pre-stored image library based on the cosine distance and generating a sorting result, the sorting result being the image analysis result of the image to be analyzed.
  8. The image analysis method according to claim 7, wherein the calculation formula for the cosine distance between the target object features and the image features of each image in the pre-stored image library is:
    $d(f_i, f_j) = \frac{f_i \cdot f_j}{\|f_i\|_2 \, \|f_j\|_2}$
    where $f_i$ and $f_j$ denote the features extracted from images $i$ and $j$, $\|\cdot\|_2$ denotes the two-norm, and $d(\cdot)$ denotes the cosine distance between the target object features and the image features of each image in the pre-stored image library.
  9. The image analysis method according to claim 7, wherein the sorting formula for sorting each image in the pre-stored image library based on the cosine distance and generating the sorting result is:
    $R = \operatorname{top}_n \{\, d(f, f_j) : j \in \Omega \,\}$
    where $n$ is the number of images in the sorting result, $\Omega$ denotes the pre-stored image library, and $f$ is the target object feature.
  10. An image analysis apparatus, applied to an electronic device, wherein the apparatus comprises:
    an obtaining module, configured to obtain an image to be analyzed, wherein the image to be analyzed comprises a target object;
    a segmentation module, configured to segment each region of the target object from the image to be analyzed based on a pre-configured full convolution network;
    an acquiring module, configured to acquire a minimum circumscribed geometric frame of each region;
    an extraction module, configured to extract features of the region corresponding to each minimum circumscribed geometric frame based on a pre-configured convolutional neural network, and connect the features of the region corresponding to each minimum circumscribed geometric frame to obtain target object features of the target object; and
    a comparison module, configured to compare the target object features with image features of each image in a pre-stored image library, and output an image analysis result of the image to be analyzed according to the comparison result.
  11. The image analysis apparatus according to claim 10, wherein the apparatus further comprises:
    a first training module, configured to configure the full convolution network; the first training module being specifically configured to receive an image sample set, the image sample set comprising a plurality of image samples, calibrate each region of the target object in each image sample, and input the calibrated image samples into the full convolution network for training to obtain a trained full convolution network.
  12. The image analysis apparatus according to claim 10, wherein the apparatus further comprises:
    a second training module, configured to configure the convolutional neural network; the second training module being specifically configured to receive an image sample set, the image sample set comprising a plurality of image samples, and input each image sample into the convolutional neural network for training using the Softmax regression function to obtain a trained convolutional neural network.
  13. The image analysis apparatus according to claim 12, wherein the extraction module is specifically configured to input the image data in each minimum circumscribed geometric frame into the trained convolutional neural network model for processing, and take a plurality of features obtained from the last layer of the convolutional neural network model as the features of the region corresponding to each minimum circumscribed geometric frame.
  14. The image analysis apparatus according to claim 10, wherein the apparatus further comprises:
    an image library feature processing module, configured to process each image in the pre-stored image library through the full convolution network and the convolutional neural network to obtain the image features corresponding to each image in the pre-stored image library.
  15. The image analysis apparatus according to claim 10, wherein the acquiring module is specifically configured to acquire a minimum circumscribed rectangular frame of each region, or acquire a minimum circumscribed circle of each region.
  16. The image analysis apparatus according to any one of claims 10 to 15, wherein the comparison module is specifically configured to separately calculate a cosine distance between the target object features and the image features of each image in the pre-stored image library, sort each image in the pre-stored image library based on the cosine distance, and generate a sorting result, the sorting result being the image analysis result of the image to be analyzed.
  17. The image analysis apparatus according to claim 16, wherein the calculation formula for the cosine distance between the target object features and the image features of each image in the pre-stored image library is:
    $d(f_i, f_j) = \frac{f_i \cdot f_j}{\|f_i\|_2 \, \|f_j\|_2}$
    where $f_i$ and $f_j$ denote the features extracted from images $i$ and $j$, $\|\cdot\|_2$ denotes the two-norm, and $d(\cdot)$ denotes the cosine distance between the target object features and the image features of each image in the pre-stored image library.
  18. The image analysis apparatus according to claim 16, wherein the sorting formula for sorting each image in the pre-stored image library based on the cosine distance and generating the sorting result is:
    $R = \operatorname{top}_n \{\, d(f, f_j) : j \in \Omega \,\}$
    where $n$ is the number of images in the sorting result, $\Omega$ denotes the pre-stored image library, and $f$ is the target object feature.
  19. An electronic device, wherein the electronic device comprises:
    a storage medium;
    a processor; and
    an image analysis apparatus, the apparatus being stored in the storage medium and comprising software function modules executed by the processor, the apparatus comprising:
    an obtaining module, configured to obtain an image to be analyzed, wherein the image to be analyzed comprises a target object;
    a segmentation module, configured to segment each region of the target object from the image to be analyzed based on a pre-configured full convolution network;
    an acquiring module, configured to acquire a minimum circumscribed geometric frame of each region;
    an extraction module, configured to extract features of the region corresponding to each minimum circumscribed geometric frame based on a pre-configured convolutional neural network, and connect the features of the region corresponding to each minimum circumscribed geometric frame to obtain target object features of the target object; and
    a comparison module, configured to compare the target object features with image features of each image in a pre-stored image library, and output an image analysis result of the image to be analyzed according to the comparison result.
  20. A readable storage medium, wherein the readable storage medium stores a computer program, and when the computer program is executed, the image analysis method according to any one of claims 1 to 9 is implemented.
PCT/CN2018/100249 2017-12-26 2018-08-13 Image analysis method and apparatus, electronic device, and readable storage medium WO2019128254A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18894025.8A EP3734496A4 (en) 2017-12-26 2018-08-13 IMAGE ANALYSIS APPARATUS AND METHOD, AND ELECTRONIC DEVICE AND READABLE INFORMATION MEDIA
US16/770,433 US20200402242A1 (en) 2017-12-26 2018-08-13 Image analysis method and apparatus, and electronic device and readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711428999.XA 2017-12-26 2017-12-26 Image analysis method and apparatus, electronic device, and readable storage medium
CN201711428999.X 2017-12-26

Publications (1)

Publication Number Publication Date
WO2019128254A1 (zh)

Family

ID=67021806

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/100249 WO2019128254A1 (zh) 2017-12-26 2018-08-13 图像分析方法、装置、电子设备及可读存储介质

Country Status (4)

Country Link
US (1) US20200402242A1 (zh)
EP (1) EP3734496A4 (zh)
CN (1) CN109960988A (zh)
WO (1) WO2019128254A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465904A (zh) * 2019-09-06 2021-03-09 上海晶赞融宣科技有限公司 Image target positioning method and apparatus, computer device, and storage medium
CN112991385A (zh) * 2021-02-08 2021-06-18 西安理工大学 Siamese network target tracking method based on different metric criteria
CN114140787A (zh) * 2021-12-09 2022-03-04 腾讯科技(深圳)有限公司 Image processing method, apparatus, device, storage medium, and computer product

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490171B (zh) * 2019-08-26 2022-06-21 睿云联(厦门)网络通讯技术有限公司 Dangerous posture recognition method and apparatus, computer device, and storage medium
CN111723724B (zh) * 2020-06-16 2024-04-02 东软睿驰汽车技术(沈阳)有限公司 Road surface obstacle recognition method and related apparatus
CN112611761B (zh) * 2020-11-27 2023-03-31 常州柯柏电子科技有限公司 Method and system for detecting surface defects of highly reflective objects
CN112991375B (zh) * 2021-02-08 2024-01-23 上海通办信息服务有限公司 Method and system for reshaping an arbitrarily shaped image region into n rectangular regions
CN113096170B (zh) * 2021-06-09 2022-01-25 北京世纪好未来教育科技有限公司 Text image registration method, apparatus, device, and storage medium
CN115810014B (zh) * 2023-02-07 2023-05-16 菲特(天津)检测技术有限公司 Electrode cap end-face defect detection method based on sub-tiles in a picture

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106355188A (zh) * 2015-07-13 2017-01-25 阿里巴巴集团控股有限公司 Image detection method and apparatus
CN106874894A (zh) * 2017-03-28 2017-06-20 电子科技大学 Human target detection method based on a region-based fully convolutional neural network
WO2017101036A1 (en) * 2015-12-16 2017-06-22 Intel Corporation Fully convolutional pyramid networks for pedestrian detection

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150347860A1 (en) * 2014-05-30 2015-12-03 Apple Inc. Systems And Methods For Character Sequence Recognition With No Explicit Segmentation
CN105631413A (zh) * 2015-12-23 2016-06-01 中通服公众信息产业股份有限公司 Cross-scene pedestrian search method based on deep learning
CN106096531B (zh) * 2016-05-31 2019-06-14 安徽省云力信息技术有限公司 Multi-type vehicle detection method for traffic images based on deep learning
CN106204646A (zh) * 2016-07-01 2016-12-07 湖南源信光电科技有限公司 Multiple moving target tracking method based on a BP neural network
CN106934396A (zh) * 2017-03-09 2017-07-07 深圳市捷顺科技实业股份有限公司 License plate retrieval method and system
CN107194341B (zh) * 2017-05-16 2020-04-21 西安电子科技大学 Face recognition method and system based on Maxout multi-convolutional-neural-network fusion

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106355188A (zh) * 2015-07-13 2017-01-25 阿里巴巴集团控股有限公司 Image detection method and apparatus
WO2017101036A1 (en) * 2015-12-16 2017-06-22 Intel Corporation Fully convolutional pyramid networks for pedestrian detection
CN106874894A (zh) * 2017-03-28 2017-06-20 电子科技大学 Human target detection method based on a region-based fully convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3734496A4

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465904A (zh) * 2019-09-06 2021-03-09 上海晶赞融宣科技有限公司 Image target positioning method and apparatus, computer device, and storage medium
CN112991385A (zh) * 2021-02-08 2021-06-18 西安理工大学 Siamese network target tracking method based on different metric criteria
CN112991385B (zh) * 2021-02-08 2023-04-28 西安理工大学 Siamese network target tracking method based on different metric criteria
CN114140787A (zh) * 2021-12-09 2022-03-04 腾讯科技(深圳)有限公司 Image processing method, apparatus, device, storage medium, and computer product

Also Published As

Publication number Publication date
EP3734496A4 (en) 2021-11-24
CN109960988A (zh) 2019-07-02
US20200402242A1 (en) 2020-12-24
EP3734496A1 (en) 2020-11-04

Similar Documents

Publication Publication Date Title
WO2019128254A1 (zh) Image analysis method and apparatus, electronic device, and readable storage medium
CN109284729B (zh) Method, apparatus, and medium for obtaining face recognition model training data based on video
WO2017190656A1 (zh) Pedestrian re-identification method and apparatus
US9367766B2 (en) Text line detection in images
US10573018B2 (en) Three dimensional scene reconstruction based on contextual analysis
US10373380B2 (en) 3-dimensional scene analysis for augmented reality operations
US9142011B2 (en) Shadow detection method and device
CN107833213B (zh) Weakly supervised object detection method based on pseudo-ground-truth adaptation
CN107392141B (zh) Airport extraction method based on saliency detection and LSD line detection
WO2019071664A1 (zh) Face recognition method and apparatus combining depth information, and storage medium
CN108734185B (zh) Image verification method and apparatus
US9639943B1 (en) Scanning of a handheld object for 3-dimensional reconstruction
CN105956059A (zh) Information recommendation method and apparatus based on emotion recognition
US9613266B2 (en) Complex background-oriented optical character recognition method and device
WO2017181892A1 (zh) Foreground segmentation method and apparatus
WO2019136897A1 (zh) Image processing method and apparatus, electronic device, and storage medium
US10042899B2 (en) Automatic registration
US8989505B2 (en) Distance metric for image comparison
CN112200056A (zh) Face liveness detection method and apparatus, electronic device, and storage medium
Lahiani et al. Hand pose estimation system based on Viola-Jones algorithm for android devices
CN110675442A (zh) Local stereo matching method and system combined with target recognition technology
CN112686122A (zh) Human body and shadow detection method and apparatus, electronic device, and storage medium
CN112084365A (zh) Real-time image retrieval method for network cameras based on OpenCV and CUDA acceleration
US11080286B2 (en) Method and system for merging multiple point cloud scans
KR101357581B1 (ko) Method for detecting a human skin region based on depth information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18894025

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018894025

Country of ref document: EP

Effective date: 20200727