CN110020624B - Image recognition method, terminal device and storage medium - Google Patents

Image recognition method, terminal device and storage medium

Info

Publication number
CN110020624B
Authority
CN
China
Prior art keywords
target
image
pixel
image sequence
extracting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910276528.4A
Other languages
Chinese (zh)
Other versions
CN110020624A (en)
Inventor
郝嘉星
杨森
席雷平
卞建鹏
杨亚波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shijiazhuang Tiedao University
Army Engineering University of PLA
Original Assignee
Shijiazhuang Tiedao University
Army Engineering University of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shijiazhuang Tiedao University, Army Engineering University of PLA filed Critical Shijiazhuang Tiedao University
Priority to CN201910276528.4A priority Critical patent/CN110020624B/en
Publication of CN110020624A publication Critical patent/CN110020624A/en
Application granted granted Critical
Publication of CN110020624B publication Critical patent/CN110020624B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application is applicable to the technical field of image processing and provides an image recognition method, a terminal device and a storage medium. The method comprises the following steps: acquiring a first image sequence, wherein the first image sequence comprises a plurality of consecutive first image frames acquired by a first camera device within a preset time period; extracting corresponding target features from the first image sequence; and identifying a target corresponding to the first image sequence according to the target features. The image recognition method, terminal device and storage medium achieve target recognition and search over an image sequence by extracting the target features corresponding to the image sequence, so that manual inspection can be dispensed with, fully automatic image recognition is realized, and the problem of low target search efficiency in current image monitoring or video monitoring is solved.

Description

Image recognition method, terminal device and storage medium
Technical Field
The present application belongs to the field of image processing technologies, and in particular, to an image recognition method, a terminal device, and a storage medium.
Background
Image monitoring or video monitoring is currently the most commonly used monitoring means. At present, target search in image or video monitoring is mostly performed manually, that is, the required target is judged and identified by continuously observing the monitored images with the human eye. For example, in the field of unmanned aerial vehicle reconnaissance, the common monitoring approach depends on manually checking the images or videos acquired and transmitted back by the unmanned aerial vehicle. Because the efficiency of manually searching for targets is low, the reconnaissance efficiency of image or video monitoring equipment such as unmanned aerial vehicles is greatly reduced.
Disclosure of Invention
In view of this, embodiments of the present application provide an image recognition method, a terminal device, and a storage medium, so as to solve the problem of low target search efficiency in current image monitoring or video monitoring.
According to a first aspect, an embodiment of the present application provides an image recognition method, including: acquiring a first image sequence; the first image sequence comprises a plurality of continuous first image frames in a preset time period, which are acquired by a first camera device; extracting corresponding target features according to the first image sequence; and identifying the target corresponding to the first image sequence according to the target characteristic.
According to the image recognition method provided by the embodiments of the present application, target recognition and search over an image sequence are achieved by extracting the target features corresponding to the image sequence, so that manual inspection can be dispensed with, fully automatic image recognition is realized, and the problem of low target search efficiency in current image monitoring or video monitoring is solved.
With reference to the first aspect, in some embodiments of the present application, the extracting corresponding target features from the first image sequence includes: selecting any first image frame in the first image sequence as a target image frame; selecting any pixel except the edge pixel in the target image frame as a target pixel; determining whether the target pixel is a target feature; wherein the determining whether the target pixel is a target feature comprises: extracting a contrast pixel corresponding to the target pixel according to the target pixel and the first image sequence; judging whether the target pixel belongs to the target feature or not according to the target pixel and the comparison pixel; and when the target pixel is judged to belong to the target feature according to the target pixel and the comparison pixel, adding the target pixel into the target feature.
According to the image identification method provided by the embodiment of the application, the pixels except the edge pixels in the target image frame are compared with the corresponding comparison pixels, and the target features in the target image frame are extracted. The target features may be used for image target recognition in subsequent steps.
With reference to the first aspect, in some embodiments of the present application, the extracting corresponding target features from the first image sequence further includes: judging whether all pixels except the edge pixels in the target image frame have been used as target pixels; and, when not all pixels except the edge pixels in the target image frame have been used as target pixels, repeatedly performing the steps of selecting any pixel except the edge pixels in the target image frame as the target pixel and determining whether the target pixel is the target feature, until all pixels except the edge pixels in the target image frame have been used as target pixels.
According to the image recognition method provided by the embodiments of the present application, the pixels except the edge pixels in the target image frame are respectively compared with their corresponding comparison pixels, and the target features in the target image frame are extracted. Because a traversal approach is adopted when extracting the target features, all pixels except the edge pixels in the target image frame are checked one by one. This ensures the completeness of the target features, prevents target features from being missed, and improves the accuracy and reliability of identifying the image sequence from the target features in the subsequent steps.
With reference to the first aspect, in some embodiments of the present application, the extracting, according to the target pixel and the first image sequence, a contrast pixel corresponding to the target pixel includes: extracting each pixel adjacent to the target pixel in the target image frame, and adding each pixel adjacent to the target pixel in the target image frame into the comparison pixel; extracting any two first image frames in the first image sequence except the target image frame; and respectively extracting corresponding pixels of the target pixel in any two first image frames and each pixel adjacent to the corresponding pixel, and adding the corresponding pixel and each pixel adjacent to the corresponding pixel into the comparison pixel.
According to the image recognition method provided by the embodiments of the present application, the contrast pixels corresponding to the target pixel are collected from the target image frame and from other image frames in the image sequence, so that the image sequence can be fully utilized to extract the target features. Because the image sequence consists of a plurality of consecutive images, the target features extracted by the image recognition method provided by the embodiments of the present application contain richer feature information than target features extracted from a single image frame, which helps to improve the accuracy and reliability of image recognition.
With reference to the first aspect, in some embodiments of the present application, the judging whether the target pixel belongs to the target feature according to the target pixel and the comparison pixels includes: selecting the target pixels whose first gray value is greater than the second gray value and adding them to the target features; the first gray value is the gray value of a target pixel, and the second gray value is the maximum of the gray values of the contrast pixels corresponding to that target pixel.
According to the image identification method provided by the embodiment of the application, the gray value of the target pixel is compared with the maximum value in the gray values of the contrast pixels, so that the target feature in the target pixel is screened out, and data support is provided for image identification by using the target feature in the subsequent steps.
With reference to the first aspect, in some embodiments of the present application, after the identifying the target corresponding to the first image sequence according to the target feature, the image identification method further includes: the spatial coordinates of the target are calculated.
The image identification method provided by the embodiment of the application is not limited to identification of the target in the image, and the space coordinate of the target in the image can be obtained through calculation, so that convenience is provided for overhaul or maintenance of the target in the image.
With reference to the first aspect, in some embodiments of the present application, the calculating spatial coordinates of the target includes: acquiring a second image sequence; the second image sequence comprises a plurality of continuous second image frames within a preset time period and acquired by a second camera device; extracting first surface coordinates of the target and second surface coordinates of the target; the first face coordinates of the target are face coordinates of the target in any first image frame in the first image sequence; the second surface coordinates of the target are surface coordinates of the target in a corresponding second image frame in the second image sequence; and calculating the space coordinate of the target according to the first surface coordinate and the second surface coordinate.
According to the image identification method provided by the embodiment of the application, the two groups of image sequences are respectively collected through the two camera devices, and then the space coordinates of the target in the image are calculated by utilizing the binocular vision principle, so that the image identification method has the advantages of accuracy and quickness in calculation and the like.
According to a second aspect, an embodiment of the present application provides a terminal device, including: the image acquisition unit is used for acquiring a first image sequence; the first image sequence comprises a plurality of continuous first image frames within a preset time period and acquired by a first camera device; the characteristic extraction unit is used for extracting corresponding target characteristics according to the first image sequence; and the characteristic identification unit is used for identifying the target corresponding to the first image sequence according to the target characteristic.
According to a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method according to the first aspect or any implementation manner of the first aspect.
According to a fourth aspect, embodiments of the present application provide a computer-readable storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the steps of the method according to the first aspect or any implementation manner of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present application;
fig. 2 is a schematic flowchart of a specific example of an image recognition method provided in an embodiment of the present application;
fig. 3 is a schematic flowchart of another specific example of an image recognition method provided in an embodiment of the present application;
FIG. 4 is a diagram of one specific example of a target pixel and its corresponding contrast pixel;
fig. 5 is a schematic structural diagram of a specific example of a terminal device provided in an embodiment of the present application;
fig. 6 is a schematic structural diagram of another specific example of the terminal device provided in the embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
Fig. 1 is a schematic view of an application scenario in an embodiment of the present application. In fig. 1, an unmanned aerial vehicle 100 carries an imaging device 101 to perform aerial reconnaissance. The drone 100 may send videos or images it takes to the server 200 in real time during the course of performing an aerial reconnaissance mission; or after the return journey, when the video or image shot by the drone 100 needs to be analyzed, the video or image shot by the drone 100 may be sent to the server 200.
In some embodiments, as shown in fig. 2, the server 200 may process the video or images captured by the drone 100 to identify objects therein by:
step S101: a first sequence of images is acquired. Specifically, the first image sequence includes a plurality of first image frames which are acquired by the first camera within a preset time period. When the camera device 101 carried by the unmanned aerial vehicle 100 is a binocular camera, any one of the cameras can be used as a first camera device, correspondingly, the image collected by the camera is a first image, and the video collected by the camera is a first video. Correspondingly, another camera on the binocular camera can be used as a second camera device, correspondingly, the image collected by the camera is a second image, and the video collected by the camera is a second video.
The first camera device may capture a plurality of temporally consecutive first images in a continuous shooting manner, and the plurality of temporally consecutive first images may form a first image sequence. After the server 200 acquires the first image sequence, the first image sequence may be processed to identify an object therein.
The first camera device can also shoot a first video and decompose the first video into a plurality of corresponding first images which are continuous in time, so as to obtain a first image sequence corresponding to the first video. After the server 200 acquires the first image sequence, the first image sequence may be processed to identify an object therein. In addition, the server 200 may also directly acquire a first video captured by the first camera, and the server 200 decomposes the first video to obtain a corresponding first image sequence.
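As a rough illustration of the decomposition described above, the following sketch (not part of the patent; the file name, the grayscale conversion and the OpenCV-based approach are assumptions for illustration only) splits a recorded first video into a temporally ordered first image sequence:

```python
# Illustrative sketch only: decompose a recorded video into a temporally
# ordered sequence of grayscale frames, roughly matching step S101.
import cv2

def video_to_image_sequence(video_path, max_frames=None):
    """Decompose a video into a list of consecutive grayscale frames."""
    capture = cv2.VideoCapture(video_path)
    frames = []
    while capture.isOpened():
        ok, frame = capture.read()
        if not ok:
            break
        # Later steps compare gray values, so each frame is converted to grayscale.
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        if max_frames is not None and len(frames) >= max_frames:
            break
    capture.release()
    return frames

# Example (hypothetical file name):
# first_image_sequence = video_to_image_sequence("first_video.mp4")
```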
Step S102: and extracting corresponding target features according to the first image sequence. In a specific embodiment, as shown in fig. 3, the process of step S102 can be implemented by the following several sub-steps:
step S1021: and selecting any first image frame in the first image sequence as a target image frame. Because each first image frame in the first image sequence is a plurality of continuous image frames, each first image frame in the first image sequence has approximately the same data content, when a target image frame is selected and a target feature is extracted through the target image frame, any first image frame in the first image sequence can be arbitrarily determined to be the target image frame, and the target feature extraction cannot be greatly influenced.
Step S1022: and selecting any pixel except the edge pixel in the target image frame as a target pixel.
Step S1023: it is determined whether the target pixel is the target feature. Specifically, the comparison pixel corresponding to the target pixel may be extracted first according to the target pixel and the first image sequence. And secondly, judging whether the target pixel belongs to the target feature or not according to the target pixel and the comparison pixel. When the target pixel is judged to belong to the target feature according to the target pixel and the comparison pixel, executing step S1024; when it is determined that the target pixel does not belong to the target feature based on the target pixel and the comparison pixel, no operation is performed.
Step S1024: adding the target pixel to the target feature.
Step S1025: judging whether all pixels except the edge pixels in the target image frame have been used as target pixels. When not all pixels except the edge pixels in the target image frame have been used as target pixels, the process returns to step S1022 to repeat the steps of selecting any pixel except the edge pixels in the target image frame as the target pixel and determining whether the target pixel is the target feature, until all pixels except the edge pixels in the target image frame have been used as target pixels.
In practical applications, when the pixel corresponding to the x in fig. 4 is selected as the target pixel, the comparison pixels corresponding to the target pixel are the pixels corresponding to the o in fig. 4. For extracting the contrast pixel corresponding to the target pixel, a specific method may be:
and extracting each pixel adjacent to the target pixel in the target image frame, and adding each pixel adjacent to the target pixel in the target image frame into the comparison pixel.
And extracting any two first image frames in the first image sequence except the target image frame. In practical application, two first image frames adjacent to a target image frame in a first image sequence can be selected; two first image frames except for the target image frame may also be arbitrarily selected from the first image sequence, which is not limited in the embodiment of the present application.
And respectively extracting corresponding pixels of the target pixel in the two first image frames and each pixel adjacent to the corresponding pixel, and adding the corresponding pixel and each pixel adjacent to the corresponding pixel into the contrast pixel. When two first image frames adjacent to the target image frame in the first image sequence are selected, corresponding pixels of the target pixel can be respectively extracted from the two first image frames adjacent to the target image frame; extracting each pixel adjacent to the corresponding pixel from two first image frames adjacent to the target image frame; finally, the corresponding pixel and each pixel adjacent to the corresponding pixel are added to the contrast pixel.
After extracting each target pixel and its corresponding contrast pixel in the target image frame, the maximum value of the gray value in the contrast pixel corresponding to each target pixel may be further calculated, and the maximum value of the gray value is used as the second gray value. The gradation value of each target pixel is set to a first gradation value. And when the first gray value is larger than the corresponding second gray value, adding the target pixel corresponding to the first gray value into the target feature.
For a specific target pixel, respectively comparing whether the gray value of the target pixel is greater than the gray value of each corresponding comparison pixel; and when the gray value of the target pixel is greater than the gray values of the corresponding contrast pixels, adding the target pixel corresponding to the first gray value into the target feature.
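A minimal sketch of the feature-extraction procedure in steps S1021 to S1025 follows. It assumes grayscale frames stored as numpy arrays and, as one of the permitted choices above, takes the two first image frames adjacent to the target image frame as the source of the additional contrast pixels; it is an illustration under those assumptions, not the patent's reference implementation.

```python
# Sketch of steps S1022-S1025: traverse every non-edge pixel of the target
# image frame and keep the pixels whose gray value exceeds the maximum gray
# value of their contrast pixels (8 neighbours in the target frame plus the
# corresponding 3x3 neighbourhoods in the two adjacent frames).
import numpy as np

def extract_target_features(sequence, target_index):
    target = sequence[target_index].astype(np.int32)
    prev_frame = sequence[target_index - 1].astype(np.int32)  # assumed frame choice
    next_frame = sequence[target_index + 1].astype(np.int32)  # assumed frame choice
    rows, cols = target.shape
    target_features = []
    for r in range(1, rows - 1):              # edge pixels are skipped
        for c in range(1, cols - 1):
            first_gray = target[r, c]
            block = (slice(r - 1, r + 2), slice(c - 1, c + 2))
            contrast = np.concatenate([
                np.delete(target[block].ravel(), 4),  # drop the target pixel itself
                prev_frame[block].ravel(),
                next_frame[block].ravel(),
            ])
            second_gray = contrast.max()
            if first_gray > second_gray:
                target_features.append((r, c))
    return target_features
```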
Step S103: and identifying the target corresponding to the first image sequence according to the target characteristic. In a specific embodiment, the target features may be classified by a deep learning CapsNet model, so as to identify a target corresponding to the first image sequence.
Before the deep learning CapsNet model is used to classify the target features, it needs to be trained, which can be done with a training set. The training set comprises a plurality of reference images determined by target or category, with an input size of 28 × 28. PrimaryCaps is the vector neuron layer of the CapsNet. DigitCaps is obtained from PrimaryCaps through a dynamic routing algorithm and consists of 10 vectors; the modulus of each of the 10 vectors is calculated, and the vector with the largest modulus corresponds to the class to which the picture most probably belongs. Because the CapsNet neuron is a vector rather than a scalar, problems such as differences in viewing angle, light intensity and distance to the target image can be handled well; the output layer is connected to each convolutional layer, and rather than only distinguishing between two classes as in the prior art, the recognition probability of the target object is obtained for each class, so that the object can be conveniently classified according to the probability with which it belongs to each class. After the feature values of the reference image are extracted by the method described in step S102, the feature values are fed into the network layers of the deep learning CapsNet model, a fully connected layer is then added, and the network is trained until the training target is reached. The number of training iterations may be chosen as 3000 and the learning rate as 0.05. After training, the deep learning CapsNet model is tested with a test set. Testing shows that the image recognition rate of the trained deep learning CapsNet model can reach more than 95%.
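As a small illustration of the classification rule described above (the DigitCaps vector with the largest modulus indicates the predicted class), the following sketch computes the moduli of the 10 output vectors and picks the largest; the 16-dimensional capsule length and the normalization into percentages are assumptions, since the patent does not state them.

```python
# Sketch of the final CapsNet classification step: compute the modulus of each
# of the 10 DigitCaps vectors and take the class whose vector is longest.
import numpy as np

def classify_from_digitcaps(digit_caps):
    """digit_caps: array of shape (10, capsule_dim) produced by dynamic routing."""
    moduli = np.linalg.norm(digit_caps, axis=1)       # one modulus per class capsule
    percentages = 100.0 * moduli / moduli.sum()       # rough per-class confidence
    return int(np.argmax(moduli)), percentages

# Example with random capsule outputs (16-dimensional capsules assumed):
predicted_class, class_percentages = classify_from_digitcaps(np.random.rand(10, 16))
```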
In practical applications, when the target corresponding to the first image sequence is not identified according to the target feature and the deep learning CapsNet model, step S102 may be repeatedly performed on the first image sequence to re-extract the target feature of the first image sequence, and the target corresponding to the first image sequence is identified again according to the re-extracted target feature and the deep learning CapsNet model. If the object corresponding to the first image sequence is not identified again, the first image sequence is considered not to contain the object to be identified, and the first image sequence can be stored in a database for use in subsequent checking or expanding the type of object identification.
Specifically, when the target features of the first image sequence are re-extracted, the target image frame needs to be replaced, so as to avoid the corresponding target features being lost because of problems with the target image frame itself, for example poor sharpness.
Furthermore, when the target corresponding to the first image sequence is not identified according to the target feature and the deep learning CapsNet model, step S102 may be further performed on the second image sequence to extract the target feature of the second image sequence, and identify the target corresponding to the second image sequence according to the target feature of the second image sequence and the deep learning CapsNet model. Because the first image sequence and the second image sequence are acquired by the binocular camera at the same time, the first image sequence and the second image sequence have approximately the same content, and the target feature of the second image sequence can be identified when the target feature of the first image sequence fails to be identified, so that missing detection is avoided.
Optionally, as shown in fig. 3, the following steps may be added after step S103:
step S104: spatial coordinates of the target are calculated. In a specific embodiment, the process of step S104 can be implemented by the following sub-steps:
step S1041: a second sequence of images is acquired. Specifically, the second image sequence includes a plurality of second image frames which are acquired by the second camera within a preset time period.
Step S1042: first face coordinates of the target and second face coordinates of the target are extracted. Specifically, the first face coordinates of the target are the face coordinates of the target in any first image frame in the first image sequence; the second face coordinates of the target are the face coordinates of the target in the corresponding second image frame in the second image sequence. The first image frame used for extracting the first face coordinates and the second image frame used for extracting the second face coordinates should be pictures taken by the binocular camera at the same time.
Step S1043: the spatial coordinates of the target are calculated according to the first face coordinates and the second face coordinates. In practical application, the left camera coordinate system O-XYZ can be placed at the origin of the world coordinate system without rotation; its image coordinate system is O_l-X_lY_l and its effective focal length is f_l. The right camera coordinate system is O_r-x_ry_rz_r, its image coordinate system is O_r-X_rY_r, and its effective focal length is f_r. From the projection model of the camera, the following relationship can be obtained:

\[ s_l \begin{bmatrix} X_l \\ Y_l \\ 1 \end{bmatrix} = \begin{bmatrix} f_l & 0 & 0 \\ 0 & f_l & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]

The positional relationship between the O-XYZ coordinate system and the O_r-x_ry_rz_r coordinate system can be expressed by a spatial transformation matrix M_lr as:

\[ M_{lr} = \begin{bmatrix} R & T \end{bmatrix} = \begin{bmatrix} r_1 & r_2 & r_3 & t_x \\ r_4 & r_5 & r_6 & t_y \\ r_7 & r_8 & r_9 & t_z \end{bmatrix} \]

Similarly, for a spatial point in the O-XYZ coordinate system, the correspondence between its image points in the two cameras can be expressed as:

\[ s_r \begin{bmatrix} X_r \\ Y_r \\ 1 \end{bmatrix} = \begin{bmatrix} f_r & 0 & 0 \\ 0 & f_r & 0 \\ 0 & 0 & 1 \end{bmatrix} M_{lr} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \]

The three-dimensional coordinates of the spatial point can then be expressed as:

\[ x = \frac{z X_l}{f_l}, \qquad y = \frac{z Y_l}{f_l}, \qquad z = \frac{f_l \left( f_r t_x - X_r t_z \right)}{X_r \left( r_7 X_l + r_8 Y_l + r_9 f_l \right) - f_r \left( r_1 X_l + r_2 Y_l + r_3 f_l \right)} \]

Therefore, once the internal parameters of the left and right cameras (the focal lengths f_l and f_r) are obtained through camera calibration, together with the image coordinates of the spatial point in the left and right cameras, the three-dimensional spatial coordinates of the measured point can be reconstructed.
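A numerical sketch of this reconstruction is given below. It assumes the calibration results (the focal lengths f_l and f_r and the rotation/translation entries r_1…r_9, t_x, t_y, t_z of M_lr) are already known; the function and variable names are illustrative rather than taken from the patent.

```python
# Sketch of the binocular reconstruction formulas above: recover the spatial
# coordinates (x, y, z) in the left-camera (world) frame from the matched
# image coordinates of the same spatial point in the left and right cameras.
import numpy as np

def reconstruct_point(Xl, Yl, Xr, fl, fr, R, T):
    r1, r2, r3 = R[0]
    r7, r8, r9 = R[2]
    tx, tz = T[0], T[2]
    z = fl * (fr * tx - Xr * tz) / (
        Xr * (r7 * Xl + r8 * Yl + r9 * fl) - fr * (r1 * Xl + r2 * Yl + r3 * fl))
    x = z * Xl / fl
    y = z * Yl / fl
    return np.array([x, y, z])
```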
A position detection device may be provided on the drone 100 to acquire the GPS coordinates of the drone 100. The GPS coordinates of the target in the first image sequence may be further calculated from the GPS coordinates of the drone 100 and the spatial coordinates of the target in the first image sequence with respect to the drone 100 or the camera 101 carried by the drone 100.
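The patent does not specify how the relative spatial coordinates are combined with the drone's GPS position; the rough sketch below assumes a flat-earth approximation and an east/north offset expressed in metres, purely for illustration.

```python
# Rough, assumption-laden sketch: convert the target's offset relative to the
# drone (in metres, east/north) into approximate GPS coordinates of the target.
import math

def offset_to_gps(drone_lat, drone_lon, east_m, north_m):
    metres_per_deg_lat = 111320.0                                    # approximate
    metres_per_deg_lon = 111320.0 * math.cos(math.radians(drone_lat))
    return (drone_lat + north_m / metres_per_deg_lat,
            drone_lon + east_m / metres_per_deg_lon)
```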
According to the image recognition method provided by the embodiments of the present application, target recognition and search over an image sequence are achieved by extracting the target features corresponding to the image sequence, so that manual inspection can be dispensed with, fully automatic image recognition is realized, and the problem of low target search efficiency in current image monitoring or video monitoring is solved.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by functions and internal logic of the process, and should not constitute any limitation to the implementation process of the embodiments of the present application.
An embodiment of the present application further provides a terminal device, as shown in fig. 5, where the terminal device may include: an image acquisition unit 501, a feature extraction unit 502, and a feature recognition unit 503.
The image acquisition unit 501 is configured to acquire a first image sequence, where the first image sequence includes a plurality of consecutive first image frames within a preset time period acquired by a first camera; the specific working process can be referred to as step S101 in the above method embodiment.
The feature extraction unit 502 is configured to extract a corresponding target feature according to the first image sequence; the specific working process of the method can be referred to as step S102 in the above method embodiment.
The feature recognition unit 503 is configured to recognize a target corresponding to the first image sequence according to the target feature; the specific working process can be referred to step S103 in the above method embodiment.
Alternatively, the terminal device shown in fig. 5 may be additionally provided with the coordinate calculation unit 504. The coordinate calculation unit 504 is used to calculate the spatial coordinates of the target, and the specific working process thereof can be referred to as that described in step S104 in the above method embodiment.
Fig. 6 is a schematic diagram of another terminal device provided in an embodiment of the present application. As shown in fig. 6, the terminal device 600 of this embodiment includes: a processor 601, a memory 602 and a computer program 603, such as an image recognition method program, stored in said memory 602 and executable on said processor 601. The processor 601, when executing the computer program 603, implements the steps in the various embodiments of the image recognition method described above, such as the steps S101 to S103 shown in fig. 2. Alternatively, the processor 601, when executing the computer program 603, implements the functions of each module/unit in each apparatus embodiment described above, such as the functions of the image acquisition unit 501, the feature extraction unit 502, and the feature identification unit 503 shown in fig. 5.
The computer program 603 may be partitioned into one or more modules/units that are stored in the memory 602 and executed by the processor 601 to complete the application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 603 in the terminal device 600. For example, the computer program 603 may be partitioned into a synchronization module, a summarization module, an acquisition module, a return module (a module in a virtual device).
The terminal device 600 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device may include, but is not limited to, a processor 601, a memory 602. Those skilled in the art will appreciate that fig. 6 is merely an example of a terminal device 600 and does not constitute a limitation of terminal device 600 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.
The processor 601 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 602 may be an internal storage unit of the terminal device 600, such as a hard disk or a memory of the terminal device 600. The memory 602 may also be an external storage device of the terminal device 600, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 600. Further, the memory 602 may also include both an internal storage unit and an external storage device of the terminal device 600. The memory 602 is used for storing the computer programs and other programs and data required by the terminal device. The memory 602 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one type of logical function division, and other division manners may be available in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow in the methods of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and which, when executed by a processor, realizes the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be subject to appropriate increase or decrease as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals, in accordance with legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present application, and they should be construed as being included in the present application.

Claims (9)

1. An image recognition method, comprising:
acquiring a first image sequence; the first image sequence comprises a plurality of first image frames which are collected by a first camera device and are continuous in time within a preset time period; the first image sequence further comprises shooting a first video through the first camera and decomposing the first video into a plurality of corresponding temporally continuous first images;
extracting corresponding target features according to the first image sequence which is continuous in time;
identifying a target corresponding to the first image sequence according to the target feature;
the identifying the target corresponding to the first image sequence according to the target feature includes:
training a deep learning CapsNet model by utilizing a training set, wherein the training set comprises a plurality of reference images determined by targets or categories; after the characteristic value of the reference image is extracted, extracting the network layer of the deep learning CapsNet model by using the characteristic value of the reference image, and then adding the full-connection layer of the deep learning CapsNet model to continue training the network of the deep learning CapsNet model;
inputting the first image sequence into a trained deep learning CapsNet model, classifying the target features through the trained deep learning CapsNet model, and identifying a target corresponding to the first image sequence;
when the target corresponding to the first image sequence is not identified according to the target features and the deep learning CapsNet model, re-extracting the target features of the first image sequence, and identifying the target corresponding to the first image sequence according to the re-extracted target features and the deep learning CapsNet model; when the target feature of the first image sequence is re-extracted, the target image frame needs to be replaced.
2. The image recognition method of claim 1, wherein the extracting corresponding target features from the first image sequence comprises:
selecting any first image frame in the first image sequence as a target image frame;
selecting any pixel except the edge pixel in the target image frame as a target pixel;
determining whether the target pixel is a target feature;
wherein the determining whether the target pixel is a target feature comprises:
extracting a contrast pixel corresponding to the target pixel according to the target pixel and the first image sequence;
judging whether the target pixel belongs to the target feature or not according to the target pixel and the comparison pixel;
and when the target pixel is judged to belong to the target feature according to the target pixel and the comparison pixel, adding the target pixel into the target feature.
3. The image recognition method of claim 2, wherein the extracting corresponding target features from the first sequence of images further comprises:
judging whether all pixels except the edge pixels in the target image frame are used as target pixels or not;
when all pixels except the edge pixels in the target image frame are not taken as target pixels, repeatedly executing the step of selecting any pixel except the edge pixels in the target image frame as the target pixel; and determining whether the target pixel is the target feature until all pixels except the edge pixel in the target image frame are used as the target pixel.
4. The image recognition method according to claim 2 or 3, wherein the extracting of the contrast pixel corresponding to the target pixel from the target pixel and the first image sequence comprises:
extracting each pixel adjacent to the target pixel in the target image frame, and adding each pixel adjacent to the target pixel in the target image frame into the comparison pixel;
extracting any two first image frames in the first image sequence except the target image frame;
and respectively extracting corresponding pixels of the target pixel in any two first image frames and each pixel adjacent to the corresponding pixel, and adding the corresponding pixel and each pixel adjacent to the corresponding pixel into the comparison pixel.
5. The image recognition method of claim 4, wherein the determining whether the target pixel belongs to the target feature according to the target pixel and the comparison pixel comprises:
selecting a target pixel with the first gray value larger than the second gray value, and adding target characteristics; the first gray value is a gray value of a target pixel, and the second gray value is a maximum value of gray values of contrast pixels corresponding to the target pixel.
6. The image recognition method of claim 1, wherein after the recognition of the object corresponding to the first image sequence according to the object feature, the image recognition method further comprises:
when the target feature of the first image sequence fails to be identified, identifying the target feature of the second image sequence, and calculating the space coordinate of the target;
the calculating the spatial coordinates of the target comprises:
acquiring a second image sequence; the second image sequence comprises a plurality of continuous second image frames in a preset time period, which are acquired by a second camera device;
extracting a first face coordinate of the target and a second face coordinate of the target; the first face coordinates of the target are face coordinates of the target in any first image frame in the first image sequence; second face coordinates of the target are face coordinates of the target in a corresponding second image frame in the second sequence of images;
and calculating the space coordinate of the target according to the first surface coordinate and the second surface coordinate.
7. A terminal device, comprising:
the image acquisition unit is used for acquiring a first image sequence; the first image sequence comprises a plurality of first image frames which are collected by a first camera device and are continuous in time within a preset time period; the first image sequence further comprises a first video captured by the first camera and decomposed into a corresponding plurality of temporally consecutive first images;
the feature extraction unit is used for extracting corresponding target features according to the first image sequence which is continuous in time;
the characteristic identification unit is used for identifying a target corresponding to the first image sequence according to the target characteristic;
the identifying the target corresponding to the first image sequence according to the target feature includes:
training a deep learning CapsNet model by using a training set, wherein the training set comprises a plurality of reference images determined by targets or categories; after the characteristic value of the reference image is extracted, extracting the network layer of the deep learning CapsNet model by using the characteristic value of the reference image, and then adding the full-connection layer of the deep learning CapsNet model to continue training the network of the deep learning CapsNet model;
inputting the first image sequence into a trained deep learning CapsNet model, classifying the target features through the trained deep learning CapsNet model, and identifying a target corresponding to the first image sequence;
when the target corresponding to the first image sequence is not identified according to the target feature and the deep learning CapsNet model, re-extracting the target feature of the first image sequence, and identifying the target corresponding to the first image sequence according to the re-extracted target feature and the deep learning CapsNet model; when the target feature of the first image sequence is re-extracted, the target image frame needs to be replaced.
8. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of a method according to any one of claims 1 to 6.
CN201910276528.4A 2019-04-08 2019-04-08 Image recognition method, terminal device and storage medium Active CN110020624B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910276528.4A CN110020624B (en) 2019-04-08 2019-04-08 Image recognition method, terminal device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910276528.4A CN110020624B (en) 2019-04-08 2019-04-08 Image recognition method, terminal device and storage medium

Publications (2)

Publication Number Publication Date
CN110020624A CN110020624A (en) 2019-07-16
CN110020624B true CN110020624B (en) 2023-04-18

Family

ID=67190679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910276528.4A Active CN110020624B (en) 2019-04-08 2019-04-08 Image recognition method, terminal device and storage medium

Country Status (1)

Country Link
CN (1) CN110020624B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113327426B (en) * 2021-05-26 2022-09-09 国能朔黄铁路发展有限责任公司 Vehicle type code identification method and device and vehicle number identification method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201811197D0 (en) * 2018-07-09 2018-08-29 Nokia Technologies Oy Video processing
CN109063742A (en) * 2018-07-06 2018-12-21 平安科技(深圳)有限公司 Butterfly identifies network establishing method, device, computer equipment and storage medium
CN109543602A (en) * 2018-11-21 2019-03-29 太原理工大学 A kind of recognition methods again of the pedestrian based on multi-view image feature decomposition

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102072706B (en) * 2009-11-20 2013-04-17 深圳先进技术研究院 Multi-camera positioning and tracking method and system
CN105578034A (en) * 2015-12-10 2016-05-11 深圳市道通智能航空技术有限公司 Control method, control device and system for carrying out tracking shooting for object
CN109003294A (en) * 2018-06-21 2018-12-14 航天科工仿真技术有限责任公司 A kind of unreal & real space location registration and accurate matching process

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063742A (en) * 2018-07-06 2018-12-21 平安科技(深圳)有限公司 Butterfly identifies network establishing method, device, computer equipment and storage medium
GB201811197D0 (en) * 2018-07-09 2018-08-29 Nokia Technologies Oy Video processing
CN109543602A (en) * 2018-11-21 2019-03-29 太原理工大学 A kind of recognition methods again of the pedestrian based on multi-view image feature decomposition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Hao Luo et al. Detect or Track: Towards Cost-Effective Video Object Detection/Tracking. arXiv, 2018, 1-9. *
Zhu Yingzhao. Research on Capsule Network Technology and Its Development Trend. Guangdong Communication Technology, 2018, Vol. 38, No. 10, 51-54+74. *

Also Published As

Publication number Publication date
CN110020624A (en) 2019-07-16

Similar Documents

Publication Publication Date Title
CN112446270B (en) Training method of pedestrian re-recognition network, pedestrian re-recognition method and device
CN110569702B (en) Video stream processing method and device
CN112950667B (en) Video labeling method, device, equipment and computer readable storage medium
CN109116129B (en) Terminal detection method, detection device, system and storage medium
US20190294863A9 (en) Method and apparatus for face classification
US20200250401A1 (en) Computer system and computer-readable storage medium
Charco et al. Deep learning based camera pose estimation in multi-view environment
CN113759338B (en) Target detection method and device, electronic equipment and storage medium
CN114612987A (en) Expression recognition method and device
Pimentel-Alarcón et al. Random consensus robust PCA
CN110020624B (en) Image recognition method, terminal device and storage medium
Dutta et al. Weighted low rank approximation for background estimation problems
CN110633630B (en) Behavior identification method and device and terminal equipment
CN110288691B (en) Method, apparatus, electronic device and computer-readable storage medium for rendering image
CN111626212A (en) Method and device for identifying object in picture, storage medium and electronic device
CN110751163B (en) Target positioning method and device, computer readable storage medium and electronic equipment
CN111104965A (en) Vehicle target identification method and device
CN116597246A (en) Model training method, target detection method, electronic device and storage medium
CN116012609A (en) Multi-target tracking method, device, electronic equipment and medium for looking around fish eyes
CN114757984A (en) Scene depth estimation method and device of light field camera
Pandey et al. Implementation of 5-block convolutional neural network (cnn) for saliency improvement on flying object detection in videos
CN112183359B (en) Method, device and equipment for detecting violent content in video
CN112288748B (en) Semantic segmentation network training and image semantic segmentation method and device
CN114463685A (en) Behavior recognition method and device, electronic equipment and storage medium
CN112329729A (en) Small target ship detection method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant