CN109325405A - Shot type annotation method, apparatus and device - Google Patents

Shot type annotation method, apparatus and device

Info

Publication number
CN109325405A
CN109325405A (application CN201810910154.2A)
Authority
CN
China
Prior art keywords
frame image
image
candidate reference object
shot type
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810910154.2A
Other languages
Chinese (zh)
Inventor
刘思阳
冯巍
蒋紫东
冯忠伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201810910154.2A
Publication of CN109325405A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present invention provide a shot type annotation method, apparatus and device. The method comprises: performing frame extraction on a video to be edited according to a preset frame-skip parameter to generate frame images; inputting each frame image into a convolutional neural network for processing to obtain region images of candidate reference objects in each frame image; extracting feature points from the region images of the candidate reference objects in each frame image; identifying the shot type of the current frame image according to the feature distance between the feature points of the region image of the candidate reference object; and annotating the shot type on the current frame image. In video post-production, the present invention identifies the shot type of the current frame image of the raw video footage from the feature distance between the feature points of the candidate reference object and marks the shot type label on the current frame image, eliminating a large amount of manual annotation work in post-production, helping editors quickly retrieve the shots they need, substantially shortening the post-production time, reducing production cost and improving post-production efficiency.

Description

Shot type annotation method, apparatus and device
Technical field
The present invention relates to the technical field of image processing, and in particular to a shot type annotation method, apparatus and device.
Background art
With the rapid development of film and television production technology, post-production has taken on a very important responsibility: shot editing. During recording, a video program (such as a TV movie) generates a large amount of raw video footage owing to different camera positions, shooting angles, shot scales and other factors. In a common workflow, a junior editor first performs a "rough cut" on this footage: the junior editor browses the material quickly, cuts out the useless video segments, and adds labels to the remaining segments so that a senior editor can then perform the "fine cut", editing together multiple segments of the same scene so that the shot language expresses different moods and conveys rich information.
At present, dozens or even hundreds of camera positions may shoot simultaneously during the recording of a show, so one hour of program may produce up to a hundred hours of raw video footage. The junior editor then has to cut a hundred hours of video, which not only wastes the junior editor's time but also reduces the junior editor's working efficiency.
Therefore, how to reduce the rough-cut time in video post-production and improve working efficiency is a technical problem to be solved at present.
Summary of the invention
The technical problem to be solved by the embodiments of the present invention is to provide a shot type annotation method, so as to solve the problems of high editing cost and low efficiency in the prior art caused by the long rough-cut time in video post-production.
Correspondingly, the embodiments of the present invention further provide a shot type annotation apparatus and device, so as to ensure the implementation and application of the above method.
To solve the above problems, the present invention is realized through the following technical solutions:
A first aspect provides a shot type annotation method, comprising:
performing frame extraction on a video to be edited according to a preset frame-skip parameter to generate frame images;
inputting each frame image into a convolutional neural network for processing to obtain region images of candidate reference objects in each frame image;
extracting feature points from the region images of the candidate reference objects in each frame image;
identifying the shot type of the current frame image according to the feature distance between the feature points of the region image of the candidate reference object;
annotating the shot type on the current frame image.
Optionally, passing each frame image through the convolutional neural network to obtain the region images of the candidate reference objects in each frame image comprises:
inputting each frame image into a first convolutional neural network for coarse selection of candidate region images to obtain position information of candidate reference objects;
inputting the image of the region delineated by the position information of each candidate reference object into a second convolutional neural network for fine selection of candidate reference objects to obtain the region images of the candidate reference objects.
Optionally, extracting the feature points of the region images of the candidate reference objects in each frame image comprises:
inputting the region image of the candidate reference object in each frame image into a third convolutional neural network for feature extraction according to preset reference-object key points, to obtain a plurality of feature points of the candidate reference object.
Optionally, determining the shot type of the current frame image according to the feature distance of the feature points of the candidate reference object comprises:
calculating the feature distance between the feature points of the candidate reference object;
determining the ratio of the feature distance to the frame height of the image;
comparing the ratio with preset shot type parameter ranges;
identifying the shot type of the current frame image according to the comparison result.
Optionally, when a plurality of candidate reference objects are obtained, calculating the feature distance between the feature points of the candidate reference objects comprises:
calculating, according to the video genre, the feature distance between the feature points of each candidate reference object;
selecting the maximum feature distance from the feature distances, or calculating the average of all the feature distances;
determining the ratio of the maximum feature distance or the average feature distance to the frame height of the image;
comparing the ratio with preset shot type parameter ranges;
determining the shot type of the current frame image according to the comparison result.
A second aspect provides a shot type annotation apparatus, comprising:
a generation module, configured to perform frame extraction on a video to be edited according to a preset frame-skip parameter to generate frame images;
a processing module, configured to input each frame image into a convolutional neural network for processing to obtain region images of candidate reference objects in each frame image;
an extraction module, configured to extract feature points from the region images of the candidate reference objects in each frame image;
an identification module, configured to identify the shot type of the current frame image according to the feature distance between the feature points of the region image of the candidate reference object;
an annotation module, configured to annotate the shot type on the current frame image.
Optionally, the processing module comprises:
a coarse selection module, configured to input each frame image into the first convolutional neural network for coarse selection of candidate region images to obtain position information of candidate reference objects;
a fine selection module, configured to input the image of the region delineated by the position information of each candidate reference object into the second convolutional neural network for fine selection of candidate reference objects to obtain the region images of the candidate reference objects.
Optionally, the extraction module is specifically configured to input the region image of the candidate reference object in each frame image into the third convolutional neural network for feature extraction according to preset reference-object key points, to obtain a plurality of feature points of the candidate reference object.
Optionally, the identification module comprises:
a first calculation module, configured to calculate the feature distance between the feature points of the candidate reference object;
a first determination module, configured to determine the ratio of the feature distance calculated by the calculation module to the frame height of the image;
a first comparison module, configured to compare the ratio with preset shot type parameter ranges;
a first identification module, configured to identify the shot type corresponding to the current frame image according to the comparison result of the comparison module.
Optionally, when the extraction module obtains a plurality of candidate reference objects, the identification module comprises:
a second calculation module, configured to calculate, according to the video genre, the feature distance between the feature points of each candidate reference object;
a selection module, configured to select the maximum feature distance from the feature distances or to calculate the average of all the feature distances;
a second determination module, configured to determine the ratio of the maximum feature distance or the average feature distance to the height of the current frame image;
a second comparison module, configured to compare the ratio with preset shot type parameter ranges;
a second identification module, configured to identify the shot type of the current frame image according to the comparison result.
A third aspect provides a network device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein when the computer program is executed by the processor, the steps of the shot type annotation method according to any one of claims 1 to 5 are realized.
A fourth aspect provides a computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the steps of the shot type annotation method according to any one of claims 1 to 5 are realized.
Compared with the prior art, the embodiments of the present invention have the following advantages:
In the embodiments of the present invention, frame extraction is first performed on the video to be edited according to a preset frame-skip parameter to generate frame images; then each frame image is input into a convolutional neural network for processing to obtain the region images of the candidate reference objects in each frame image; the feature points of the region images of the candidate reference objects in each frame image are extracted; and finally the shot type of the current frame image is identified according to the feature distance between the feature points of the region image of the candidate reference object. In other words, in this embodiment, during video post-production, the shot type of the current frame image of the raw video footage is identified from the feature distance between the feature points of the region image of the candidate reference object, and the shot type label is marked on the current frame image. This eliminates a large amount of manual annotation work in post-production, helps editors quickly retrieve the shots they need, substantially shortens the post-production time, reduces the production cost to a certain extent, and improves post-production efficiency.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present application.
Brief description of the drawings
Fig. 1 is a flowchart of a shot type annotation method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of a medium shot identification sample provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of a close-up identification sample provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of a wide shot identification sample provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a shot type annotation apparatus provided by an embodiment of the present invention;
Fig. 6 is another schematic structural diagram of a shot type annotation apparatus provided by an embodiment of the present invention;
Fig. 7 is another schematic structural diagram of a shot type annotation apparatus provided by an embodiment of the present invention;
Fig. 8 is another schematic structural diagram of a shot type annotation apparatus provided by an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of an application example provided by an embodiment of the present invention.
Detailed description of the embodiments
In order to make the above objects, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, which is a flowchart of a shot type annotation method provided by an embodiment of the present invention, the method is applied to video post-production and may specifically include the following steps:
Step 101: perform frame extraction on the video to be edited according to a preset frame-skip parameter to generate frame images.
In this step, before frame extraction, a frame-skip (skip_frame) parameter is first set at the terminal, and then a frame extraction algorithm extracts frames from the video to be edited according to the preset frame-skip parameter to generate a plurality of frame images. The preset frame-skip parameter is set manually according to the business requirement: the larger the frame-skip value, the faster the processing but the worse the result, so a trade-off has to be made between speed and quality.
That is, frame extraction is first performed according to the preset frame-skip (skip_frame) parameter to generate frame images, and each generated frame image is resized to (w, h) pixels, where w is the preset image width and h is the preset image height.
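A minimal sketch of this frame-extraction step, assuming OpenCV is used for decoding; the skip_frame value and the target size (w, h) are configuration parameters chosen by the operator and are not prescribed by the embodiment:

```python
import cv2

def extract_frames(video_path: str, skip_frame: int, w: int, h: int):
    """Extract every (skip_frame + 1)-th frame and resize it to (w, h)."""
    cap = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % (skip_frame + 1) == 0:   # keep one frame, then skip the next skip_frame frames
            frames.append((index, cv2.resize(frame, (w, h))))
        index += 1
    cap.release()
    return frames   # list of (frame_number, resized image)
```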
Step 102: input each frame image into a convolutional neural network for processing to obtain the region images of the candidate reference objects in each frame image.
The specific processing in this step includes:
1) Each frame image is first input into a first convolutional neural network for coarse selection of candidate region images to obtain the position information of the candidate reference objects.
It should be noted that after the coarse selection of each frame image, the position information of the candidate reference objects is obtained. In most frames there are several candidates, but the number may also be zero; for example, when no suitable reference object can be found in the original image, the coarse selection yields the position information of zero candidate reference objects.
In this embodiment, coarse selection is also referred to as primary selection. The first convolutional neural network (CNN, Convolutional Neural Network) in this step is the convolutional neural network of the candidate-reference-object coarse selection stage and consists of 6 layers, i.e. L1, L2, L3, L4, L5 and L6. L1 is a convolutional layer composed of 10 3*3 convolution kernels with stride 1; L2 is a pooling layer with a 2*2 pooling kernel and stride 1; L3 is a convolutional layer composed of 16 3*3 convolution kernels with stride 1; L4 is a convolutional layer composed of 32 3*3 convolution kernels with stride 1; L5 is a convolutional layer composed of 2 1*1 convolution kernels with stride 1; L6 is a convolutional layer composed of 4 1*1 convolution kernels with stride 1. In this convolutional neural network, from L1 to L4 the output of each layer is the input of the next layer; the output of L4 serves as the input of both L5 and L6, and the output of L5 also serves as an input of L6, i.e. the outputs of L4 and L5 serve as the input of L6. The output of L5 is the probability that the current region is a reference object, and the output of L6 is the position information of the selected reference object, i.e. the coordinates of the upper-left and lower-right corners of the rectangle enclosing the candidate reference object.
The output of L5 is a value indicating the probability that the region is a reference object: if the probability is greater than or equal to a threshold, the region is taken as a reference object, otherwise it is not; the threshold is obtained through training.
In this embodiment, if the resolution of each frame image input to the first convolutional neural network is 1024*768, the input of the first convolutional neural network is a 1024*768*3 matrix (*3 because of the three RGB channels). After L1 a 1024*768*10 feature matrix is output and fed into L2, which produces a 1024*768*10 feature matrix; this is fed into L3, which outputs a 1024*768*16 feature matrix, which is fed into L4, which outputs a 1024*768*32 feature matrix. This feature matrix is fed into L5 and L6. L5 outputs a 1*2 vector (a, b), where a is the probability that the region whose coordinates are output by L6 is a reference object and b is the probability that it is not; L6 outputs a 1*4 vector (c, d, e, f), where (c, d) are the coordinates of the upper-left corner and (e, f) the coordinates of the lower-right corner of the rectangle enclosing the reference object.
In this step, each frame image is input into the first convolutional neural network (a convolutional neural network composed of several convolutional layers), which outputs the position information of a number of candidate reference objects (in this embodiment the position information is expressed by two coordinates, e.g. (x1, y1) and (x2, y2), denoting the upper-left and lower-right corners of the rectangle enclosing the candidate reference object). The structure of this first convolutional network is fairly simple, so relatively little computation is needed to process the whole image. Its output, the position information of the candidate reference objects, serves as the input of the more complex second convolutional neural network described below. Performing coarse selection on the candidate region images in this step means that regions of the frame image that contain no candidate do not go through the subsequent, more complex second convolutional neural network, which saves a large amount of computation and greatly reduces the model inference time.
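A minimal PyTorch sketch of this 6-layer coarse-selection network, under the assumptions that convolution padding preserves the spatial size and that the two 1*1-convolution heads (L5 and L6) both read the L4 feature map (a simplification of the L4/L5-to-L6 wiring described above); this is illustrative, not the patented implementation:

```python
import torch
import torch.nn as nn

class CoarseSelectionNet(nn.Module):
    """Stage 1: per-location reference-object probability and box coordinates."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 10, kernel_size=3, stride=1, padding=1),   # L1
            nn.MaxPool2d(kernel_size=2, stride=1),                   # L2 (stride-1 pooling, as described)
            nn.Conv2d(10, 16, kernel_size=3, stride=1, padding=1),   # L3
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),   # L4
        )
        self.prob_head = nn.Conv2d(32, 2, kernel_size=1)   # L5: (is reference, is not reference)
        self.box_head = nn.Conv2d(32, 4, kernel_size=1)    # L6: (x1, y1, x2, y2)

    def forward(self, x):
        feat = self.backbone(x)
        return self.prob_head(feat), self.box_head(feat)
```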
2) The image of the region delineated by the position information of each candidate reference object is input into the second convolutional neural network for fine selection of candidate reference objects, yielding the region images of the candidate reference objects.
The second convolutional neural network in this step is the convolutional neural network of the candidate-reference-object fine selection stage and consists of 7 layers: L1, L2, L3, L4, L5, L6 and L7. L1 is a convolutional layer composed of 28 3*3 convolution kernels with stride 1; L2 is a pooling layer with a 3*3 pooling kernel and stride 2; L3 is a convolutional layer composed of 48 3*3 convolution kernels with stride 1; L4 is a pooling layer with a 3*3 pooling kernel and stride 2; L5 is a convolutional layer composed of 64 2*2 convolution kernels with stride 1; L6 is a fully connected layer with 128 outputs; L7 is a convolutional layer composed of 2 1*1 convolution kernels with stride 1. In this second convolutional neural network, from L1 to L7 the output of each layer is the input of the next layer. The input of L1 is the output of the coarse-selection-stage network, i.e. the image delineated by the candidate reference object's position region information. L7 first determines the probability that the image of the current region is a reference object; if it is, the region image of the candidate reference object is output.
Specifically, L7 first determines the probability that the image of the current region is a reference object: if the probability is greater than or equal to a threshold, it is treated as a reference object, otherwise it is not, and the region image of the candidate reference object is output accordingly; the threshold is obtained through training.
That is, the network model of this stage is also a convolutional neural network, but compared with the convolutional neural network of the coarse selection stage, the convolutional neural network model of the fine selection stage is more complex. Its input is the image of the rectangular region delineated by the reference-object coordinate points output by the coarse-selection-stage network, and its output is a Boolean value, i.e. either "true" (True) or "false" (False): it judges whether the currently input image is a reference object; if so (true) the image is retained, and if not (false) it is removed from the candidate set, thereby refining the candidate reference objects output by the coarse-selection-stage network.
In this step, after the first convolutional neural network, the coordinates of a number of candidate reference objects are obtained, and the N coordinates with the highest reference-object probabilities are selected and used to crop image patches that serve as the input of the second convolutional neural network. Suppose the resolution of the candidate reference object currently fed into the network is 400*200; the matrix fed into the network is then 400*200*3 (*3 because of the three RGB channels). After L1 a 400*200*28 feature matrix is output, which is fed into L2 to produce a 200*100*28 feature matrix, then into L3 which outputs a 200*100*48 feature matrix, then into L4 which outputs a 100*50*48 feature matrix, then into L5 which outputs a 100*50*64 feature matrix, then into L6 which outputs a 128*1 feature matrix, and finally into L7 which outputs a 2*1 vector (g, h), where g is the probability that the input image is a reference object and h the probability that it is not. The images of the M reference objects with the highest reference-object probability are fed into the next network, i.e. used as the input of the third convolutional neural network.
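A corresponding sketch of the 7-layer fine-selection network. The fixed 48*48 crop size is an assumption of mine (the example in the text uses variable-sized crops), and the final 1*1-convolution head is written as a linear layer, which is equivalent once the feature map is flattened:

```python
import torch
import torch.nn as nn

class FineSelectionNet(nn.Module):
    """Stage 2: keep or reject each cropped candidate reference object."""
    def __init__(self, in_h: int = 48, in_w: int = 48):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 28, kernel_size=3, stride=1, padding=1),   # L1
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),        # L2
            nn.Conv2d(28, 48, kernel_size=3, stride=1, padding=1),   # L3
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),        # L4
            nn.Conv2d(48, 64, kernel_size=2, stride=1),               # L5
        )
        feat_h, feat_w = in_h // 4 - 1, in_w // 4 - 1                 # spatial size for the 48x48 default: 11x11
        self.fc = nn.Linear(64 * feat_h * feat_w, 128)                # L6
        self.classifier = nn.Linear(128, 2)                            # L7: (is reference, is not reference)

    def forward(self, x):
        feat = self.features(x).flatten(1)
        return self.classifier(torch.relu(self.fc(feat)))
```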
Step 103: extract the feature points of the region images of the candidate reference objects in each frame image.
In this step, the region image of the candidate reference object output by the second convolutional neural network is input into the third convolutional neural network, and feature extraction is carried out according to preset reference-object key points to obtain a plurality of feature points of the candidate reference object. The definition and number of feature points depend on the type of reference object chosen in advance; taking a human face as an example, there are five key points: two eyes, one nose tip, and two mouth corners.
The third convolutional neural network is the convolutional neural network of the candidate-reference-object feature point extraction stage and consists of 9 layers, i.e. L1, L2, L3, L4, L5, L6, L7, L8 and L9, where L1 is a convolutional layer composed of 32 3*3 convolution kernels with stride 1; L2 is a pooling layer with a 3*3 pooling kernel and stride 2; L3 is a convolutional layer composed of 64 3*3 convolution kernels with stride 1; L4 is a pooling layer with a 3*3 pooling kernel and stride 2; L5 is a convolutional layer composed of 64 3*3 convolution kernels with stride 1; L6 is a pooling layer with a 2*2 pooling kernel and stride 2; L7 is a convolutional layer composed of 128 2*2 convolution kernels with stride 1; L8 is a fully connected layer with 256 outputs; L9 is a fully connected layer with 10 outputs. From L1 to L9 the output of each layer is the input of the next layer. The input of L1 is the image of the reference-object rectangle delineated by the network output of the candidate-reference-object fine selection stage, and the output of L9 is the feature point coordinates in the current image; this embodiment takes the five facial coordinates as an example.
In this step, suppose the resolution of the image output by the second convolutional neural network is 400*200; the matrix fed into the third convolutional neural network is then also 400*200*3. After L1 a 400*200*32 feature matrix is output, after L2 a 200*100*32 feature matrix, after L3 a 200*100*64 feature matrix, after L4 a 100*50*64 feature matrix, after L5 a 100*50*64 feature matrix, after L6 a 50*25*64 feature matrix, after L7 a 50*25*128 feature matrix, after L8 a 256*1 feature matrix, and after L9 a 10*1 vector (i, j, k, l, m, n, o, p, q, r), where each pair of values is the coordinate of one feature point (left eye, right eye, nose tip, left mouth corner, right mouth corner).
It should be noted that the network model of this stage is also a convolutional neural network, but more complex than the previous two. Its input is the image of the finely selected reference object and its output is the feature points of the reference object. Taking a face as an example, there are five key points: two eyes, one nose tip, and two mouth corners; the feature points are not limited to these and may also be other feature points, which this embodiment does not restrict.
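A matching sketch of the 9-layer feature-point network, again assuming a fixed crop size (64*64 here) so that the fully connected layers are well defined; the ten outputs are read as five (x, y) pairs:

```python
import torch
import torch.nn as nn

class FeaturePointNet(nn.Module):
    """Stage 3: regress 5 key points (10 coordinates) of the reference object."""
    def __init__(self, in_h: int = 64, in_w: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),    # L1
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),         # L2
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),    # L3
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),         # L4
            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),    # L5
            nn.MaxPool2d(kernel_size=2, stride=2),                     # L6
            nn.Conv2d(64, 128, kernel_size=2, stride=1),               # L7
        )
        feat_h, feat_w = in_h // 8 - 1, in_w // 8 - 1                  # spatial size for the 64x64 default: 7x7
        self.fc1 = nn.Linear(128 * feat_h * feat_w, 256)               # L8
        self.fc2 = nn.Linear(256, 10)                                   # L9: (x, y) for 5 key points

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(self.features(x).flatten(1))))
```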
Step 104: identify the shot type of the current frame image according to the feature distance between the feature points of the region image of the candidate reference object.
In this step, the feature distance between the feature points of the candidate reference object is calculated first; then the ratio of the feature distance to the height of the current frame image is determined; next, the ratio is compared with the preset shot type parameter ranges; and finally, the shot type of the current frame image is determined according to the comparison result.
The preset shot type parameter ranges are determined empirically from experimental data and are strongly correlated with the video genre. Taking the preset parameters for a human face as an example: P_WS_FS = 0.015, P_FS_MS = 0.045, P_MS_CS = 0.070, P_CS_CU = 0.150.
For example, the five facial feature points can be used for shot classification of a variety show: the distance between the midpoint of the two eye feature points and the midpoint of the two mouth-corner feature points is computed as the feature distance, the ratio of this feature distance to the frame height is determined, and the shot type of the current frame image is determined by comparing this ratio with the preset shot type ranges, i.e. by checking which preset shot type parameter interval the ratio falls into; the current frame then belongs to the corresponding shot type.
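A minimal sketch of this ratio-threshold classification, assuming the face-specific thresholds above and the eye-midpoint-to-mouth-midpoint distance as the feature distance (both are examples from the text, not fixed by the method):

```python
import math

# Example thresholds for a face reference object: ratio of feature distance to frame height.
SHOT_THRESHOLDS = [(0.015, "WS"), (0.045, "FS"), (0.070, "MS"), (0.150, "CS")]  # above 0.150 -> "CU"

def face_feature_distance(left_eye, right_eye, left_mouth, right_mouth):
    """Distance between the eye midpoint and the mouth-corner midpoint."""
    eye_mid = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
    mouth_mid = ((left_mouth[0] + right_mouth[0]) / 2, (left_mouth[1] + right_mouth[1]) / 2)
    return math.hypot(eye_mid[0] - mouth_mid[0], eye_mid[1] - mouth_mid[1])

def classify_shot(feature_distance: float, frame_height: int) -> str:
    """Smaller faces relative to the frame mean wider shots."""
    ratio = feature_distance / frame_height
    for upper_bound, shot_type in SHOT_THRESHOLDS:
        if ratio < upper_bound:
            return shot_type
    return "CU"
```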
Step 105: annotate the shot type on the current frame image.
In this step, the position where the shot type is marked on the current frame image may be, for example, the upper-left, upper-right, lower-left or lower-right corner of the current image.
For ease of understanding, identification samples of three shot types annotated on the current frame image are shown in Fig. 2, Fig. 3 and Fig. 4, where the shot type identification sample is annotated in the lower-left corner of each figure. Fig. 2 is a schematic diagram of a medium shot (MS) identification sample provided by an embodiment of the present invention, Fig. 3 is a schematic diagram of a close-up (CU) identification sample provided by an embodiment of the present invention, and Fig. 4 is a schematic diagram of a wide shot (WS) identification sample provided by an embodiment of the present invention.
In Fig. 2 the medium shot label reads: Frame: 391, person: 1, Ratio: 0.07576, Method: Max, Type: MS.
In Fig. 3 the close-up label reads: Frame: 608, person: 1, Ratio: 0.49322, Method: Max, Type: CU.
In Fig. 4 the wide shot label reads: Frame: 1396, person: MP (Multi Person), Ratio: 0.03503, Method: Max, Type: WS.
That is, the feature distance between the midpoint of the two eye feature points and the midpoint of the two mouth-corner feature points can be used for the calculation; the ratio of the feature distance to the frame height is then checked against the preset parameter intervals, and the current frame belongs to the shot type of whichever interval the ratio falls into.
In this embodiment, besides the above wide shot (WS), full shot (FS), close shot (CS), medium shot (MS) and close-up (CU), the shot types may of course also include extreme long shot (XLS or ELS), very long shot (VLS), medium long shot (MLS), medium close-up (MCU), long shot/wide shot (LS/WS), big close-up (BCU) and so on, which are not enumerated one by one here.
The meaning of each shot type is known technology to those skilled in the art and is not described in detail here.
In the embodiments of the present invention, frame extraction is first performed on the video to be edited according to a preset frame-skip parameter to generate frame images; then each frame image is input into a convolutional neural network for processing to obtain the region images of the candidate reference objects in each frame image; the feature points of the region images of the candidate reference objects in each frame image are extracted; and finally the shot type of the current frame image is identified according to the feature distance between the feature points of the region image of the candidate reference object. In other words, in this embodiment, during video post-production, the shot type of the current frame image of the raw video footage is identified from the feature distance between the feature points of the candidate reference object, and the shot type label is marked on the current frame image. This eliminates a large amount of manual annotation work in post-production, helps editors quickly retrieve the shots they need, substantially shortens the post-production time, reduces the production cost to a certain extent, and improves post-production efficiency.
Further, the abstract concept of shot type is converted into the ratio of the facial feature distance to the height of the current frame image, which reduces the complexity of the algorithm, shortens the video processing time, and improves post-production efficiency.
Optionally, in another embodiment, if a plurality of reference objects are detected in the same image, i.e. when a plurality of candidate reference objects are obtained, the candidate reference objects can be selected according to the video genre, and the maximum feature distance or the average feature distance of the candidate reference objects is used to calculate the shot type (see the sketch after this list), specifically:
1) According to the video genre, the feature distance between the feature points of each candidate reference object in the image is calculated; the feature distance here is also called the Euclidean distance.
In this step, the video genre determines the choice of reference object. For example, for a variety show, which is all about people, faces are preferentially chosen as reference objects and the feature distance is calculated on them; for a nature documentary, specific scenery is preferentially chosen, e.g. a tree is taken as the reference object and the feature distance of the tree is calculated.
2) The maximum feature distance is selected from the feature distances, or the average of all the feature distances is calculated;
3) the ratio of the maximum feature distance or the average feature distance to the frame height is determined;
4) the ratio is compared with the preset shot type parameter ranges;
5) the shot type of the current frame image is determined according to the comparison result.
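A minimal sketch of this multi-reference-object case, reusing classify_shot from the earlier sketch; whether to use the maximum or the average distance is a configuration choice (the "Method: Max" field in the figure labels suggests the maximum is used in the examples):

```python
def classify_shot_multi(feature_distances, frame_height, method="max"):
    """Aggregate the feature distances of several candidate reference objects, then classify."""
    if not feature_distances:
        return None                                   # no reference object found in this frame
    if method == "max":
        aggregated = max(feature_distances)
    else:                                              # "avg": average feature distance
        aggregated = sum(feature_distances) / len(feature_distances)
    return classify_shot(aggregated, frame_height)
```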
Take Fig. 2 above as an example, and suppose the resolution of the image is 1280*720. Following the above steps, this frame was extracted from a video of the variety show "Wonderful Work Conference"; the video genre is a variety show, so a face is chosen as the reference object. The bounding box of the reference object in this frame determined by the first and second deep-learning convolutional neural networks described above is [(564, 144), (667, 290)], and the key feature points of the reference object extracted by the third convolutional neural network are: left eye (575, 204), right eye (617, 195), nose tip (585, 226), left mouth corner (585, 257), right mouth corner (620, 250). The way the feature distance is computed depends on the type of reference object, and different reference objects use different feature distance calculations; taking the face in this example:
Calculate the distance between the two eyes: d_eye = √[(575−617)² + (204−195)²] = 42.95
Calculate the distance between the two mouth corners: d_mouth = √[(585−620)² + (257−250)²] = 35.69
Calculate the midpoint of the two eyes: pm_eye = (596, 199.5)
Calculate the midpoint of the two mouth corners: pm_mouth = (602.5, 253.5)
Calculate the distance between the two midpoints: dm = 54.39
Feature distance: d = (d_eye + d_mouth + dm) / 3 = 44.34
The ratio of the feature distance d to the frame height falls within the MS interval (between P_FS_MS = 0.045 and P_MS_CS = 0.070), so this frame image is determined to belong to the medium shot (MS) type.
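The worked example can be reproduced with a few lines of Python (the coordinates are those listed above; math.hypot computes the Euclidean distance used in the text):

```python
import math

left_eye, right_eye = (575, 204), (617, 195)
left_mouth, right_mouth = (585, 257), (620, 250)

d_eye = math.hypot(575 - 617, 204 - 195)                              # ≈ 42.95
d_mouth = math.hypot(585 - 620, 257 - 250)                            # ≈ 35.69
pm_eye = ((575 + 617) / 2, (204 + 195) / 2)                           # (596, 199.5)
pm_mouth = ((585 + 620) / 2, (257 + 250) / 2)                         # (602.5, 253.5)
dm = math.hypot(pm_eye[0] - pm_mouth[0], pm_eye[1] - pm_mouth[1])     # ≈ 54.39
d = (d_eye + d_mouth + dm) / 3                                         # ≈ 44.34
```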
Different from the above embodiment, in this embodiment of the present invention, if a plurality of reference objects are detected in the same picture, the feature distance between the feature points of each candidate reference object is first calculated according to the video genre; then the maximum feature distance is selected from the feature distances, or the average of all the feature distances is calculated; next, the ratio of the maximum feature distance or the average feature distance to the frame height is determined; the subsequent process is the same as in the above embodiment and has been described in detail there, so it is not repeated here. It can be seen that annotating the raw video footage during video post-production reduces or replaces the work of the junior editor, shortens the rough-cut time, speeds up programme production, lowers the staff cost of editing personnel, and improves working efficiency.
It should be noted that, for simplicity of description, the method embodiments are described as a series of action combinations, but those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention, some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to Fig. 5, which is a schematic structural diagram of a shot type annotation apparatus provided by an embodiment of the present invention, the apparatus is applied to video post-production and may specifically include the following modules: a generation module 51, a processing module 52, an extraction module 53, an identification module 54 and an annotation module 55, wherein:
the generation module 51 is configured to perform frame extraction on the video to be edited according to a preset frame-skip parameter to generate frame images;
the processing module 52 is configured to input each frame image into a convolutional neural network for processing to obtain the region images of the candidate reference objects in each frame image;
the extraction module 53 is configured to extract the feature points of the region images of the candidate reference objects in each frame image;
the identification module 54 is configured to identify the shot type of the current frame image according to the feature distance between the feature points of the region image of the candidate reference object;
the annotation module 55 is configured to annotate the shot type on the current frame image.
Optionally, in another embodiment, on the basis of the above embodiment, the processing module 52 comprises a coarse selection module 61 and a fine selection module 62, whose structure is shown in Fig. 6, wherein:
the coarse selection module 61 is configured to input each frame image into the first convolutional neural network for coarse selection of candidate region images to obtain the position information of the candidate reference objects;
the fine selection module 62 is configured to input the image of the region delineated by the position information of each candidate reference object into the second convolutional neural network for fine selection of candidate reference objects to obtain the region images of the candidate reference objects.
Optionally, in another embodiment, on the basis of the above embodiment, the extraction module 53 is specifically configured to input the region image of the candidate reference object in each frame image into the third convolutional neural network for feature extraction according to preset reference-object key points, to obtain a plurality of feature points of the candidate reference object.
Optionally, in another embodiment, on the basis of the above embodiment, when the extraction module obtains one candidate reference object, i.e. when the same frame image contains one reference object, the identification module 54 comprises: a first calculation module 71, a first determination module 72, a first comparison module 73 and a first identification module 74, whose structure is shown in Fig. 7, wherein:
the first calculation module 71 is configured to calculate the feature distance between the feature points of the candidate reference object;
the first determination module 72 is configured to determine the ratio of the feature distance calculated by the calculation module to the frame height of the image;
the first comparison module 73 is configured to compare the ratio with the preset shot type parameter ranges;
the first identification module 74 is configured to identify the shot type corresponding to the current frame image according to the comparison result of the comparison module.
Optionally, in another embodiment, on the basis of the above embodiment, when the extraction module obtains a plurality of candidate reference objects, i.e. when the same frame image contains a plurality of reference objects, the identification module 54 comprises: a second calculation module 81, a selection module 82, a second determination module 83, a second comparison module 84 and a second identification module 85, whose structure is shown in Fig. 8, wherein:
the second calculation module 81 is configured to calculate, according to the video genre, the feature distance between the feature points of each candidate reference object;
the selection module 82 is configured to select the maximum feature distance from the feature distances or to calculate the average of all the feature distances;
the second determination module 83 is configured to determine the ratio of the maximum feature distance or the average feature distance to the height of the current frame image;
the second comparison module 84 is configured to compare the ratio with the preset shot type parameter ranges;
the second identification module 85 is configured to identify the shot type of the current frame image according to the comparison result.
Optionally, in another embodiment, on the basis of the above embodiments, the identification module 54 may simultaneously comprise the modules of both Fig. 7 and Fig. 8 above.
Optionally, the apparatus may be integrated in a server or deployed independently; this embodiment imposes no restriction.
As for the apparatus embodiment, since it is basically similar to the method embodiment, the description is relatively simple; for relevant parts, refer to the description of the method embodiment.
Optionally, an embodiment of the present invention further provides a network device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor. When the computer program is executed by the processor, the steps of the shot type annotation method in video post-production described above are realized. When executed by the processor, the computer program realizes each process of the shot type annotation method embodiment described in Fig. 1 above and can achieve the same technical effect; to avoid repetition, it is not described here again.
Optionally, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program realizes each process of the shot type annotation method embodiment described in Fig. 1 above and can achieve the same technical effect; to avoid repetition, it is not described here again. The computer-readable storage medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Each embodiment in this specification is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
Referring also to Fig. 9, which is a schematic structural diagram of an application example provided by an embodiment of the present invention, this embodiment is applied to video post-production. The different convolutional neural networks are exemplified by three different convolutional neural networks, each being a convolutional neural network module, i.e. a first convolutional neural network model, a second convolutional neural network model and a third convolutional neural network model. The detailed flow is as follows:
First, the frame extraction software on the computer performs frame extraction on the video to be edited according to the preset frame-skip parameter to generate frame images. Then the computer inputs a generated frame image (e.g. the n-th frame) into the first convolutional neural network for coarse selection of candidate region images, obtaining a number of coarsely selected reference objects, i.e. the position information of a number of candidate reference objects; the image of the region delineated by the position information of each candidate reference object is input into the second convolutional neural network for fine selection of candidate reference objects, obtaining a number of finely selected reference objects, i.e. the region images of the candidate reference objects. Afterwards, the region image of each candidate reference object is input into the third convolutional neural network and feature extraction is carried out according to preset reference-object key points, obtaining a plurality of feature points of the candidate reference object (i.e. the reference-object feature points). Finally, the feature distance between the feature points of the candidate reference object is calculated, the ratio of the feature distance to the frame height is determined, the ratio is compared with the preset shot type parameter ranges, the shot type of the current frame image is identified according to the comparison result, and the shot type is annotated on the current frame image. This embodiment takes wide shot (WS), full shot (FS), close shot (CS), medium shot (MS) and close-up (CU) as examples.
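An end-to-end sketch of this application example, chaining the pieces sketched above; network loading, the per-region probability thresholding and the drawing of the label are simplified, and the helper names used here (propose_reference_boxes, crop_boxes, is_reference_object, feature_distance) are illustrative placeholders, not functions defined by the patent:

```python
def annotate_video(video_path, skip_frame, w, h, coarse_net, fine_net, point_net):
    """Annotate each extracted frame of a video with its shot type label."""
    annotated = []
    for frame_no, frame in extract_frames(video_path, skip_frame, w, h):
        boxes = propose_reference_boxes(coarse_net, frame)          # stage 1: coarse candidate boxes (placeholder)
        regions = [crop for crop in crop_boxes(frame, boxes)
                   if is_reference_object(fine_net, crop)]           # stage 2: keep confirmed references (placeholder)
        distances = [feature_distance(point_net, crop) for crop in regions]  # stage 3: key points -> distance (placeholder)
        shot_type = classify_shot_multi(distances, frame_height=h, method="max")
        annotated.append((frame_no, shot_type))                      # in practice the label is drawn on the frame
    return annotated
```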
In this embodiment of the present invention, before the junior editor cuts the video, the video is analysed first and shot type labels are stamped onto it; combined with existing video analysis algorithms, this can greatly improve the working efficiency of the junior editor and save time and staff cost.
Each embodiment in this specification is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, an apparatus or a computer program product. Therefore, the embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical memory, etc.) containing computer-usable program code.
The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, terminal device (system) and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device produce an apparatus for realizing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that realizes the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal device, so that a series of operation steps are executed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for realizing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Although preferred embodiments of the embodiments of the present invention have been described, once those skilled in the art learn of the basic inventive concept, additional changes and modifications can be made to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present invention.
Finally, it should also be noted that, herein, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or terminal device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or also includes elements inherent to such process, method, article or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or terminal device comprising that element.
The shot type annotation method, apparatus and device provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principle and implementation of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope according to the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (12)

1. A shot type annotation method, characterized by comprising:
performing frame extraction on a video to be edited according to a preset frame-skip parameter to generate frame images;
inputting each frame image into a convolutional neural network for processing to obtain region images of candidate reference objects in each frame image;
extracting feature points from the region images of the candidate reference objects in each frame image;
identifying the shot type of the current frame image according to the feature distance between the feature points of the region image of the candidate reference object;
annotating the shot type on the current frame image.
2. The method according to claim 1, characterized in that passing each frame image through the convolutional neural network to obtain the region images of the candidate reference objects in each frame image comprises:
inputting each frame image into a first convolutional neural network for coarse selection of candidate region images to obtain position information of candidate reference objects;
inputting the image of the region delineated by the position information of each candidate reference object into a second convolutional neural network for fine selection of candidate reference objects to obtain the region images of the candidate reference objects.
3. The method according to claim 2, characterized in that extracting the feature points of the region images of the candidate reference objects in each frame image comprises:
inputting the region image of the candidate reference object in each frame image into a third convolutional neural network for feature extraction according to preset reference-object key points, to obtain a plurality of feature points of the candidate reference object.
4. The method according to any one of claims 1 to 3, wherein determining the lens type of the current frame image according to the feature distances of the feature points of the candidate reference object comprises:
calculating the feature distances of the plurality of feature points of the candidate reference object;
determining a ratio of the feature distance to the image frame height;
comparing the ratio with preset lens type parameter ranges;
identifying the lens type in the current frame image according to the comparison result.
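Claim 4 reduces the decision to a ratio between a feature distance and the frame height, compared against preset ranges. A minimal sketch of that comparison follows; the example ranges (e.g. a ratio above 0.7 meaning "close-up") are invented for illustration and are not the patent's preset lens type parameter ranges.

```python
# Sketch of the ratio-and-threshold decision in claim 4.
# The ranges below are illustrative assumptions, not disclosed values.
import math

LENS_TYPE_RANGES = [            # (lower bound, upper bound, label)
    (0.7, float("inf"), "close-up"),
    (0.3, 0.7, "medium shot"),
    (0.0, 0.3, "long shot"),
]

def feature_distance(p1, p2):
    """Euclidean distance between two feature points given as (x, y) pixels."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

def identify_lens_type(distance: float, frame_height: int) -> str:
    """Compare the distance-to-frame-height ratio against the preset ranges."""
    ratio = distance / frame_height
    for low, high, label in LENS_TYPE_RANGES:
        if low <= ratio < high:
            return label
    return "unknown"
```

For example, two key points 520 pixels apart in a 1080-pixel-high frame give a ratio of about 0.48, which the illustrative ranges above would label a medium shot.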
5. The method according to any one of claims 1 to 3, wherein when a plurality of candidate reference objects are obtained, calculating the feature distances of the plurality of feature points of the candidate reference objects comprises:
calculating, according to the video type, the feature distances of the plurality of feature points of each candidate reference object separately;
selecting the maximum feature distance from the feature distances, or calculating the average of all the feature distances;
determining a ratio of the maximum feature distance or the average feature distance to the image frame height;
comparing the ratio with preset lens type parameter ranges;
determining the lens type in the current frame image according to the comparison result.
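When several candidate reference objects appear in a frame, claim 5 aggregates their per-object feature distances either by taking the maximum or by averaging. The sketch below, which reuses the `identify_lens_type` helper from the previous sketch, shows one way to express that choice; the mapping from video type to aggregation strategy is an assumption made for illustration, since the claim only says the calculation depends on the video type.

```python
# Sketch of the multi-candidate aggregation in claim 5.
# `distances_per_object` holds one feature distance per candidate reference object.
def aggregate_feature_distance(distances_per_object, video_type: str) -> float:
    """Pick the maximum or the mean, keyed off a hypothetical video-type flag."""
    if video_type == "single_subject":    # assumed convention: favor the dominant object
        return max(distances_per_object)
    return sum(distances_per_object) / len(distances_per_object)

def identify_lens_type_multi(distances_per_object, frame_height, video_type):
    distance = aggregate_feature_distance(distances_per_object, video_type)
    return identify_lens_type(distance, frame_height)   # defined in the previous sketch
```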
6. A device for labeling a lens type, comprising:
a generation module, configured to perform frame extraction on a video to be edited according to a preset frame-skipping parameter to generate frame images;
a processing module, configured to input each frame image into a convolutional neural network for processing to obtain a region image of a candidate reference object in each frame image;
an extraction module, configured to extract feature points of the region image of the candidate reference object in each frame image;
an identification module, configured to identify the lens type of a current frame image according to feature distances of the feature points of the region image of the candidate reference object;
a labeling module, configured to label the lens type on the current frame image.
7. The device according to claim 6, wherein the processing module comprises:
a coarse selection processing module, configured to input each frame image into a first convolutional neural network for coarse selection of candidate region images to obtain location information of candidate reference objects;
a fine selection processing module, configured to input the image of the region outlined by each piece of candidate reference object location information into a second convolutional neural network for fine selection of candidate reference objects to obtain the region image of the candidate reference object.
8. The device according to claim 6, wherein the extraction module is specifically configured to input the region image of the candidate reference object in each frame image into a third convolutional neural network for feature extraction according to preset reference object key points to obtain a plurality of feature points of the candidate reference object.
9. The device according to any one of claims 6 to 8, wherein the identification module comprises:
a first calculation module, configured to calculate the feature distances of the feature points of the candidate reference object;
a first determination module, configured to determine a ratio of the feature distance calculated by the calculation module to the image frame height;
a first comparison module, configured to compare the ratio with preset lens type parameter ranges;
a first identification module, configured to identify the lens type corresponding to the current frame image according to the comparison result of the comparison module.
10. The device according to any one of claims 6 to 8, wherein when a plurality of candidate reference objects are obtained by the extraction module, the identification module comprises:
a second calculation module, configured to calculate, according to the video type, the feature distances of the plurality of feature points of each candidate reference object separately;
a selection module, configured to select the maximum feature distance from the feature distances or calculate the average of all the feature distances;
a second determination module, configured to determine a ratio of the maximum feature distance or the average feature distance to the current frame image height;
a second comparison module, configured to compare the ratio with preset lens type parameter ranges;
a second identification module, configured to identify the lens type in the current frame image according to the comparison result.
11. A network device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the computer program is executed by the processor, the steps of the method for labeling a lens type according to any one of claims 1 to 5 are implemented.
12. A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the steps of the method for labeling a lens type according to any one of claims 1 to 5 are implemented.
CN201810910154.2A 2018-08-10 2018-08-10 A kind of mask method of lens type, device and equipment Pending CN109325405A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810910154.2A CN109325405A (en) 2018-08-10 2018-08-10 A kind of mask method of lens type, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810910154.2A CN109325405A (en) 2018-08-10 2018-08-10 A kind of mask method of lens type, device and equipment

Publications (1)

Publication Number Publication Date
CN109325405A true CN109325405A (en) 2019-02-12

Family

ID=65263406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810910154.2A Pending CN109325405A (en) 2018-08-10 2018-08-10 A kind of mask method of lens type, device and equipment

Country Status (1)

Country Link
CN (1) CN109325405A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631428A (en) * 2015-12-29 2016-06-01 国家新闻出版广电总局监管中心 Comparison and identification method and apparatus for videos
CN107392883A (en) * 2017-08-11 2017-11-24 陈雷 The method and system that video display dramatic conflicts degree calculates
CN107590489A (en) * 2017-09-28 2018-01-16 国家新闻出版广电总局广播科学研究院 Object detection method based on concatenated convolutional neutral net

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
万崇玮 (Wan Chongwei): "Video Shot Detection Based on Scale-Invariant Features", Journal of Computer-Aided Design & Computer Graphics *
蔡轶珩 (Cai Yiheng) et al.: "Shot Boundary Detection Algorithm Fusing Color Information and Feature Points", Journal of Computer Applications *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340837A (en) * 2020-02-18 2020-06-26 上海眼控科技股份有限公司 Image processing method, device, equipment and storage medium
CN111783729A (en) * 2020-07-17 2020-10-16 商汤集团有限公司 Video classification method, device, equipment and storage medium
CN112969063A (en) * 2021-02-02 2021-06-15 烟台艾睿光电科技有限公司 Multi-lens identification system and method

Similar Documents

Publication Publication Date Title
CN110188760B (en) Image processing model training method, image processing method and electronic equipment
CN109614921B (en) Cell segmentation method based on semi-supervised learning of confrontation generation network
CN111950723B (en) Neural network model training method, image processing method, device and terminal equipment
CN113591795B (en) Lightweight face detection method and system based on mixed attention characteristic pyramid structure
CN112954450B (en) Video processing method and device, electronic equipment and storage medium
CN112132197B (en) Model training, image processing method, device, computer equipment and storage medium
CN105678216A (en) Spatio-temporal data stream video behavior recognition method based on deep learning
CN112149459A (en) Video salient object detection model and system based on cross attention mechanism
US20100067863A1 (en) Video editing methods and systems
CN111311475A (en) Detection model training method and device, storage medium and computer equipment
CN107689035A (en) A kind of homography matrix based on convolutional neural networks determines method and device
CN107566688A (en) A kind of video anti-fluttering method and device based on convolutional neural networks
CN109325405A (en) A kind of mask method of lens type, device and equipment
CN111080746B (en) Image processing method, device, electronic equipment and storage medium
CN111382647B (en) Picture processing method, device, equipment and storage medium
CN107948586A (en) Trans-regional moving target detecting method and device based on video-splicing
CN110852295A (en) Video behavior identification method based on multitask supervised learning
CN111739037B (en) Semantic segmentation method for indoor scene RGB-D image
CN112598003A (en) Real-time semantic segmentation method based on data expansion and full-supervision preprocessing
CN114882011A (en) Fabric flaw detection method based on improved Scaled-YOLOv4 model
WO2023047162A1 (en) Object sequence recognition method, network training method, apparatuses, device, and medium
CN111931572B (en) Target detection method for remote sensing image
CN112016434A (en) Lens motion identification method based on attention mechanism 3D residual error network
CN110942463B (en) Video target segmentation method based on generation countermeasure network
CN115937742B (en) Video scene segmentation and visual task processing methods, devices, equipment and media

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190212