CN108235116A - Feature propagation method and device, electronic equipment, program and medium - Google Patents

Feature propagation method and device, electronic equipment, program and medium

Info

Publication number
CN108235116A
CN108235116A (application CN201711455916.6A)
Authority
CN
China
Prior art keywords
frame
feature
present frame
level
present
Prior art date
Legal status
Granted
Application number
CN201711455916.6A
Other languages
Chinese (zh)
Other versions
CN108235116B (en)
Inventor
石建萍
李玉乐
林达华
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN201711455916.6A priority Critical patent/CN108235116B/en
Publication of CN108235116A publication Critical patent/CN108235116A/en
Application granted granted Critical
Publication of CN108235116B publication Critical patent/CN108235116B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433 Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present invention disclose a feature propagation method and apparatus, an electronic device, a program, and a medium. The method includes: judging whether a current frame is a key frame; and, in response to the current frame being a non-key frame in a video, obtaining a high-level feature of the current frame from a high-level feature of an adjacent previous key frame according to a low-level feature of the previous key frame and a low-level feature of the current frame. In a neural network, the network depth of a first network layer from which the low-level feature of the previous key frame is extracted is shallower than the network depth of a second network layer from which the high-level feature of the previous key frame is extracted. The embodiments of the present invention exploit the consistency between video frames: since the semantic labels of adjacent frames are close, video semantic features are propagated from the adjacent previous key frame to the current frame, which reduces redundant computation time and improves the accuracy of semantic segmentation.

Description

Feature propagation method and device, electronic equipment, program and medium
Technical field
The present invention relates to computer vision technology, and in particular to a feature propagation method and apparatus, an electronic device, a program, and a medium.
Background Art
Video semantic segmentation is a major problem in computer vision and video understanding tasks. Video semantic segmentation models have important applications in many fields, such as automatic driving, video surveillance, and video object analysis.
At present, although semantic segmentation of still images has been studied extensively, video semantic segmentation has received comparatively little research. Video semantic segmentation demands high real-time performance while also maintaining sufficient accuracy.
Summary of the Invention
The embodiments of the present invention provide a feature propagation technical solution for video.
According to one aspect of the embodiments of the present invention, there is provided a feature propagation method, including:
judging whether a current frame is a key frame;
in response to the current frame being a non-key frame in a video, obtaining a high-level feature of the current frame from a high-level feature of an adjacent previous key frame according to a low-level feature of the previous key frame and a low-level feature of the current frame; wherein, in a neural network, the network depth of a first network layer from which the low-level feature of the previous key frame is extracted is shallower than the network depth of a second network layer from which the high-level feature of the previous key frame is extracted.
Optionally, in any of the above-described embodiment of the method for the present invention, the previous key adjacent according to the present frame The low-level feature of the low-level feature of frame and the present frame obtains the present frame by the high-level characteristic of the previous key frame High-level characteristic, including:
According to the low-level feature of adjacent previous key frame and the low-level feature of the present frame, obtain from the previous pass The low-level feature of key frame transforms to the conversion weights of the low-level feature of the present frame;
According to the high-level characteristic of the previous key frame and the conversion weights, by the high-level characteristic of the previous key frame Be converted to the high-level characteristic of the present frame.
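The patent gives no code for this two-step propagation. The following is an illustrative sketch under stated assumptions: NumPy arrays stand in for feature maps, and a per-pixel cosine similarity between the two low-level feature maps stands in for the learned weight-prediction computation (the patent does not specify how the conversion weights are produced; all function names here are hypothetical):

```python
import numpy as np

def conversion_weights(low_key, low_cur):
    """Per-pixel conversion weight from the similarity of two low-level
    feature maps of shape (C_low, H, W). Cosine similarity along the
    channel axis is an illustrative stand-in, not the patent's mechanism."""
    num = (low_key * low_cur).sum(axis=0)
    den = np.linalg.norm(low_key, axis=0) * np.linalg.norm(low_cur, axis=0) + 1e-8
    return num / den  # shape (H, W)

def propagate_high_level(high_key, weights):
    """Convert the key frame's high-level feature (C_high, H, W) into the
    current frame's by reweighting each spatial position."""
    return high_key * weights[None, :, :]  # broadcast over channels

rng = np.random.default_rng(0)
low_key = rng.normal(size=(8, 4, 4))    # low-level feature of previous key frame
high_key = rng.normal(size=(16, 4, 4))  # cached high-level feature of key frame
w = conversion_weights(low_key, low_key)       # identical frames -> weights ~ 1
high_cur = propagate_high_level(high_key, w)   # propagated high-level feature
```

When the current frame equals the key frame, the weights are close to 1 and propagation returns the cached high-level feature unchanged, which matches the intuition that adjacent, similar frames should reuse the key frame's features almost directly.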
Optionally, in any of the above method embodiments of the present invention, in response to the current frame being a non-key frame in the video, the method further includes:
performing semantic segmentation on the current frame based at least on the high-level feature of the current frame, to obtain a semantic label of the current frame.
Optionally, in any of the above method embodiments of the present invention, performing semantic segmentation on the current frame based at least on the high-level feature of the current frame includes:
performing semantic segmentation on the current frame based on the low-level feature and the high-level feature of the current frame, to obtain the semantic label of the current frame.
Optionally, in any of the above-described embodiment of the method for the present invention, low-level feature and high level based on the present frame are special Sign carries out semantic segmentation to the present frame, including:
The low-level feature of the present frame is converted, is obtained consistent with the port number of the high-level characteristic of the present frame Feature;
The feature that the present frame is converted to is spliced or merged with the high-level characteristic of the present frame, is worked as Previous frame feature;
Based on the present frame feature, semantic segmentation is carried out to the present frame.
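The three steps above (channel matching, splicing, classification) can be sketched as follows. This is a minimal illustration with NumPy, using 1x1 convolutions (plain channel-mixing matrix multiplies) as the simplest possible conversion and classification layers; the weight shapes and function names are assumptions, not the patent's:

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution as a channel-mixing matmul.
    x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum('oc,chw->ohw', w, x)

def segment(low, high, w_proj, w_cls):
    low_proj = conv1x1(low, w_proj)           # match the high-level channel count
    fused = np.concatenate([low_proj, high])  # splice along the channel axis
    logits = conv1x1(fused, w_cls)            # per-pixel class scores
    return logits.argmax(axis=0)              # per-pixel semantic label map

C_low, C_high, n_classes, H, W = 6, 4, 3, 5, 5
rng = np.random.default_rng(1)
low = rng.normal(size=(C_low, H, W))
high = rng.normal(size=(C_high, H, W))
w_proj = rng.normal(size=(C_high, C_low))         # projects low -> C_high channels
w_cls = rng.normal(size=(n_classes, 2 * C_high))  # classifier over spliced channels
labels = segment(low, high, w_proj, w_cls)
```

The output is a per-pixel semantic label map of the same spatial size as the feature maps, which is what the claimed semantic segmentation step produces.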
Optionally, in any of the above method embodiments of the present invention, judging whether the current frame is a key frame includes:
judging whether the current frame is a key frame using a key frame scheduling strategy.
Optionally, in any of the above method embodiments of the present invention, judging whether the current frame is a key frame using a key frame scheduling strategy includes: judging whether the current frame is a key frame using a fixed-length scheduling method;
in response to the current frame being a non-key frame in the video, the method further includes: performing feature extraction on the current frame to obtain the low-level feature of the current frame.
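The fixed-length scheduling method mentioned above can be sketched in a few lines; the interval value and function name are illustrative assumptions, since the patent does not fix a specific length:

```python
def is_key_frame(frame_idx, interval=5):
    """Fixed-length schedule: frames 0, interval, 2*interval, ... are key
    frames; all other frames are non-key frames whose high-level features
    are obtained by propagation from the previous key frame."""
    return frame_idx % interval == 0

# Schedule for the first 12 frames of a video.
schedule = [is_key_frame(i) for i in range(12)]
```

With an interval of 5, only frames 0, 5, and 10 of the first 12 run the full (expensive) high-level feature extraction.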
Optionally, in any of the above-described embodiment of the method for the present invention, judge the present frame using key frame scheduling strategy Whether it is key frame, including:
Feature extraction is carried out to the present frame, obtains the low-level feature of the present frame;
According to the low-level feature of the previous key frame and the low-level feature of the present frame, obtain the present frame and adjusted Spend the scheduling probability value for key frame;
Determine whether the present frame is scheduled as key frame according to the scheduling probability value of the present frame.
Optionally, in any of the above-described embodiment of the method for the present invention, according to the low-level feature of the previous key frame and institute The low-level feature of present frame is stated, obtains the scheduling probability value that the present frame is scheduled as key frame, including:
The low-level feature of the low-level feature of the previous key frame and the present frame is spliced, it is special to obtain splicing Sign;
By key frame dispatch network, obtain whether the present frame should be scheduled as key based on the splicing feature The scheduling probability value of frame.
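The adaptive scheduling path above (splice, scheduling network, probability, decision) can be sketched as follows. The actual key frame scheduling network is learned; here a global-average-pool followed by a single linear layer plus sigmoid stands in for it, and all names, shapes, and the 0.5 threshold are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def scheduling_probability(low_key, low_cur, w):
    """Tiny stand-in for the key frame scheduling network: splice the two
    low-level feature maps along channels, global-average-pool, then apply
    a linear layer + sigmoid to get a probability in (0, 1)."""
    spliced = np.concatenate([low_key, low_cur], axis=0)  # (2*C, H, W)
    pooled = spliced.mean(axis=(1, 2))                    # (2*C,)
    return float(sigmoid(pooled @ w))

def schedule_as_key(prob, threshold=0.5):
    """Large divergence from the key frame -> high probability -> new key frame."""
    return prob > threshold

rng = np.random.default_rng(2)
low_key = rng.normal(size=(8, 4, 4))
low_cur = rng.normal(size=(8, 4, 4))
w_zero = np.zeros(16)
p = scheduling_probability(low_key, low_cur, w_zero)  # zero weights -> p = 0.5
```

With all-zero weights the network is uninformative and outputs exactly 0.5; a trained network would instead raise the probability as the current frame drifts away from the cached key frame.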
Optionally, in any of the above method embodiments of the present invention, the method further includes:
in response to the current frame being a key frame in the video, performing feature extraction on the current frame to obtain and cache the low-level feature of the current frame;
performing feature extraction on the low-level feature of the current frame to obtain and cache the high-level feature of the current frame.
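The key-frame branch above (shallow layers, then deep layers on top of the shallow output, caching both results) can be sketched as a small cache object. The class and the toy stand-in networks are illustrative assumptions:

```python
class KeyFrameCache:
    """Cache for the most recent key frame's features: on a key frame, run
    the shallow (first) network layers to get the low-level feature, then the
    deep (second) network layers on top of it to get the high-level feature,
    and cache both for later propagation to non-key frames."""

    def __init__(self, low_net, high_net):
        self.low_net = low_net    # shallow layers: frame -> low-level feature
        self.high_net = high_net  # deep layers: low-level -> high-level feature
        self.low = None
        self.high = None

    def update(self, frame):
        self.low = self.low_net(frame)
        self.high = self.high_net(self.low)
        return self.low, self.high

# Toy stand-ins for the two halves of the network.
cache = KeyFrameCache(low_net=lambda f: [x * 2 for x in f],
                      high_net=lambda low: [x + 1 for x in low])
low, high = cache.update([1, 2, 3])
```

Non-key frames would then read `cache.low` and `cache.high` instead of rerunning the deep layers.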
Optionally, in any of the above method embodiments of the present invention, the method further includes:
in response to the current frame being a key frame in the video, performing semantic segmentation on the current frame based on the high-level feature of the current frame, to obtain the semantic label of the current frame.
According to another aspect of the embodiments of the present invention, there is provided a feature propagation apparatus, including:
a judgment module, configured to judge whether a current frame is a key frame;
a feature propagation module, configured to, according to the judging result of the judgment module and in response to the current frame being a non-key frame in a video, obtain a high-level feature of the current frame from a high-level feature of an adjacent previous key frame according to a low-level feature of the previous key frame and a low-level feature of the current frame; wherein, in a neural network, the network depth of a first network layer from which the low-level feature of the previous key frame is extracted is shallower than the network depth of a second network layer from which the high-level feature of the previous key frame is extracted.
Optionally, in any of the above apparatus embodiments of the present invention, the feature propagation module is specifically configured to:
obtain, according to the low-level feature of the previous key frame and the low-level feature of the current frame, conversion weights for transforming the low-level feature of the previous key frame into the low-level feature of the current frame; and
convert the high-level feature of the previous key frame into the high-level feature of the current frame according to the high-level feature of the previous key frame and the conversion weights.
Optionally, in any of the above apparatus embodiments of the present invention, the apparatus further includes:
a semantic segmentation module, configured to, according to the judging result of the judgment module and in response to the current frame being a non-key frame in the video, perform semantic segmentation on the current frame based at least on the high-level feature of the current frame, to obtain a semantic label of the current frame.
Optionally, in any of the above apparatus embodiments of the present invention, when performing semantic segmentation on the current frame based at least on the high-level feature of the current frame, the semantic segmentation module is specifically configured to: perform semantic segmentation on the current frame based on the low-level feature and the high-level feature of the current frame.
Optionally, in any of the above apparatus embodiments of the present invention, when performing semantic segmentation on the current frame based on the low-level feature and the high-level feature of the current frame, the semantic segmentation module is specifically configured to:
convert the low-level feature of the current frame to obtain a feature whose channel number is consistent with that of the high-level feature of the current frame;
splice or fuse the converted feature with the high-level feature of the current frame to obtain a current-frame feature; and
perform semantic segmentation on the current frame based on the current-frame feature.
Optionally, in any of the above apparatus embodiments of the present invention, the judgment module is specifically configured to judge whether the current frame is a key frame using a key frame scheduling strategy.
Optionally, in any of the above apparatus embodiments of the present invention, the judgment module is specifically configured to judge whether the current frame is a key frame using a fixed-length scheduling method;
the apparatus further includes:
a first feature extraction module, configured to, according to the judging result of the judgment module and in response to the current frame being a non-key frame in the video, perform feature extraction on the current frame to obtain the low-level feature of the current frame.
Optionally, in any of the above apparatus embodiments of the present invention, the apparatus further includes:
a first feature extraction module, configured to perform feature extraction on the current frame to obtain the low-level feature of the current frame;
an acquisition module, configured to obtain, according to the low-level feature of the adjacent previous key frame and the low-level feature of the current frame, a scheduling probability value that the current frame should be scheduled as a key frame;
the judgment module is specifically configured to determine whether the current frame is scheduled as a key frame according to the scheduling probability value of the current frame.
Optionally, in any of the above apparatus embodiments of the present invention, the acquisition module includes:
a splicing unit, configured to splice the low-level feature of the previous key frame with the low-level feature of the current frame to obtain a spliced feature;
a key frame scheduling network, configured to obtain, based on the spliced feature, the scheduling probability value indicating whether the current frame should be scheduled as a key frame.
Optionally, in any of the above apparatus embodiments of the present invention, the first feature extraction module is further configured to, according to the judging result of the judgment module and in response to the current frame being a key frame in the video, perform feature extraction on the current frame to obtain and cache the low-level feature of the current frame;
the apparatus further includes:
a second feature extraction module, configured to perform feature extraction on the low-level feature of the key frame to obtain and cache the high-level feature of the key frame.
Optionally, in any of the above apparatus embodiments of the present invention, the semantic segmentation module is further configured to, according to the judging result of the judgment module and in response to the current frame being a key frame in the video, perform semantic segmentation on the current frame based on the high-level feature of the current frame, to obtain the semantic label of the current frame.
According to yet another aspect of the embodiments of the present invention, there is provided an electronic device, including: the feature propagation apparatus according to any of the above embodiments of the present invention.
According to yet another aspect of the embodiments of the present invention, there is provided another electronic device, including:
a processor and the feature propagation apparatus according to any of the above embodiments of the present invention;
wherein when the processor runs the feature propagation apparatus, the units in the feature propagation apparatus according to any of the above embodiments of the present invention are run.
According to yet another aspect of the embodiments of the present invention, there is provided yet another electronic device, including: a processor and a memory;
wherein the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations of the steps in the feature propagation method according to any of the above embodiments of the present invention.
According to yet another aspect of the embodiments of the present invention, there is provided a computer program, including computer-readable code, wherein when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the steps in the feature propagation method according to any of the above embodiments of the present invention.
According to yet another aspect of the embodiments of the present invention, there is provided a computer-readable medium for storing computer-readable instructions, wherein when the instructions are executed, the operations of the steps in the feature propagation method according to any of the above embodiments of the present invention are implemented.
Based on the feature propagation method and apparatus, electronic device, program, and medium provided by the above embodiments of the present invention, when the current frame is a non-key frame in a video, the high-level feature of the current frame is obtained from the high-level feature of the adjacent previous key frame according to the low-level feature of the previous key frame and the low-level feature of the current frame, so that semantic segmentation can be performed on the non-key frame based on this high-level feature. The embodiments of the present invention exploit the consistency between video frames: since the semantic labels of adjacent frames are close, the high-level feature used for video semantic segmentation is propagated from the adjacent previous key frame to the current frame, and semantic segmentation is then performed on the current frame based on this high-level feature, without extracting the high-level feature for semantic segmentation frame by frame from consecutive video frames; compared with frame-by-frame extraction of high-level features, this reduces redundant computation time. In addition, the embodiments of the present invention propagate the high-level feature of the previous key frame to the current frame for semantic segmentation rather than directly propagating semantic labels; compared with propagating the key frame's semantic label by optical flow, this improves the accuracy of semantic segmentation.
The technical solutions of the present invention are described in further detail below through the drawings and embodiments.
Description of the drawings
The drawings, which constitute a part of the specification, describe embodiments of the present invention and, together with the description, serve to explain the principles of the present invention.
The present invention can be understood more clearly from the following detailed description with reference to the drawings, in which:
Fig. 1 is a flowchart of one embodiment of the feature propagation method of the present invention.
Fig. 2 is a flowchart of another embodiment of the feature propagation method of the present invention.
Fig. 3 is a flowchart of yet another embodiment of the feature propagation method of the present invention.
Fig. 4 is a structural diagram of one embodiment of the feature propagation apparatus of the present invention.
Fig. 5 is a structural diagram of another embodiment of the feature propagation apparatus of the present invention.
Fig. 6 is a structural diagram of yet another embodiment of the feature propagation apparatus of the present invention.
Fig. 7 is a structural diagram of one application embodiment of the electronic device of the present invention.
Detailed Description of the Embodiments
Various exemplary embodiments of the present invention are now described in detail with reference to the drawings. It should be noted that, unless otherwise specified, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present invention.
Meanwhile, it should be understood that, for ease of description, the sizes of the various parts shown in the drawings are not drawn according to actual proportional relationships.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present invention or its application or use.
Technologies, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such technologies, methods, and devices should be considered part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
The embodiments of the present invention may be applied to a computer system/server, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with the computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments including any of the above systems, and the like.
The computer system/server may be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. In general, program modules may include routines, programs, target programs, components, logic, data structures, and the like, which perform specific tasks or implement specific abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
In implementing the present invention, the inventors found through research that in one existing video semantic segmentation method, a model designed for image semantic segmentation is applied directly to video; since consecutive video frames contain much redundant information, frame-by-frame processing does not exploit this information, resulting in high computational complexity. In another existing video semantic segmentation method, optical flow is used to propagate features from a key frame to non-key frames: the semantic label of the key frame is computed with a deep neural network, the optical flow between the key frame and the current frame, i.e., the pixel-wise displacement vectors between them, is computed with a small network, and the semantic label is then propagated from the key frame to the current frame via the optical flow, i.e., the semantic label is propagated based on the pixel-wise motion vectors to obtain the semantic label of the current frame. However, object motion and the like in the video may cause image jitter and blur, making the obtained optical flow inaccurate and thereby reducing semantic segmentation accuracy.
Fig. 1 is a flowchart of one embodiment of the feature propagation method of the present invention. As shown in Fig. 1, the feature propagation method of this embodiment includes:
102, judge whether the current frame is a key frame.
For example, a key frame scheduling strategy may be used to judge whether the current frame is a key frame.
104, in response to the current frame being a non-key frame in the video, obtain the high-level feature of the current frame from the high-level feature of the adjacent previous key frame according to the low-level feature of the previous key frame and the low-level feature of the current frame.
Here, in the neural network, the network depth of the first network layer from which the low-level features of the previous key frame and the current frame are extracted is shallower than the network depth of the second network layer that performs further feature extraction on the low-level feature to obtain the high-level feature.
In the embodiments of the present invention, the neural network includes network layers at more than two different network depths. Among the network layers included in the neural network, a network layer that performs feature extraction may be called a feature layer. After the neural network receives a frame, the first feature layer performs feature extraction on the input frame and inputs the result to the second feature layer; from the second feature layer onward, each feature layer in turn performs feature extraction on its input features and passes the extracted features to the next network layer, until the features used for semantic segmentation are obtained. The network depth of each feature layer follows the order of feature extraction in the neural network, from shallow to deep. According to network depth, the feature layers used for feature extraction in the neural network can be divided into two parts, low-level feature layers and high-level feature layers, i.e., the above first network layer and second network layer. The final output of the low-level feature layers applied in sequence is called the low-level feature, and the final output of the high-level feature layers applied in sequence is called the high-level feature. Compared with shallower feature layers in the same neural network, deeper feature layers have larger receptive fields and pay more attention to spatial structure information, so the features they extract make semantic segmentation more accurate; however, the deeper the network, the higher the computational difficulty and complexity. In practical applications, the feature layers in the neural network can be divided into low-level feature layers and high-level feature layers according to a preset standard, such as the amount of computation, and this preset standard can be adjusted according to actual demand. For example, for a neural network including 100 sequentially connected feature layers, the 1st to 30th feature layers (or another number) may be taken as the low-level feature layers, and the 31st to 100th feature layers as the high-level feature layers. As another example, a Pyramid Scene Parsing Network (PSPNet) may include four convolutional network parts (conv1 to conv4) and a classification layer, where each part in turn includes multiple convolutional layers. According to the amount of computation, the convolutional layers from conv1 to conv4_3 in the PSPNet, which account for about 1/8 of its computation, may be taken as the low-level feature layers, and the convolutional layers from conv4_4 up to the final classification layer, which account for about 7/8 of its computation, may be taken as the high-level feature layers; the classification layer performs semantic segmentation on the high-level features output by the high-level feature layers.
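The low/high split of a sequential backbone described above (e.g. conv1..conv4_3 versus conv4_4..classifier in PSPNet) can be sketched generically; the toy layers here are plain functions, and the split index is an assumption standing in for the calculation-amount criterion:

```python
def split_backbone(layers, n_low):
    """Partition a sequential backbone into low-level feature layers (the
    first n_low, shallower layers) and high-level feature layers (the rest),
    mirroring the first/second network layer split in the text."""
    return layers[:n_low], layers[n_low:]

def run(layers, x):
    """Apply the layers in sequence, each feeding the next."""
    for layer in layers:
        x = layer(x)
    return x

# Ten toy layers; split the first three off as the low-level part.
layers = [lambda x, i=i: x + i for i in range(10)]
low_layers, high_layers = split_backbone(layers, 3)
low_feat = run(low_layers, 0)           # output of the low-level feature layers
high_feat = run(high_layers, low_feat)  # output of the high-level feature layers
```

Running the low-level part and then the high-level part on its output is, by construction, equivalent to running the whole backbone, which is what lets key frames cache the intermediate low-level feature.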
Based on the feature propagation method provided by the above embodiment of the present invention, when the current frame is a non-key frame in the video, the high-level feature of the current frame is obtained from the high-level feature of the adjacent previous key frame according to the low-level feature of that key frame and the low-level feature of the current frame, so that semantic segmentation can be performed on the non-key frame based on this high-level feature. The embodiment of the present invention exploits the consistency between video frames, namely the property that the semantic labels of neighboring frames are close, to propagate the high-level feature used for video semantic segmentation from the adjacent previous key frame to the current frame, so that semantic segmentation can be performed on the current frame based on its high-level feature without extracting the high-level feature for semantic segmentation frame by frame over consecutive video frames; compared with extracting the high-level feature for semantic segmentation frame by frame, this reduces repeated computation time. In addition, the embodiment of the present invention propagates the semantic label indirectly by propagating the high-level feature of the previous key frame to the current frame for semantic segmentation, which improves the accuracy of semantic segmentation compared with propagating the semantic label of the key frame by optical flow.
In one embodiment of the various embodiments of the present invention, in operation 102, obtaining the high-level feature of the current frame from the high-level feature of the adjacent previous key frame according to the low-level feature of that key frame and the low-level feature of the current frame may include:

obtaining, according to the low-level feature of the adjacent previous key frame and the low-level feature of the current frame, conversion weights for transforming the low-level feature of the previous key frame into the low-level feature of the current frame;

converting, according to the high-level feature of the previous key frame and the conversion weights, the high-level feature of the previous key frame into the high-level feature of the current frame; this feature is the feature propagated from the previous key frame, also referred to as the propagated feature.
In an optional example, the conversion weights for transforming the low-level feature of the previous key frame into the low-level feature of the current frame may be obtained through multiple convolutional layers.
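The propagation step can be illustrated with a minimal NumPy sketch, assuming per-pixel combination weights over a k×k neighborhood; in the embodiments these weights would be predicted from the two frames' low-level features by several convolutional layers, whereas here they are supplied directly, and all names are hypothetical:

```python
import numpy as np

def propagate_high_level(prev_high, weights, k=3):
    """Warp the key frame's high-level feature (C, H, W) into the current
    frame using per-pixel k*k combination weights (k*k, H, W). Each output
    position is a weighted sum over its k*k neighborhood in prev_high."""
    C, H, W = prev_high.shape
    pad = k // 2
    padded = np.pad(prev_high, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.zeros_like(prev_high, dtype=float)
    for dy in range(k):
        for dx in range(k):
            # weights for this neighborhood offset, broadcast over channels
            out += weights[dy * k + dx] * padded[:, dy:dy + H, dx:dx + W]
    return out

# uniform weights reduce the propagation to a k*k box filter
C, H, W, k = 2, 4, 4, 3
prev_high = np.ones((C, H, W))
weights = np.full((k * k, H, W), 1.0 / (k * k))
cur_high = propagate_high_level(prev_high, weights, k)
```

With uniform weights over a constant feature map the output is unchanged, which is a convenient sanity check on the weighting scheme.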
In another embodiment of the feature propagation method of the present invention, the method may further include: in response to the current frame being a non-key frame in the video, performing semantic segmentation on the current frame based at least on the high-level feature of the current frame to obtain the semantic label of the current frame.

In one embodiment, performing semantic segmentation on the current frame based at least on the high-level feature of the current frame may include: performing semantic segmentation on the current frame based on the low-level feature and the high-level feature of the current frame to obtain the semantic label of the current frame.
In practical applications, the channel number of the high-level feature extracted by the second network layer is typically greater than the channel number of the low-level feature extracted by the first network layer. In order to fuse the low-level feature and the high-level feature of the current frame, in an optional example, performing semantic segmentation on the current frame based on the low-level feature and the high-level feature of the current frame may include:

converting the low-level feature of the current frame to obtain a feature whose channel number is consistent with that of the high-level feature of the current frame;

splicing or fusing the feature obtained by the conversion with the high-level feature of the current frame to obtain the current frame feature;

performing semantic segmentation on the current frame based on the current frame feature.
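The channel alignment and splicing steps above can be sketched as follows, assuming the 1×1-convolution-style channel projection is given as a plain matrix (in practice it would be a learned layer; all names are hypothetical):

```python
import numpy as np

def fuse_features(low, high, proj):
    """Project the current frame's low-level feature (C_low, H, W) to the
    high-level channel count with a 1x1-convolution-like matrix `proj`
    of shape (C_high, C_low), then concatenate it with the propagated
    high-level feature along the channel axis (sketch only)."""
    # a 1x1 convolution is a matrix multiply over the channel axis
    projected = np.einsum("oc,chw->ohw", proj, low)
    assert projected.shape == high.shape
    return np.concatenate([projected, high], axis=0)

low = np.ones((4, 2, 2))              # 4 low-level channels
high = np.ones((8, 2, 2))             # 8 high-level channels
proj = np.full((8, 4), 0.25)          # maps 4 channels -> 8 channels
frame_feat = fuse_features(low, high, proj)
```

Splicing doubles the channel count here; element-wise fusion (e.g. addition) would instead keep the high-level channel count.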
In the above embodiment of the present invention, the high-level feature converted from the high-level feature of the previous key frame is fused with a feature of the current frame for semantic segmentation, without using a computationally expensive single-frame model to obtain the feature of the non-key frame; this reduces the computation amount while ensuring the accuracy of semantic segmentation.
In addition, in a further embodiment of the feature propagation method of the present invention, the high-level feature of each non-key frame after the previous key frame may also be cached. When the current frame is a non-key frame, the feature obtained by converting the current frame's low-level feature, the high-level feature of the current frame, the high-level feature of the previous key frame, and the cached high-level features of the non-key frames between the previous key frame and the current frame are spliced or fused to obtain the current frame feature, and semantic segmentation is performed on the current frame based on the current frame feature.

Based on this embodiment, all cached high-level features between the previous key frame and the current frame can be propagated to the current frame and spliced or fused for semantic segmentation, so that a more robust semantic segmentation result can be obtained at a minimal fusion cost.
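As a minimal illustration of fusing the cached features — the embodiments leave the fusion operation open, so the element-wise averaging below is one assumed choice, and the helper name is hypothetical:

```python
def fuse_with_cache(cur_feat, cached_feats):
    """Fuse the current frame's feature with all high-level features
    cached since the previous key frame by element-wise averaging
    (one minimal-cost fusion choice; splicing would be the alternative)."""
    feats = [cur_feat] + list(cached_feats)
    n = len(feats)
    # element-wise mean across the current feature and every cached one
    return [sum(vals) / n for vals in zip(*feats)]

cache = [[0.0, 0.0], [2.0, 4.0]]      # features of earlier non-key frames
fused = fuse_with_cache([1.0, 2.0], cache)
```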
In an embodiment of the various embodiments of the present invention, the key frame scheduling strategy may be a fixed-length scheduling method, for example, designating a key frame every 1 to 5 frames; that is, the fixed-length scheduling method may be used to judge whether the current frame is a key frame.
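A fixed-length scheduler of this kind reduces to a modulo test; the function below is a hypothetical sketch in which the first frame is treated as a key frame:

```python
def is_key_frame_fixed(frame_index, interval=5):
    """Fixed-length scheduling: every `interval`-th frame is a key frame
    (interval between 1 and 5 in the embodiments); frame 0 is a key frame."""
    return frame_index % interval == 0

# with interval 5, frames 0, 5, 10, ... run the full single-frame model
keys = [i for i in range(12) if is_key_frame_fixed(i, interval=5)]
```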
Fig. 2 is a flow chart of another embodiment of the feature propagation method of the present invention. As shown in Fig. 2, the feature propagation method of this embodiment includes:

202, judging whether the current frame is a key frame using the fixed-length scheduling method.

If the current frame is a key frame, operation 212 is performed; otherwise, if the current frame is a non-key frame in the video, operation 204 is performed.
204, performing feature extraction on the current frame (also referred to as the current non-key frame) to obtain the low-level feature of the current frame.

In an example of the various embodiments of the present invention, feature extraction may be performed on the current frame by the low-level feature layers of the neural network (i.e., the first network layer) to obtain the low-level feature of the current frame.
206, obtaining, according to the low-level feature of the previous key frame adjacent to the current frame and the low-level feature of the current frame, conversion weights for transforming the low-level feature of the previous key frame into the low-level feature of the current frame.

The conversion weights may be a transition matrix between the low-level feature of the previous key frame and the low-level feature of the current frame, including conversion elements between pixel-wise positions of the two low-level features.
208, converting the high-level feature of the previous key frame into the high-level feature of the current frame according to the high-level feature of the previous key frame and the conversion weights.

210, performing semantic segmentation on the current frame based on the low-level feature and the high-level feature of the current frame to obtain the semantic label of the current frame.

The flow ends after the semantic segmentation in operation 210; the subsequent operations of this embodiment are not performed.
212, performing feature extraction on the current frame (also referred to as the current key frame) to obtain and cache the low-level feature of the current frame.

In an example, feature extraction may be performed on the current frame by the low-level feature layers of the neural network (i.e., the first network layer).

214, performing feature extraction on the low-level feature of the current frame to obtain and cache the high-level feature of the current frame.

In an example, feature extraction may be performed on the low-level feature of the current frame by the high-level feature layers of the neural network (i.e., the second network layer).

216, performing semantic segmentation on the current frame based on the high-level feature of the current frame to obtain the semantic label of the current frame.
In the various embodiments of the present invention, key frames and non-key frames may share the low-level feature layers of the neural network for low-level feature extraction. The neural network here may use a PSPN, which may include four convolutional sub-networks (conv1 to conv4) and a classification layer, each sub-network in turn including multiple convolutional layers. The low-level feature layers of the neural network may include the convolutional layers from conv1 to conv4_3 in the PSPN, accounting for about 1/8 of the computation of the PSPN; the high-level feature layers of the neural network may include the convolutional layers from conv4_4 up to the final classification layer, accounting for about 7/8 of the computation of the PSPN, and are used to extract the high-level feature of the key frame; the classification layer identifies the class of at least one pixel in the key frame or non-key frame based on the corresponding high-level feature, thereby realizing semantic segmentation of the key frame or non-key frame.
In the various embodiments of the present invention, a computationally expensive single-frame model such as a PSPN may be invoked for key frames to perform semantic segmentation, thereby obtaining high-precision semantic segmentation results. For non-key frames, the high-level feature of the key frame can be adaptively propagated to the current frame to obtain the high-level feature of the current frame, which makes full use of the consistency between consecutive video frames and avoids repeated computation; semantic segmentation is then performed on the current frame based on its low-level feature and high-level feature to obtain the semantic label of the current frame. This embodiment ensures the semantic segmentation precision of key frames without requiring the computationally expensive single-frame model to perform semantic segmentation frame by frame on non-key frames, which reduces computational complexity and computation time and saves computing resources.
Fig. 3 is a flow chart of yet another embodiment of the feature propagation method of the present invention. As shown in Fig. 3, the feature propagation method of this embodiment includes:

302, performing feature extraction on the current frame to obtain the low-level feature of the current frame.

In an example of the various embodiments of the present invention, feature extraction may be performed on the current frame by the low-level feature layers of the neural network (i.e., the first network layer) to obtain the low-level feature of the current frame.
304, obtaining, according to the low-level feature of the previous key frame adjacent to the current frame and the low-level feature of the current frame, the scheduling probability value of the current frame being scheduled as a key frame.

In an example, the low-level feature of the previous key frame and the low-level feature of the current frame may be spliced, and the resulting spliced feature input into a key frame dispatch network, which obtains, based on the spliced feature, the scheduling probability value of whether the current frame should be scheduled as a key frame.
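As a hedged stand-in for the key frame dispatch network — the real network is learned from the spliced low-level features, whereas the sigmoid over the mean absolute feature deviation below is purely illustrative, and all names are hypothetical:

```python
import math

def scheduling_probability(key_low, cur_low, threshold=0.2):
    """Illustrative stand-in for the key frame dispatch network: pair up
    (splice) the two low-level feature vectors, then map their mean
    absolute deviation through a sigmoid to a probability in (0, 1)."""
    spliced = list(zip(key_low, cur_low))            # the spliced input
    deviation = sum(abs(k - c) for k, c in spliced) / len(spliced)
    return 1.0 / (1.0 + math.exp(-20.0 * (deviation - threshold)))

p_same = scheduling_probability([0.0] * 16, [0.0] * 16)  # frames look alike
p_diff = scheduling_probability([0.0] * 16, [1.0] * 16)  # large change
```

A frame that has drifted far from the key frame receives a high probability and is rescheduled as a new key frame; a similar frame is left to feature propagation.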
306, determining whether the current frame is scheduled as a key frame according to the scheduling probability value of the current frame.

If the current frame is a key frame, operation 314 is performed; otherwise, if the current frame is a non-key frame in the video, operation 308 is performed.
308, obtaining, according to the low-level feature of the previous key frame adjacent to the current frame (also referred to as the current non-key frame) and the low-level feature of the current frame, conversion weights for transforming the low-level feature of the previous key frame into the low-level feature of the current frame.

310, converting the high-level feature of the previous key frame into the high-level feature of the current frame according to the high-level feature of the previous key frame and the conversion weights.

312, performing semantic segmentation on the current frame based on the low-level feature and the high-level feature of the current frame to obtain the semantic label of the current frame.

Afterwards, the subsequent operations of this embodiment are not performed.
314, performing feature extraction on the current frame (also referred to as the current key frame) to obtain and cache the low-level feature of the current frame.

In an example, feature extraction may be performed on the current frame by the low-level feature layers of the neural network (i.e., the first network layer).

316, performing feature extraction on the low-level feature of the current frame to obtain and cache the high-level feature of the current frame.

In an example, feature extraction may be performed on the low-level feature of the current frame by the high-level feature layers of the neural network (i.e., the second network layer).

318, performing semantic segmentation on the current frame based on the high-level feature of the current frame to obtain the semantic label of the current frame.
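Operations 302 to 318 can be tied together in a driver loop like the following sketch, in which every callable is a hypothetical stand-in for a network component:

```python
def segment_video(frames, should_reschedule, extract_low, extract_high,
                  propagate, segment):
    """Driver loop for the Fig. 3 pipeline (sketch): the first frame and
    any frame whose scheduling decision fires run the full model and
    cache their features; other frames reuse the cached high-level
    feature via propagation."""
    labels, key_low, key_high = [], None, None
    for frame in frames:
        low = extract_low(frame)                         # shared low-level layers
        if key_low is None or should_reschedule(key_low, low):
            key_low, key_high = low, extract_high(low)   # full single-frame path
            labels.append(segment(key_high))
        else:                                            # propagate from key frame
            labels.append(segment(propagate(key_low, low, key_high)))
    return labels

# toy 1-D stand-ins: a frame drifting more than 2 units triggers a new key frame
labels = segment_video(
    frames=[0, 1, 2, 5],
    should_reschedule=lambda key_low, low: abs(low - key_low) > 2,
    extract_low=lambda f: f,
    extract_high=lambda low: low * 10,
    propagate=lambda key_low, low, key_high: key_high + (low - key_low),
    segment=lambda high: high,
)
```

Here frames 0 and 3 take the key-frame path while frames 1 and 2 reuse the cached high-level feature, matching the branch structure of operations 306 to 318.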
The embodiments of the present invention may be applied to automatic driving scenes, video surveillance scenes, Internet entertainment products such as portrait segmentation, and the like, for example:

1. In an automatic driving scene, targets in a video, such as people and vehicles, can be quickly segmented using the embodiments of the present invention;

2. In a video surveillance scene, people can be quickly segmented out;

3. In Internet entertainment products such as portrait segmentation, people can be quickly segmented from video frames.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be completed by hardware related to program instructions; the aforementioned program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Fig. 4 is a structural diagram of an embodiment of the feature propagation device of the present invention. The feature propagation devices of the various embodiments of the present invention may be used to implement the feature propagation methods of the above embodiments. As shown in Fig. 4, the feature propagation device of this embodiment includes a judgment module and a feature propagation module, wherein:

the judgment module is used to judge whether the current frame is a key frame;

the feature propagation module is used to, according to the judging result of the judgment module and in response to the current frame being a non-key frame in the video, obtain the high-level feature of the current frame from the high-level feature of the previous key frame according to the low-level feature of the previous key frame adjacent to the current frame and the low-level feature of the current frame.

In the neural network, the network depth of the first network layer whose extraction yields the low-level feature of the previous key frame is shallower than the network depth of the second network layer whose extraction yields the high-level feature of the previous key frame.
Based on the feature propagation device provided by the above embodiment of the present invention, when the current frame is a non-key frame in the video, the high-level feature of the current frame is obtained from the high-level feature of the previous key frame according to the low-level feature of the previous key frame adjacent to the current frame and the low-level feature of the current frame, so that semantic segmentation can be performed on the non-key frame based on this high-level feature. The embodiment of the present invention exploits the consistency between video frames, namely the property that the semantic labels of neighboring frames are close, to propagate the high-level feature used for video semantic segmentation from the adjacent previous key frame to the current frame, so that semantic segmentation can be performed on the current frame based on its high-level feature without extracting the high-level feature for semantic segmentation frame by frame over consecutive video frames; compared with extracting the high-level feature for semantic segmentation frame by frame, this reduces repeated computation time. In addition, the embodiment of the present invention propagates the semantic label indirectly by propagating the high-level feature of the previous key frame to the current frame for semantic segmentation, which improves the accuracy of semantic segmentation compared with propagating the semantic label of the key frame by optical flow.
In one embodiment, the feature propagation module is specifically used to: obtain, according to the low-level feature of the previous key frame and the low-level feature of the current frame, conversion weights for transforming the low-level feature of the previous key frame into the low-level feature of the current frame; and convert, according to the high-level feature of the previous key frame and the conversion weights, the high-level feature of the previous key frame into the high-level feature of the current frame.
Fig. 5 is a structural diagram of another embodiment of the feature propagation device of the present invention. As shown in Fig. 5, compared with the embodiment shown in Fig. 4, the feature propagation device of this embodiment further includes a semantic segmentation module, which is used to, according to the judging result of the judgment module and in response to the current frame being a non-key frame in the video, perform semantic segmentation on the current frame based at least on the high-level feature of the current frame to obtain the semantic label of the current frame.

In one embodiment, when performing semantic segmentation on the current frame based at least on the high-level feature of the current frame, the semantic segmentation module is specifically used to perform semantic segmentation on the current frame based on the low-level feature and the high-level feature of the current frame.

In an optional example, when performing semantic segmentation on the current frame based on the low-level feature and the high-level feature of the current frame, the semantic segmentation module is specifically used to: convert the low-level feature of the current frame to obtain a feature whose channel number is consistent with that of the high-level feature of the current frame; splice or fuse the feature obtained by the conversion with the high-level feature of the current frame to obtain the current frame feature; and perform semantic segmentation on the current frame based on the current frame feature.
In an embodiment of each of the above feature propagation device embodiments of the present invention, the judgment module is specifically used to judge whether the current frame is a key frame using a key frame scheduling strategy.

In an optional example, the judgment module is specifically used to judge whether the current frame is a key frame using the fixed-length scheduling method. Correspondingly, referring back to Fig. 5, the feature propagation device of another embodiment may further include a first feature extraction module, used to, according to the judging result of the judgment module and in response to the current frame being a non-key frame in the video, perform feature extraction on the current frame to obtain the low-level feature of the current frame.
Alternatively, referring to Fig. 6, the feature propagation device of yet another embodiment may further include a first feature extraction module and an acquisition module, wherein: the first feature extraction module is used to perform feature extraction on the current frame to obtain the low-level feature of the current frame; the acquisition module is used to obtain, according to the low-level feature of the adjacent previous key frame and the low-level feature of the current frame, the scheduling probability value of the current frame being scheduled as a key frame. Correspondingly, in this embodiment, the judgment module is specifically used to determine whether the current frame is scheduled as a key frame according to the scheduling probability value of the current frame.

In one embodiment, the acquisition module may include: a splicing unit, used to splice the low-level feature of the previous key frame and the low-level feature of the current frame to obtain a spliced feature; and a key frame dispatch network, used to obtain, based on the spliced feature, the scheduling probability value of whether the current frame should be scheduled as a key frame.
Illustratively, in the feature propagation devices of the above embodiments, the first feature extraction module may further be used to, according to the judging result of the judgment module and in response to the current frame being a key frame in the video, perform feature extraction on the current frame to obtain and cache the low-level feature of the current frame. Referring back to Fig. 5 or Fig. 6, the feature propagation device of a further embodiment may also include a second feature extraction module, used to, according to the judging result of the judgment module, perform feature extraction on the low-level feature of the key frame to obtain and cache the high-level feature of the key frame.

Optionally, in the feature propagation devices of the above embodiments, the semantic segmentation module may further be used to, according to the judging result of the judgment module and in response to the current frame being a key frame in the video, perform semantic segmentation on the current frame based on the high-level feature of the current frame to obtain the semantic label of the current frame.
In addition, an embodiment of the present invention further provides an electronic device, including the feature propagation device of any of the above embodiments of the present invention.

In addition, an embodiment of the present invention further provides another electronic device, including:

a memory for storing executable instructions; and

one or more processors for communicating with the memory to execute the executable instructions, thereby completing the operations of the feature propagation method of any of the above embodiments of the present invention.

In addition, an embodiment of the present invention further provides yet another electronic device, including:

a processor and the feature propagation device of any of the above embodiments of the present invention;

when the processor runs the feature propagation device, the units in the feature propagation device of any of the above embodiments of the present invention are run.
Fig. 7 is a structural diagram of an application embodiment of the electronic device of the present invention. Referring to Fig. 7, it shows a structural diagram of an electronic device suitable for implementing a terminal device or a server of an embodiment of the present application. As shown in Fig. 7, the electronic device includes one or more processors, a communication unit, and the like. The one or more processors are, for example, one or more central processing units (CPUs) and/or one or more graphics processors (GPUs); the processor may perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) or executable instructions loaded from a storage section into a random access memory (RAM). The communication unit may include, but is not limited to, a network card, and the network card may include, but is not limited to, an IB (Infiniband) network card. The processor may communicate with the read-only memory and/or the random access memory to execute the executable instructions, is connected with the communication unit through a bus, and communicates with other target devices through the communication unit, so as to complete the operation corresponding to any method provided by the embodiments of the present application, for example: judging whether the current frame is a key frame; in response to the current frame being a non-key frame in the video, obtaining the high-level feature of the current frame from the high-level feature of the previous key frame according to the low-level feature of the previous key frame adjacent to the current frame and the low-level feature of the current frame; wherein, in the neural network, the network depth of the first network layer corresponding to the low-level feature of the previous key frame is shallower than the network depth of the second network layer corresponding to the high-level feature of the previous key frame.
In addition, various programs and data needed for the operation of the device may also be stored in the RAM. The CPU, the ROM, and the RAM are connected to each other through the bus. Where a RAM is present, the ROM is an optional module. The RAM stores executable instructions, or executable instructions are written into the ROM at runtime, and the executable instructions cause the processor to perform the operations corresponding to any of the above methods of the present invention. An input/output (I/O) interface is also connected to the bus. The communication unit may be integrally disposed, or may be provided with multiple sub-modules (such as multiple IB network cards) linked on the bus.

The I/O interface is connected to the following components: an input section including a keyboard, a mouse, and the like; an output section including a cathode-ray tube (CRT), a liquid crystal display (LCD), a loudspeaker, and the like; a storage section including a hard disk and the like; and a communication section including a network card such as a LAN card or a modem. The communication section performs communication processing via a network such as the Internet. A driver is also connected to the I/O interface as needed. A removable medium, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is installed on the driver as needed, so that a computer program read from it can be installed into the storage section as needed.

It should be noted that the architecture shown in Fig. 7 is only an optional implementation. In concrete practice, the number and types of the components in Fig. 7 may be selected, deleted, added, or replaced according to actual needs; for different functional components, separate or integrated arrangements and other implementations may also be used, for example, the GPU and the CPU may be arranged separately, or the GPU may be integrated on the CPU, and the communication unit may be arranged separately, or integrated on the CPU or the GPU, and so on. These interchangeable embodiments all fall within the protection scope disclosed by the present invention.
In addition, an embodiment of the present invention further provides a computer storage medium for storing computer-readable instructions which, when executed, implement the operations of the feature propagation method of any of the above embodiments of the present invention.

In addition, an embodiment of the present invention further provides a computer program including computer-readable instructions which, when run in a device, cause a processor in the device to execute executable instructions for implementing the steps of the feature propagation method of any of the above embodiments of the present invention.

In an optional embodiment, the computer program is specifically a software product, such as a software development kit (Software Development Kit, SDK), and so on.

In one or more optional embodiments, an embodiment of the present invention further provides a computer program product for storing computer-readable instructions which, when executed, cause a computer to perform the feature propagation method described in any of the above possible implementations.

The computer program product may be realized by hardware, software, or a combination thereof. In an optional example, the computer program product is embodied as a computer storage medium; in another optional example, the computer program product is embodied as a software product, such as an SDK.
The embodiments in this specification are described in a progressive manner; each embodiment highlights its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced. As the device embodiments substantially correspond to the method embodiments, their description is relatively simple, and relevant parts may refer to the description of the method embodiments.

The methods and apparatuses of the present invention may be achieved in many ways, for example, by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the methods is merely for illustration, and the steps of the methods of the present invention are not limited to the order described above unless otherwise specifically stated. In addition, in some embodiments, the present invention may also be embodied as programs recorded in a recording medium, these programs including machine-readable instructions for implementing the methods according to the present invention. Thus, the present invention also covers the recording medium storing the programs for performing the methods according to the present invention.

The description of the present invention is provided for the sake of example and description, and is not exhaustive or intended to limit the present invention to the disclosed form. Many modifications and variations are obvious to those of ordinary skill in the art. The embodiments were selected and described to better illustrate the principles and practical applications of the present invention, and to enable those of ordinary skill in the art to understand the present invention so as to design various embodiments with various modifications suitable for particular uses.

Claims (10)

1. A feature propagation method, characterized by including:

judging whether a current frame is a key frame;

in response to the current frame being a non-key frame in a video, obtaining a high-level feature of the current frame from a high-level feature of a previous key frame adjacent to the current frame according to a low-level feature of the previous key frame and a low-level feature of the current frame; wherein, in a neural network, a network depth of a first network layer corresponding to the low-level feature of the previous key frame is shallower than a network depth of a second network layer corresponding to the high-level feature of the previous key frame.
2. The method according to claim 1, wherein obtaining the high-level feature of the current frame from the high-level feature of the previous key frame, according to the low-level feature of the previous key frame adjacent to the current frame and the low-level feature of the current frame, comprises:
obtaining, according to the low-level feature of the adjacent previous key frame and the low-level feature of the current frame, conversion weights for transforming from the low-level feature of the previous key frame to the low-level feature of the current frame; and
converting, according to the high-level feature of the previous key frame and the conversion weights, the high-level feature of the previous key frame into the high-level feature of the current frame.
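Claim 2 leaves the form of the conversion weights open: it requires only that they be derived from the two low-level feature maps and then applied to the key frame's high-level feature. A numerical sketch under that reading, using per-location cosine similarity as a stand-in weight (in practice a learned convolutional module would predict the weights; all function names here are illustrative):

```python
import numpy as np

def conversion_weights(key_low, cur_low, eps=1e-8):
    """One scalar weight per spatial location, derived from both
    low-level feature maps of shape (channels, H, W)."""
    num = (key_low * cur_low).sum(axis=0)
    den = np.linalg.norm(key_low, axis=0) * np.linalg.norm(cur_low, axis=0) + eps
    return num / den                        # (H, W) cosine similarities

def convert_high_level(key_high, weights):
    """Apply the conversion weights to the key frame's high-level
    feature map to obtain the current frame's high-level feature."""
    return key_high * weights[None, :, :]   # broadcast over channels

rng = np.random.default_rng(0)
key_low = rng.normal(size=(4, 2, 2))        # low-level: 4 channels, 2x2
key_high = rng.normal(size=(8, 2, 2))       # high-level: 8 channels, 2x2

# sanity property: when the current frame's low-level feature equals the
# key frame's, the weights are ~1 and the high-level feature carries over
w = conversion_weights(key_low, key_low)
cur_high = convert_high_level(key_high, w)
```

The cosine stand-in makes the intended behavior explicit: where the two low-level maps agree, the key frame's high-level feature is trusted; where they diverge, it is attenuated.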
3. The method according to claim 1 or 2, further comprising, in response to the current frame being a non-key frame in the video:
performing semantic segmentation on the current frame based at least on the high-level feature of the current frame, to obtain a semantic label of the current frame.
4. The method according to claim 3, wherein performing semantic segmentation on the current frame based at least on the high-level feature of the current frame comprises:
performing semantic segmentation on the current frame based on the low-level feature and the high-level feature of the current frame, to obtain the semantic label of the current frame;
splicing or fusing the feature obtained by conversion for the current frame with the high-level feature of the current frame, to obtain a current-frame feature; and
performing semantic segmentation on the current frame based on the current-frame feature.
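Claim 4's splice-then-segment step can be illustrated as channel concatenation followed by a per-pixel linear classifier standing in for the segmentation head (the head and the class-weight matrix are assumptions for illustration; the patent does not specify them):

```python
import numpy as np

def splice(propagated_high, cur_feature):
    """'Splicing' as channel concatenation of the propagated high-level
    feature with the current frame's own feature."""
    return np.concatenate([propagated_high, cur_feature], axis=0)

def segment(fused, class_weights):
    """Per-pixel linear classifier as a stand-in segmentation head:
    logits[k, h, w] = sum_c class_weights[k, c] * fused[c, h, w]."""
    logits = np.einsum('kc,chw->khw', class_weights, fused)
    return logits.argmax(axis=0)            # (H, W) semantic label map

propagated = np.zeros((3, 2, 2)); propagated[0] = 1.0  # channel 0 active
current = np.zeros((3, 2, 2)); current[0] = 1.0        # becomes channel 3
fused = splice(propagated, current)                    # shape (6, 2, 2)

# class 0 reads fused channel 0 with weight 1;
# class 1 reads fused channel 3 with weight 2, so class 1 wins everywhere
W = np.zeros((2, 6)); W[0, 0] = 1.0; W[1, 3] = 2.0
labels = segment(fused, W)
```

Concatenation (rather than summation) keeps the propagated and locally computed channels distinct, so the head can weight them independently, which is the point of splicing both sources into one current-frame feature.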
5. A feature propagation apparatus, comprising:
a judgment module, configured to judge whether a current frame is a key frame; and
a feature propagation module, configured to, according to a judging result of the judgment module and in response to the current frame being a non-key frame in a video, obtain a high-level feature of the current frame from a high-level feature of a previous key frame adjacent to the current frame, according to a low-level feature of the previous key frame and a low-level feature of the current frame; wherein, in a neural network, a network depth of a first network layer from which the low-level feature of the previous key frame is extracted is shallower than a network depth of a second network layer from which the high-level feature of the previous key frame is extracted.
6. An electronic device, comprising: the feature propagation apparatus according to claim 5.
7. An electronic device, comprising:
a processor and the feature propagation apparatus according to claim 5;
wherein when the processor runs the feature propagation apparatus, units in the feature propagation apparatus according to claim 5 are run.
8. An electronic device, comprising: a processor and a memory;
wherein the memory is configured to store at least one executable instruction that causes the processor to perform the operations of the steps in the feature propagation method according to any one of claims 1-4.
9. A computer program, comprising computer-readable code, wherein when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the steps in the feature propagation method according to any one of claims 1-4.
10. A computer-readable medium for storing computer-readable instructions, wherein when the instructions are executed, the operations of the steps in the feature propagation method according to any one of claims 1-4 are implemented.
CN201711455916.6A 2017-12-27 2017-12-27 Feature propagation method and apparatus, electronic device, and medium Active CN108235116B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711455916.6A CN108235116B (en) 2017-12-27 2017-12-27 Feature propagation method and apparatus, electronic device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711455916.6A CN108235116B (en) 2017-12-27 2017-12-27 Feature propagation method and apparatus, electronic device, and medium

Publications (2)

Publication Number Publication Date
CN108235116A true CN108235116A (en) 2018-06-29
CN108235116B CN108235116B (en) 2020-06-16

Family

ID=62649228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711455916.6A Active CN108235116B (en) 2017-12-27 2017-12-27 Feature propagation method and apparatus, electronic device, and medium

Country Status (1)

Country Link
CN (1) CN108235116B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109151615A (en) * 2018-11-02 2019-01-04 湖南双菱电子科技有限公司 Video processing method, computer device, and computer storage medium
CN109919044A (en) * 2019-02-18 2019-06-21 清华大学 Video semantic segmentation method and apparatus based on predicted feature propagation
CN110060264A (en) * 2019-04-30 2019-07-26 北京市商汤科技开发有限公司 Neural network training method, video frame processing method, apparatus and system
CN110738108A (en) * 2019-09-09 2020-01-31 北京地平线信息技术有限公司 Target object detection method, target object detection device, storage medium and electronic equipment
CN110929605A (en) * 2019-11-11 2020-03-27 中国建设银行股份有限公司 Video key frame storage method, device, equipment and storage medium
CN111062395A (en) * 2019-11-27 2020-04-24 北京理工大学 Real-time video semantic segmentation method
CN111383245A (en) * 2018-12-29 2020-07-07 北京地平线机器人技术研发有限公司 Video detection method, video detection device and electronic equipment
CN111654724A (en) * 2020-06-08 2020-09-11 上海纽菲斯信息科技有限公司 Low-bit-rate coding transmission method of video conference system
CN112016513A (en) * 2020-09-08 2020-12-01 北京达佳互联信息技术有限公司 Video semantic segmentation method, model training method, related device and electronic equipment
CN112465826A (en) * 2019-09-06 2021-03-09 上海高德威智能交通***有限公司 Video semantic segmentation method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101650728A (en) * 2009-08-26 2010-02-17 北京邮电大学 Video high-level characteristic retrieval system and realization thereof
CN103065300A (en) * 2012-12-24 2013-04-24 安科智慧城市技术(中国)有限公司 Video labeling method and apparatus
CN105677735A (en) * 2015-12-30 2016-06-15 腾讯科技(深圳)有限公司 Video search method and apparatus
CN106156747A (en) * 2016-07-21 2016-11-23 四川师范大学 Method for extracting semantic objects from surveillance video based on behavior features
US20160358628A1 (en) * 2015-06-05 2016-12-08 Apple Inc. Hierarchical segmentation and quality measurement for video editing
US20170124400A1 (en) * 2015-10-28 2017-05-04 Raanan Y. Yehezkel Rohekar Automatic video summarization
CN106934352A (en) * 2017-02-28 2017-07-07 华南理工大学 Video description method based on bidirectional fractal network and LSTM
US20170337271A1 (en) * 2016-05-17 2017-11-23 Intel Corporation Visual search and retrieval using semantic information


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIZHOU ZHU; YUWEN XIONG; JIFENG DAI; LU YUAN; YICHEN WEI: "Deep Feature Flow for Video Recognition", IEEE Conference on Computer Vision and Pattern Recognition *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109151615A (en) * 2018-11-02 2019-01-04 湖南双菱电子科技有限公司 Video processing method, computer device, and computer storage medium
CN109151615B (en) * 2018-11-02 2022-01-25 湖南双菱电子科技有限公司 Video processing method, computer device, and computer storage medium
CN111383245B (en) * 2018-12-29 2023-09-22 北京地平线机器人技术研发有限公司 Video detection method, video detection device and electronic equipment
CN111383245A (en) * 2018-12-29 2020-07-07 北京地平线机器人技术研发有限公司 Video detection method, video detection device and electronic equipment
CN109919044A (en) * 2019-02-18 2019-06-21 清华大学 Video semantic segmentation method and apparatus based on predicted feature propagation
CN110060264A (en) * 2019-04-30 2019-07-26 北京市商汤科技开发有限公司 Neural network training method, video frame processing method, apparatus and system
CN110060264B (en) * 2019-04-30 2021-03-23 北京市商汤科技开发有限公司 Neural network training method, video frame processing method, device and system
CN112465826A (en) * 2019-09-06 2021-03-09 上海高德威智能交通***有限公司 Video semantic segmentation method and device
CN112465826B (en) * 2019-09-06 2023-05-16 上海高德威智能交通***有限公司 Video semantic segmentation method and device
CN110738108A (en) * 2019-09-09 2020-01-31 北京地平线信息技术有限公司 Target object detection method, target object detection device, storage medium and electronic equipment
CN110929605A (en) * 2019-11-11 2020-03-27 中国建设银行股份有限公司 Video key frame storage method, device, equipment and storage medium
CN111062395A (en) * 2019-11-27 2020-04-24 北京理工大学 Real-time video semantic segmentation method
CN111654724A (en) * 2020-06-08 2020-09-11 上海纽菲斯信息科技有限公司 Low-bit-rate coding transmission method of video conference system
CN112016513A (en) * 2020-09-08 2020-12-01 北京达佳互联信息技术有限公司 Video semantic segmentation method, model training method, related device and electronic equipment
CN112016513B (en) * 2020-09-08 2024-01-30 北京达佳互联信息技术有限公司 Video semantic segmentation method, model training method, related device and electronic equipment

Also Published As

Publication number Publication date
CN108235116B (en) 2020-06-16

Similar Documents

Publication Publication Date Title
CN108235116A (en) Feature propagation method and device, electronic equipment, program and medium
CN106599789B Video classification recognition method and apparatus, data processing device, and electronic device
EP4009231A1 (en) Video frame information labeling method, device and apparatus, and storage medium
CN109508681A Method and apparatus for generating a human body key point detection model
CN108229336A (en) Video identification and training method and device, electronic equipment, program and medium
CN108229280A (en) Time domain motion detection method and system, electronic equipment, computer storage media
CN109800821A Neural network training method, image processing method, apparatus, device, and medium
CN108229363A Key frame scheduling method and apparatus, electronic device, program, and medium
CN112966742A (en) Model training method, target detection method and device and electronic equipment
CN112215171B (en) Target detection method, device, equipment and computer readable storage medium
CN109300151A (en) Image processing method and device, electronic equipment
CN115861462B (en) Training method and device for image generation model, electronic equipment and storage medium
CN114972958B (en) Key point detection method, neural network training method, device and equipment
CN112668492A (en) Behavior identification method for self-supervised learning and skeletal information
CN114792359A (en) Rendering network training and virtual object rendering method, device, equipment and medium
US20230082715A1 (en) Method for training image processing model, image processing method, apparatus, electronic device, and computer program product
CN110807379A (en) Semantic recognition method and device and computer storage medium
CN109886172A Video behavior recognition method and apparatus, electronic device, storage medium, and product
CN115170819A (en) Target identification method and device, electronic equipment and medium
CN115511779A (en) Image detection method, device, electronic equipment and storage medium
US20230290132A1 (en) Object recognition neural network training using multiple data sources
CN108509876A Object detection method for video, apparatus, device, storage medium, and program
CN114359892A (en) Three-dimensional target detection method and device and computer readable storage medium
CN111768007B (en) Method and device for mining data
CN116824686A (en) Action recognition method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant