CN115695944B - Vehicle-mounted image processing method and device, electronic equipment and medium

Publication number: CN115695944B (granted publication; published earlier as application CN115695944A)
Application number: CN202211712672.6A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: scene, target, image, video, frame
Inventors: 李少君, 汪骏, 张富国
Applicant and current assignee: Beijing China Tsp Technology Co ltd
Priority: CN202211712672.6A
Legal status: Active (application granted)

Classifications

    • Y - General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02 - Technologies or applications for mitigation or adaptation against climate change
    • Y02T - Climate change mitigation technologies related to transportation
    • Y02T 10/00 - Road transport of goods or passengers
    • Y02T 10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 - Engine management systems


Abstract

The application provides a vehicle-mounted image processing method and apparatus, an electronic device, and a medium. The method comprises the following steps: when a video synthesis condition is met, acquiring each frame image of the vehicle-mounted video within a first preset time period, wherein each image carries a video timestamp; classifying each frame image through a trained scene classification model to obtain the scene category and scene matching degree of each frame image; sorting, according to the video timestamps, the target images in a target scene category whose scene matching degree meets a preset matching-degree condition; and obtaining a score for each frame of target image through a trained aesthetics evaluation model and synthesizing the consecutive target frames meeting a preset scoring condition into a target beautiful-scene video. A high-quality beautiful-scene video or beautiful-scene image is thus generated automatically, without requiring the user to perform any special operation while driving.

Description

Vehicle-mounted image processing method and device, electronic equipment and medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for processing a vehicle-mounted image, an electronic device, and a medium.
Background
With the rapid development of automobiles and traffic, users have ever more demands around driving and travel. During a journey, users may, to varying degrees, want to take photos or record videos to document the trip; for example, when a user passes beautiful scenery while driving, the user may want to photograph it as a keepsake.
At present, a user can only trigger photographing or video recording on the automobile data recorder or in-car vehicle-mounted camera through an active operation such as manual control or gesture recognition. This affects driving safety to a certain extent and poses a potential safety hazard; moreover, the quality of photos taken with an automobile data recorder or vehicle-mounted camera does not necessarily meet the user's needs.
Disclosure of Invention
In view of the above, an object of the present application is to provide a vehicle-mounted image processing method and apparatus, an electronic device, and a medium that can automatically generate high-quality beautiful-scene videos or beautiful-scene images from the images recorded by an automobile data recorder or vehicle-mounted camera, without requiring the user to perform any special operation while driving.
The application provides a vehicle-mounted image processing method, which comprises the following steps:
when the video synthesis condition is met, acquiring each frame image of the vehicle-mounted video within a first preset time period; wherein the image carries a video time stamp;
classifying each frame of image through a trained scene classification model to obtain the scene category and the scene matching degree of each frame of image;
the method comprises the steps that target images with scene matching degrees meeting preset matching degree conditions in target scene categories are sorted according to video timestamps;
and obtaining the score of each frame of target image through the trained aesthetic degree evaluation model, and synthesizing the continuous frames of target images meeting the preset score condition into the target beautiful scene video.
In some embodiments, in the vehicle-mounted image processing method, after synthesizing a target beautiful scene video from a plurality of target images satisfying a continuous frame condition and a preset scoring condition, the method further includes:
and pushing the target beautiful scene video to a vehicle-mounted display screen so that the vehicle-mounted display screen displays the target beautiful scene video.
In some embodiments, in the vehicle-mounted image processing method, before sorting target images in the target scene category according to the video timestamps, the method further includes:
dividing the first preset time period into a plurality of second preset time periods according to the video time stamp of each frame image, and determining the number of frames of images with scene matching degree greater than or equal to a preset matching degree threshold value in the second preset time period;
if the determined frame number is larger than the first frame number threshold, reserving all images in a second preset time period as images with scene matching degrees meeting preset matching degree conditions;
and determining a target image of the target scene type from the reserved image according to the scene type of the reserved image.
In some embodiments, after obtaining the score of each frame of target image through the trained aesthetic evaluation model, the method further includes:
and respectively calculating the grading mean values of the multiple groups of continuous frame target images, and determining the continuous frame target images meeting the preset grading conditions according to the grading mean values.
In some embodiments, in the vehicle-mounted image processing method, the trained beauty assessment model is obtained by training according to an image index; the image index is at least one of: definition, color, tone, depth of field, beauty and composition.
In some embodiments, the vehicle-mounted image processing method further includes:
when the beautiful scene image acquisition condition is met, after the score of each frame of target image is acquired through the trained beauty degree evaluation model, the scores of other images in the target scene category are also acquired through the beauty degree evaluation model;
and determining a preset number of images with prior grading ranking in the target scene as target beautiful scene images.
In some embodiments, in the vehicle-mounted image processing method, after synthesizing target images of consecutive frames meeting a preset scoring condition into a target beautiful scene video, the method further includes:
in response to the music-adding condition being met, acquiring tag-matched target background music from a background music library according to the target scene category of the target beautiful scene video and a preset label matching relationship; the label matching relationship represents the matching relationship between the labels of background music and scene categories;
and adding the acquired target background music to the target beautiful scene video to obtain the target beautiful scene video carrying the background music.
In some embodiments, there is also provided an in-vehicle image processing apparatus, the apparatus including:
the acquisition module is used for acquiring each frame of image of the vehicle-mounted video within a first preset time period when the video synthesis condition is met; wherein the image carries a video time stamp;
the classification module is used for classifying each frame of image through the trained scene classification model to obtain the scene category and the scene matching degree of each frame of image;
the sequencing module is configured to sort, according to the video timestamps, the target images in the target scene category whose scene matching degree meets a preset matching-degree condition;
and the synthesis module is used for acquiring the score of each frame of target image through the trained aesthetic degree evaluation model and synthesizing the continuous frames of target images meeting the preset score condition into the target beautiful scene video.
In some embodiments, there is also provided an electronic device comprising: a processor, a memory and a bus, the memory storing machine readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is operating, the machine readable instructions being executed by the processor to perform the steps of the in-vehicle image processing method.
In some embodiments, a computer-readable storage medium is also provided, having stored thereon a computer program which, when being executed by a processor, performs the steps of the in-vehicle image processing method.
The embodiments of the present application provide a vehicle-mounted image processing method and apparatus, an electronic device, and a medium. After the images shot by the vehicle-mounted shooting device are obtained, each frame image is first classified through the trained scene classification model, the images belonging to beautiful scenery are screened out, and their scene categories are determined. The target images in each scene category are then scored with the aesthetics evaluation model; through this second round of scoring, high-quality consecutive target images are screened out and the target beautiful-scene video is synthesized automatically. The user therefore does not need to perform any separate shooting operation on the scenery seen while driving, which avoids potential safety hazards; there is no shooting time lag, and high-scoring images are screened out by the aesthetics evaluation model, so the quality of the images and beautiful-scene videos meets the user's needs.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should therefore not be regarded as limiting its scope; for those skilled in the art, other related drawings can be derived from these drawings without inventive effort.
FIG. 1 is a flowchart illustrating a method of an onboard image processing method according to an embodiment of the present application;
fig. 2 is a flowchart illustrating a method for determining a target image in a target scene category whose scene matching degree meets a preset matching degree condition according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating a method for obtaining a target beautiful scene image according to an embodiment of the present application;
fig. 4 is a flowchart illustrating a method for obtaining a target beautiful scene video carrying background music according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a vehicle-mounted image processing apparatus according to an embodiment of the present application;
fig. 6 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. It should be understood that the drawings in the present application serve illustrative and descriptive purposes only and are not used to limit its scope of protection; further, the schematic drawings are not drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flowcharts may be performed out of order, and steps without logical dependency may be performed in reverse order or simultaneously. Moreover, under the guidance of this application, one skilled in the art may add one or more other operations to a flowchart, or remove one or more operations from it.
In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as presented in the figures, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that in the embodiments of the present application, the term "comprising" is used to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.
With the rapid development of automobiles and traffic, users have ever more demands around driving and travel. During a journey, users may, to varying degrees, want to take photos or record videos to document the trip; for example, when a user passes beautiful scenery while driving, the user may want to photograph it as a keepsake.
Currently, a user can only control photographing or video recording in the automobile through an active trigger such as manual operation or gesture recognition on the automobile data recorder or vehicle-mounted camera. Performing shooting operations distracts the driver, affects driving safety to a certain extent, and creates potential safety hazards. Moreover, photos taken with an automobile data recorder or vehicle-mounted camera cannot be timed or framed deliberately: by the time the user sees the beautiful scenery and operates the device, the moment may already be missed, and problems such as poor exposure and blur easily occur, so the quality often fails to meet the user's needs.
Based on this, in the present application, after the images shot by the vehicle-mounted shooting device are obtained, each frame image is first classified through the trained scene classification model, the images belonging to beautiful scenery are screened out, and their scene categories are determined; the target images in each scene category are then scored with the aesthetics evaluation model, high-quality consecutive target images are screened out through this second round of scoring, and the target beautiful-scene video is synthesized automatically. The user therefore does not need to perform any separate shooting operation on the scenery seen while driving, which avoids potential safety hazards; there is no shooting time lag, and high-scoring images are screened out by the aesthetics evaluation model, so the quality of the images and beautiful-scene videos meets the user's needs.
Referring to fig. 1, fig. 1 shows a flowchart of a method of processing a vehicle-mounted image according to an embodiment of the present application, and specifically, the method of processing the vehicle-mounted image includes the following steps S101 to S104:
S101, when the video synthesis condition is met, acquiring each frame image of the vehicle-mounted video within a first preset time period, wherein each image carries a video timestamp;
S102, classifying each frame image through the trained scene classification model to obtain the scene category and scene matching degree of each frame image;
S103, sorting, according to the video timestamps, the target images in the target scene category whose scene matching degree meets the preset matching-degree condition;
and S104, obtaining the score of each frame of target image through the trained aesthetics evaluation model, and synthesizing the consecutive target frames meeting the preset scoring condition into the target beautiful-scene video.
The vehicle-mounted image processing method can run on a terminal device or a server; the terminal device may be an in-vehicle (head unit) device. When the vehicle-mounted image processing method runs on the server, it can be implemented and executed based on a cloud interaction system, which includes at least the server and the terminal device (that is, the in-vehicle device).
Specifically, taking application to the in-vehicle device as an example, when the vehicle-mounted image processing method runs on the in-vehicle device, the method processes the images shot by the vehicle-mounted shooting device to obtain the target beautiful-scene video.
Here, "vehicle-mounted shooting device" denotes a shooting apparatus installed on the vehicle (such as an automobile data recorder or a vehicle-mounted camera). Since configurations differ across vehicle models, the installation positions and the number of vehicle-mounted shooting devices also differ; the embodiments of the present application therefore do not specifically limit the number or installation positions of the vehicle-mounted shooting devices.
In step S101, the video synthesis condition is met when a start signal of the vehicle-mounted shooting device is received and a video synthesis signal is received.
Specifically, the video composite signal is generated according to an instant operation or a preset configuration of a user for the car machine device.
Illustratively, when the user turns on the in-vehicle device and the device receives the start signal of the vehicle-mounted shooting device, the video synthesis control is automatically displayed on the human-computer interaction interface of the in-vehicle device; when a determination operation on the video synthesis control is received, the in-vehicle device generates the video synthesis signal.
Or, the preset configuration of the in-vehicle device in the historical operation (such as the startup setting) is a video synthesis mode, and when the in-vehicle device receives the start signal of the vehicle-mounted shooting device, a video synthesis signal is automatically generated, so that a user does not need to perform determination operation on a video synthesis control, and the operation is simplified.
The preset configuration is determined according to historical operation of a user. For example, the preset configuration of this time is determined according to the determination operation of the video synthesis control displayed for the human-computer interaction interface of the car machine device at the last time.
The determination operation for the video composition control may be clicking, touch, dragging, double clicking, long pressing, and the like.
When driving, if the user wants to shoot scenery along the way, such as windmills or fields, the user can call up the video synthesis control and perform a determination operation on it to start automatic synthesis of beautiful-scene videos; when passing ordinary roads, the user can perform a closing operation on the control to stop the automatic synthesis, so as not to consume the resources of the in-vehicle device unnecessarily.
When traveling, for example on a self-driving tour through scenic regions such as grasslands or Qinghai, the user can keep the automatic beautiful-scene video synthesis function enabled for a long time.
The video timestamp carried in each image is the CPU kernel time of the in-vehicle device at which the vehicle-mounted shooting device captured the frame. That is, the video timestamps carried in the images are generated from the kernel time of the in-vehicle device's CPU, so that frames of vehicle-mounted video acquired in different time periods carry a uniform timestamp.
In step S102, the classifying each frame of image through the trained scene classification model to obtain the scene classification and the scene matching degree of each frame of image specifically includes:
each frame image is input into the trained scene classification model, which outputs the scene category and scene matching degree of that frame image. The scene category characterizes the category of beautiful scenery, including, by way of example, lakes, grasslands, bridges, cities, deserts, mountains, historic sites, and rosy clouds; the scene matching degree characterizes the similarity between the image and the scene category.
Here, for the scene classification model, each scene category is a label.
The scene classification model calculates the similarity between the image and each label, and takes the label corresponding to the highest similarity as the scene category, and the highest similarity is taken as the scene matching degree.
In the embodiments of the present application, the scene classification model is MobileNetV2. The MobileNetV2 model offers high accuracy at a small computational cost and performs well on classification tasks. MobileNetV2 uses an inverted residual structure and linear bottlenecks (the last layer of each block is a linear layer) to extract features from scene image data effectively.
The loss of the MobileNetV2-based scene classification model is the sparse categorical cross-entropy, which evaluates the difference between the probability distribution obtained by the current training and the true distribution. It characterizes the distance between the actual output (probability) and the desired output (probability): the smaller the cross-entropy, the closer the two probability distributions are.
The training process of the scene classification model is as follows: sample data are determined based on image data collected by an automobile data recorder; the sample data are fed to the network in batches, and features are extracted from them by MobileNetV2 to obtain predicted values. The loss between the predicted values and the true values is computed, the weights of each network layer are adjusted according to the loss value obtained in each training iteration, and training stops when the loss falls below a first preset training threshold.
The trained scene classification model can then classify each frame image to obtain its scene category and scene matching degree; the scene matching degree is in fact the probability, output by MobileNetV2, that the image belongs to that scene category.
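By way of illustration only (this is a minimal sketch, not the patent's disclosed implementation), the classification step might look as follows in TensorFlow/Keras; the label list, input size, optimizer, and preprocessing are assumptions:

```python
import numpy as np
import tensorflow as tf

# Example labels taken from the scene categories named in the text.
SCENE_LABELS = ["lake", "grassland", "bridge", "city",
                "desert", "mountain", "historic_site"]

def build_scene_classifier(num_classes=len(SCENE_LABELS)):
    # MobileNetV2 backbone (inverted residuals + linear bottlenecks)
    # with a softmax classification head.
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, pooling="avg")
    probs = tf.keras.layers.Dense(num_classes, activation="softmax")(base.output)
    model = tf.keras.Model(base.input, probs)
    # Loss: sparse categorical cross-entropy, as stated above.
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy())
    return model

def classify_frame(model, frame):
    """Return (scene category, scene matching degree) for one video frame."""
    x = tf.keras.applications.mobilenet_v2.preprocess_input(
        tf.image.resize(frame, (224, 224))[tf.newaxis, ...])
    probs = model.predict(x, verbose=0)[0]
    k = int(np.argmax(probs))                 # label with the highest similarity
    return SCENE_LABELS[k], float(probs[k])   # matching degree = that probability
```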
In step S103, the target scene type is any one of the scene types in step S102. That is, step S103 and step S104 are respectively performed for the images in each scene category to obtain at least one target beautiful scene video in each scene category.
Referring to fig. 2, fig. 2 is a flowchart of a method for determining the target images in the target scene category whose scene matching degree meets the preset matching-degree condition, according to an embodiment of the present application. Specifically, before step S103 is executed, that is, before the target images are sorted according to the video timestamps, the method further includes S201-S203:
S201, dividing the first preset time period into a plurality of second preset time periods according to the video timestamp of each frame image, and determining the number of frames whose scene matching degree is greater than or equal to the preset matching-degree threshold within each second preset time period; the preset matching-degree thresholds in different second preset time periods are different.
S202, if the determined frame number is larger than the first frame-number threshold, reserving all images in the second preset time period as images whose scene matching degree meets the preset matching-degree condition;
S203, determining the target images of the target scene category from the reserved images according to the scene category of each reserved image.
Otherwise, if the determined frame number is smaller than the first frame number threshold, all images in the second preset time period are discarded.
In this embodiment of the present application, the preset matching degree threshold in the second preset time period is dynamically determined according to the scene matching degrees of all images in the second preset time period.
Specifically, the preset matching degree threshold in each second preset time period is determined according to the scene matching degree average of all images in the second preset time period.
Here, the preset matching-degree threshold distinguishes whether an image is a beautiful-scene image: an image whose scene matching degree is below the threshold is a non-beautiful-scene image, and an image whose scene matching degree is greater than or equal to the threshold is a beautiful-scene image.
However, since the vehicle-mounted shooting device continuously shoots the scenery along the way, the continuity of the finally synthesized target beautiful-scene video must be maintained; if frames are missing, the picture will jump, which impairs the viewing experience. Consider a bridge, for example: as the vehicle travels, the head of the bridge is shot first, then the body, then the tail. If images were screened one by one, an image showing only the head or tail of the bridge would have low similarity to the bridge scene, below the preset matching-degree threshold. But since the final goal of the embodiments of the present application is to synthesize a beautiful-scene video, and it is entirely reasonable for that video to show the head, body, and tail of the bridge in sequence, images whose scene matching degree is below the preset matching-degree threshold should also be usable as target images that satisfy the preset matching-degree condition.
Therefore, in the embodiments of the present application, images are not reserved or discarded individually; instead, all images within a second preset time period are reserved or discarded as a whole. The reserved images are thus consecutive frames, and the finally synthesized target beautiful-scene video, being built from consecutive frames, is smooth, has no missing frames, shows no sudden picture jumps, and displays the beautiful scenery of the scene category more completely.
In step S203, the reserved images may belong to multiple scene categories: for example, some belong to grassland and some to bridge, in which case two target scene categories are determined. The target images in each target scene category are then determined according to the scene category of each reserved image; for example, the images belonging to grassland and the images belonging to bridge are separated.
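The reservation logic of steps S201-S203 can be sketched as follows (illustrative Python only; the segment length, the frame-number threshold, and the plain-dict frame layout are assumptions, not the patent's fixed values):

```python
from collections import defaultdict

def reserve_target_images(frames, segment_len=100, first_frame_threshold=60):
    """frames: dicts with 'category' and 'match' keys, sorted by video timestamp.
    Returns {scene category: [reserved frames]}."""
    reserved = []
    # S201: divide the first preset period into second preset periods (segments).
    for i in range(0, len(frames), segment_len):
        segment = frames[i:i + segment_len]
        # Dynamic threshold: mean scene matching degree of this segment.
        threshold = sum(f["match"] for f in segment) / len(segment)
        hits = sum(1 for f in segment if f["match"] >= threshold)
        # S202: reserve or discard the segment as a whole, so that reserved
        # frames stay consecutive; otherwise the segment is dropped entirely.
        if hits > first_frame_threshold:
            reserved.extend(segment)
    # S203: group the reserved frames into target images per scene category.
    by_category = defaultdict(list)
    for f in reserved:
        by_category[f["category"]].append(f)
    return dict(by_category)
```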
In the step S104, a score of each frame of target image is obtained through a trained beauty assessment model, which is obtained by training according to an image index; the image index is at least one of: definition, color, tone, depth of field, beauty and composition.
That is, the aesthetic degree evaluation model judges whether each frame of target image is attractive from the aspects of definition, color, tone, depth of field, aesthetic degree and composition, and gives a score representing the aesthetic degree of the image, wherein the image is more attractive when the score is higher, and the image is less attractive when the score is lower.
In the embodiment of the present application, the numerical range of the score is: [0, 10]. The 0 score is lowest and the 10 score is highest.
Specifically, in the embodiments of the present application, the aesthetics evaluation model adopts the NIMA image quality assessment model proposed by Google. NIMA is built on state-of-the-art deep object-recognition neural networks and can predict the distribution of human opinion scores for an image both from direct perception (a technical point of view) and from attractiveness (an aesthetic point of view). Its scoring has the advantage of being close to subjective human scoring, so the model can be used for image quality assessment.
For any image, the NIMA algorithm generates a score distribution: the image is scored from 1 to 10, and images of the same subject can be compared directly. This design matches, in form, the histograms produced by human scoring systems, so the evaluation effect is closer to the results of human evaluation.
The training process of the aesthetics evaluation model is as follows: an aesthetics sample data set is constructed, and annotators label each image based on indicators such as definition, color, tone, depth of field, aesthetics, and composition to obtain the image's ground-truth score.
An aesthetics evaluation model is then constructed based on the NIMA image quality assessment model, comprising a baseline network, an FC layer, and a softmax. The baseline network is pretrained on ImageNet; during training, images are first resized to 256x256 and then randomly cropped to 224x224. The ten units of the FC layer output the probabilities that the image scores 1 through 10 respectively, thereby predicting the distribution of the aesthetics score; the mean and standard deviation of the image's aesthetics score are then computed from this distribution. The CNN predicts the distribution of aesthetics scores, treated as a histogram, and the EMD-based loss is computed against the probability distribution of the annotated scores for back-propagation. The EMD-based loss performs well on ordered classification and is therefore used as the loss function.
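A sketch of such a NIMA-style model under the description above (the MobileNetV2 baseline is an assumption, since the text does not fix the baseline architecture; the preprocessing and the 1-10 score range follow the description):

```python
import tensorflow as tf

def build_nima_model():
    # Baseline network pretrained on ImageNet.
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False,
        pooling="avg", weights="imagenet")
    # FC layer with 10 units + softmax: probability of each score 1..10.
    dist = tf.keras.layers.Dense(10, activation="softmax")(base.output)
    return tf.keras.Model(base.input, dist)

def preprocess_for_training(image):
    # Resize to 256x256, then random-crop to 224x224, as described above.
    image = tf.image.resize(image, (256, 256))
    image = tf.image.random_crop(image, (224, 224, 3))
    return tf.keras.applications.mobilenet_v2.preprocess_input(image)

def score_mean_and_std(score_dist):
    """Mean and standard deviation of the predicted aesthetics score."""
    scores = tf.range(1.0, 11.0)                       # scores 1..10
    mean = tf.reduce_sum(score_dist * scores, axis=-1)
    var = tf.reduce_sum(score_dist * (scores - mean[..., None]) ** 2, axis=-1)
    return mean, tf.sqrt(var)
```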
Specifically, the EMD-based loss is given by:

\[
\mathrm{EMD}(p,\hat{p}) = \left( \frac{1}{N} \sum_{k=1}^{N} \left| \mathrm{CDF}_{p}(k) - \mathrm{CDF}_{\hat{p}}(k) \right|^{r} \right)^{1/r}
\]

where \(\mathrm{CDF}_{\hat{p}}(k)\) is the accumulated value of the predicted score probabilities up to the k-th score (the probability of each score is obtained as a cumulative value rather than as an independent prediction), and \(\mathrm{CDF}_{p}(k)\) is the accumulated value of the true (labeled) score probabilities. In the labels, the higher the score, the greater the cumulative probability; in the predictions, the cumulative probability likewise increases monotonically with the score, since softmax ensures that each individual probability is greater than zero (and that they sum to 1).
Here N denotes the number of score classes (the ten possible scores in this embodiment) and k indexes the k-th score class; r is a constant, taken as r = 2 in the embodiments of the present application so as to penalize the Euclidean distance between the CDFs. r = 2 also makes optimization with gradient descent easier.
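Under these definitions, the EMD-based loss can be written compactly (a sketch assuming (batch, 10) score distributions):

```python
import tensorflow as tf

def emd_loss(p_true, p_pred, r=2.0):
    """EMD-based loss for (batch, 10) score distributions, with r = 2."""
    cdf_true = tf.cumsum(p_true, axis=-1)   # cumulative true score probabilities
    cdf_pred = tf.cumsum(p_pred, axis=-1)   # cumulative predicted probabilities
    # Mean of |CDF difference|^r over the N score classes, then the r-th root.
    per_image = tf.reduce_mean(tf.abs(cdf_true - cdf_pred) ** r, axis=-1) ** (1.0 / r)
    return tf.reduce_mean(per_image)        # average over the batch
```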
The constructed aesthetics evaluation model is trained on the aesthetics sample data set to obtain the trained model. Specifically, the image data in the aesthetics sample data set are input into the constructed model, and the model's parameters are updated according to the training results until the loss function falls below a second preset training threshold, at which point training stops and the trained aesthetics evaluation model is obtained.
That is, the aesthetics evaluation model predicts the distribution of human opinion scores with a convolutional neural network, examining both specific pixels and the overall aesthetics of an image. It predicts how likely each rating of the image is to be chosen by a person, and hence how much a person would like the image; it therefore scores images reliably and correlates highly with human perception.
After the score of each frame of target image is obtained through the trained aesthetic evaluation model, the method further comprises the following steps:
and respectively calculating the grading mean values of the multiple groups of continuous frame target images, and determining the continuous frame target images meeting the preset grading conditions according to the grading mean values.
And finally, synthesizing the continuous frame target images meeting the preset grading condition into a target beautiful scene video.
The score means of multiple groups of consecutive target frames can be computed by dividing the sorted target images with a sliding time window. For example, if the target scene category contains 500 target images in total, frames 1-50 form one group of consecutive frames, frames 21-70 form another group, and so on, up to frames 451-500; the score mean of each group of consecutive frames is computed separately.
The consecutive target frames meeting the preset scoring condition are then determined from the score means in one of the following ways: taking a preset number of groups of consecutive target frames with the highest score means, for example the single group with the highest score mean; or taking the groups of consecutive target frames whose score mean is greater than a preset score threshold.
In the embodiments of the present application, the group of consecutive target frames with the highest score mean is selected, and these consecutive target frames meeting the preset scoring condition are synthesized into the target beautiful-scene video; that is, the consecutive target frames are spliced together to form the target beautiful-scene video.
Here, the number of frames per set of consecutive frames is equal to or greater than the second frame number threshold.
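The window-mean selection can be sketched as follows (the window length 50 and stride 20 are taken from the 1-50 / 21-70 / ... / 451-500 example above; both are illustrative assumptions):

```python
def best_window(scores, window=50, stride=20):
    """scores: aesthetics score of each target frame, in timestamp order.
    Returns (start, end) of the group of consecutive frames with the highest
    score mean, or None if there are fewer frames than one window. The window
    length is chosen to be at least the second frame-number threshold."""
    best_mean, best_span = None, None
    for start in range(0, len(scores) - window + 1, stride):
        mean = sum(scores[start:start + window]) / window
        if best_mean is None or mean > best_mean:
            best_mean, best_span = mean, (start, start + window)
    return best_span
```

The frames of the winning span are then spliced in timestamp order to synthesize the target beautiful-scene video.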
After synthesizing a target beautiful scene video from a plurality of target images meeting the continuous frame condition and the preset scoring condition, the method further comprises the following steps:
and pushing the target beautiful scene video to a vehicle-mounted display screen so that the vehicle-mounted display screen displays the target beautiful scene video.
In the embodiment of the application, the target beautiful scene video can be pushed to other terminal equipment so as to view the target beautiful scene video on other terminal equipment.
The other terminal devices can be mobile phones, tablets and the like.
When the images of the vehicle-mounted video in the first preset time period belong to a plurality of scene categories, respectively taking each scene category as a target scene category to obtain a target beautiful scene video of each target scene category;
respectively pushing the target beautiful scene videos of each target scene category to a vehicle-mounted display screen so as to independently display the target beautiful scene videos of each target scene category;
or synthesizing the target beautiful scene videos of the multiple target scene categories into a comprehensive beautiful scene video, and pushing the comprehensive beautiful scene video to a vehicle-mounted display screen so as to display beautiful scenes comprising multiple scene categories in one beautiful scene video.
Referring to fig. 3, fig. 3 is a flowchart illustrating a method for obtaining a target beautiful scene image according to an embodiment of the present application; specifically, in this embodiment of the present application, the vehicle-mounted image processing method further includes:
S301, when the beautiful-scene image acquisition condition is met, after the score of each frame of target image is acquired through the trained aesthetics evaluation model, also acquiring the scores of the other images in the target scene category through the model;
S302, determining the preset number of top-ranked images in the target scene category as the target beautiful-scene images.
The beautiful-scene image acquisition condition is met when a start signal of the vehicle-mounted shooting device is received and a beautiful-scene image acquisition signal is received.
Specifically, the beautiful scene image acquisition signal is generated according to an instant operation or a preset configuration of a user for the car machine device.
Illustratively, when the user turns on the in-vehicle device and the device receives the start signal of the vehicle-mounted shooting device, the beautiful-scene image acquisition control is automatically displayed on the human-computer interaction interface of the in-vehicle device; when a determination operation on the beautiful-scene image acquisition control is received, the in-vehicle device generates the beautiful-scene image acquisition signal.
Or, the preset configuration of the in-vehicle device in the historical operation (such as the startup setting) is a beautiful scene image acquisition mode, and when the in-vehicle device receives the start signal of the vehicle-mounted shooting device, a beautiful scene image acquisition signal is automatically generated, so that the user does not need to perform the determination operation on a beautiful scene image acquisition control, and the operation is simplified.
The preset configuration is determined according to historical operation of a user. For example, the preset configuration of this time is determined according to the determination operation of the beautiful scene image acquisition control displayed for the man-machine interaction interface of the car equipment at the last time.
The determination operation on the beautiful-scene image acquisition control may be clicking, touching, dragging, double-clicking, long-pressing, and the like.
In the embodiment of the application, the beautiful scene image acquisition control and the video composition control may be displayed in one area or separately displayed.
Besides beautiful-scene videos, users also need beautiful-scene images, for example for sharing with friends or keeping as souvenirs. Beautiful-scene images do not need to be consecutive frames, so after the score of each target frame is obtained through the trained aesthetics evaluation model, the scores of the other images in the target scene category are also obtained through the model. In effect, every frame image in the target scene category is scored, and the several highest-scoring beautiful-scene images in that category are obtained.
When the images in the first preset time period belong to a plurality of scene categories, a plurality of target beautiful scene images of each scene category are obtained.
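Selecting the target beautiful-scene images then reduces to a per-category top-K ranking, as in this hypothetical sketch (the preset number 3 is an assumed value):

```python
def top_beautiful_images(images_by_category, score_fn, preset_number=3):
    """images_by_category: {scene category: [image]};
    score_fn: image -> aesthetics score. Keeps the top-ranked images."""
    return {
        category: sorted(imgs, key=score_fn, reverse=True)[:preset_number]
        for category, imgs in images_by_category.items()
    }
```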
After the beautiful-scene images are obtained, the target beautiful-scene images are pushed to the vehicle-mounted display screen so that the screen displays them; when the vehicle is detected to be in a parked state, the target beautiful-scene images are saved or deleted in response to the user's save or delete operation on them.
Because the target beautiful-scene images are obtained while the user is driving, the image operation controls are displayed only when the vehicle is detected to be in a parked state, so that the user can save or delete the target beautiful-scene images without compromising driving safety.
Referring to fig. 4, fig. 4 is a flowchart illustrating a method for obtaining a target beautiful scene video carrying background music according to an embodiment of the present application; specifically, in the embodiment of the present application, after synthesizing target beautiful scene videos from continuous frame target images meeting preset scoring conditions, the method further includes the following steps S401 to S402:
S401, in response to the music-adding condition being met, acquiring tag-matched target background music from a background music library according to the target scene category of the target beautiful-scene video and a preset label matching relationship, where the label matching relationship represents the matching relationship between the labels of background music and scene categories;
S402, adding the acquired target background music to the target beautiful-scene video to obtain the target beautiful-scene video carrying background music.
Therefore, in the embodiment of the application, appropriate background music is automatically added to the target beautiful scene video when the target beautiful scene video is acquired.
The response satisfies a music addition condition, including one of:
responding to a preset configuration to add music automatically, or responding to a music-adding operation performed by the user on the human-computer interaction interface of the in-vehicle device.
The background music library can be a local music library or a background music server.
As for the labels of the background music, one piece of background music may carry multiple labels, such as pop or rock; conversely, one label may cover multiple pieces of background music, for example a user-created rock collection or soothing-music collection, each containing multiple pieces.
The tag-matched target background music is acquired from the background music library. Illustratively, when the target scene category of the target beautiful-scene video is grassland, background music under labels such as grassland, matouqin (horsehead fiddle), and khoomei (throat singing) is acquired.
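The label-matching lookup can be sketched as follows (the mapping and library entries are invented examples following the grassland illustration above):

```python
# Scene category -> matching background-music labels (invented examples).
TAG_MATCHES = {
    "grassland": ["grassland", "matouqin", "khoomei"],
    "desert": ["desert", "ambient"],
}

def pick_background_music(scene_category, music_library):
    """music_library: list of (track_name, labels) pairs; returns matching tracks."""
    wanted = set(TAG_MATCHES.get(scene_category, []))
    return [name for name, labels in music_library if wanted & set(labels)]
```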
In the embodiment of the present application, a vehicle-mounted image processing apparatus is further provided, please refer to fig. 5, and fig. 5 shows a schematic structural diagram of the vehicle-mounted image processing apparatus according to the embodiment of the present application; specifically, the apparatus comprises:
the obtaining module 501 is configured to obtain each frame of image of the vehicle-mounted video within a first preset time period when a video synthesis condition is met; wherein the image carries a video timestamp;
the classification module 502 is configured to classify each frame of image through the trained scene classification model to obtain a scene type and a scene matching degree of each frame of image;
the sorting module 503 is configured to sort, according to the video timestamps, target images in the target scene category, for which the scene matching degree meets a preset matching degree condition;
and the synthesis module 504 is configured to obtain a score of each frame of target image through the trained aesthetic evaluation model, and synthesize the target beautiful scene video from the continuous frames of target images meeting the preset score condition.
The embodiments of the present application provide a vehicle-mounted image processing apparatus. After the images shot by the vehicle-mounted shooting device are obtained, each frame image is first classified through the trained scene classification model, the images belonging to beautiful scenery are screened out, and their scene categories are determined. The target images in each scene category are then scored with the aesthetics evaluation model; through these two rounds of scoring, high-quality consecutive target images are screened out and the target beautiful-scene video is synthesized automatically, so that the user does not need to perform any separate shooting operation on the scenery seen while driving, avoiding potential safety hazards. There is no shooting time lag, and high-scoring images are screened out by the aesthetics evaluation model, so the quality of the images and beautiful-scene videos meets the user's needs.
The vehicle-mounted image processing device in the implementation of the application further comprises:
and the pushing device is used for pushing the target beautiful scene video to a vehicle-mounted display screen after synthesizing the target beautiful scene video from the multi-frame target images meeting the continuous frame conditions and the preset grading conditions so that the vehicle-mounted display screen displays the target beautiful scene video.
The vehicle-mounted image processing device in the implementation of the application further comprises:
the first determining module is used for dividing a first preset time period into a plurality of second preset time periods according to the video time stamp of each frame of image before sequencing the target images according to the video time stamps aiming at the target images with scene matching degrees meeting the preset matching degree condition in the target scene category, and determining the frame number of the images with the scene matching degrees larger than or equal to the preset matching degree threshold value in the second preset time period;
if the determined frame number is larger than the first frame number threshold, reserving all images in a second preset time period as images with scene matching degrees meeting preset matching degree conditions;
and determining a target image of the target scene type from the reserved image according to the scene type of the reserved image.
The vehicle-mounted image processing device in the implementation of the application further comprises:
and the calculating module is used for respectively calculating the score mean values of a plurality of groups of continuous frame target images after the score of each frame of target image is obtained through the trained aesthetic evaluation model, and determining the continuous frame target images meeting the preset score conditions according to the score mean values.
In the vehicle-mounted image processing device in the implementation of the application, the trained beauty assessment model is obtained by training according to image indexes; the image index is at least one of: definition, color, hue, depth of field, beauty and composition.
The vehicle-mounted image processing device in the embodiment of the application further comprises:
the second determination module is used for obtaining the scores of other images in the target scene category through the aesthetic degree evaluation model after the scores of each frame of target images are obtained through the trained aesthetic degree evaluation model when the beautiful scene image obtaining conditions are met;
and determining the preset number of top-ranked images in the target scene category as the target beautiful-scene images.
The vehicle-mounted image processing device in the implementation of the application further comprises:
the music adding module is used for, after the consecutive target frames meeting the preset scoring condition are synthesized into the target beautiful scene video and in response to the music-adding condition being met, acquiring tag-matched target background music from a background music library according to the target scene category of the target beautiful scene video and a preset label matching relationship; the label matching relationship represents the matching relationship between the labels of background music and scene categories;
and adding the acquired target background music to the target beautiful scene video to obtain the target beautiful scene video carrying the background music.
In an embodiment of the present application, an electronic device is further provided; please refer to fig. 6, which shows a schematic structural diagram of an electronic device according to an embodiment of the present application. Specifically, the electronic device 600 includes: a processor 602, a memory 601, and a bus, where the memory 601 stores machine-readable instructions executable by the processor 602; when the electronic device 600 runs, the processor 602 communicates with the memory 601 through the bus, and the machine-readable instructions are executed by the processor 602 to perform the steps of the vehicle-mounted image processing method.
In an embodiment of the present application, a computer-readable storage medium is further provided, having a computer program stored thereon, which, when being executed by a processor, performs the steps of the in-vehicle image processing method.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and apparatus described above may refer to the corresponding processes in the method embodiments and are not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into modules is only one logical functional division, and other divisions may be used in practice: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the couplings, direct couplings, or communication connections shown or discussed between parts may be indirect couplings or communication connections through communication interfaces between devices or modules, and may be electrical, mechanical, or in other forms.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a platform server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. An on-board image processing method, characterized by comprising the steps of:
when the video synthesis condition is met, acquiring each frame image of the vehicle-mounted video within a first preset time period; wherein the image carries a video time stamp;
classifying each frame of image through a trained scene classification model to obtain the scene category and the scene matching degree of each frame of image;
dividing the first preset time period into a plurality of second preset time periods according to the video time stamp of each frame image, and determining the number of frames of images with scene matching degree greater than or equal to a preset matching degree threshold value in the second preset time period;
if the determined frame number is larger than the first frame number threshold, reserving all images in a second preset time period as images with scene matching degrees meeting preset matching degree conditions;
determining a target image of a target scene type from the reserved image according to the scene type of the reserved image;
the method comprises the steps that target images with scene matching degrees meeting preset matching degree conditions in target scene categories are sequenced according to video timestamps;
and obtaining the score of each frame of target image through the trained aesthetic degree evaluation model, and synthesizing the continuous frames of target images meeting the preset score condition into the target beautiful scene video.
2. The vehicle-mounted image processing method according to claim 1, wherein after synthesizing a plurality of frames of target images satisfying a continuous frame condition and a preset scoring condition into a target beautiful scene video, the method further comprises:
and pushing the target beautiful scene video to a vehicle-mounted display screen so that the vehicle-mounted display screen displays the target beautiful scene video.
3. The vehicle-mounted image processing method according to claim 1, wherein after the score of each frame of target image is obtained through the trained aesthetic degree evaluation model, the method further comprises:
and respectively calculating the grading mean values of the multiple groups of continuous frame target images, and determining the continuous frame target images meeting the preset grading conditions according to the grading mean values.
4. The vehicle-mounted image processing method according to claim 1, wherein the trained aesthetic degree evaluation model is trained on image metrics, each image metric being at least one of: sharpness, color, tone, depth of field, aesthetics, and composition.
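The claim does not specify how the listed metrics enter training. One hedged possibility, shown purely for illustration, is collapsing per-metric annotations into a single scalar training label; the weights below are assumptions, not values from the patent.

```python
METRIC_WEIGHTS = {  # illustrative weights only; the patent gives none
    "sharpness": 0.20, "color": 0.15, "tone": 0.15,
    "depth_of_field": 0.15, "aesthetics": 0.20, "composition": 0.15,
}

def aesthetic_label(metrics: dict[str, float]) -> float:
    """Collapse per-metric annotations in [0, 1] into one training score."""
    return sum(w * metrics[name] for name, w in METRIC_WEIGHTS.items())
```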
5. The vehicle-mounted image processing method according to claim 1, further comprising:
when a beautiful scene image acquisition condition is met, obtaining the score of each target image through the trained aesthetic degree evaluation model, and obtaining the scores of the other images in the target scene category through the same model;
and determining a preset number of the highest-ranked images in the target scene category as target beautiful scene images.
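The still-image path of claim 5 reduces to a top-N selection by score. A sketch under that assumption, with a hypothetical function name and default `n`:

```python
def top_scenic_images(scored, n=5):
    """scored: (aesthetic_score, image) pairs for the target scene category;
    returns the n images with the highest scores."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [image for _, image in ranked[:n]]
```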
6. The vehicle-mounted image processing method according to claim 1, wherein, after synthesizing the consecutive target frames that meet the preset scoring condition into the target beautiful scene video, the method further comprises:
in response to a music adding condition being met, acquiring tag-matched target background music from a background music library according to the target scene category of the target beautiful scene video and a preset tag matching relationship, wherein the tag matching relationship represents the correspondence between the tags of background music and scene categories;
and adding the acquired target background music to the target beautiful scene video to obtain a target beautiful scene video carrying background music.
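One way the tag matching relationship of claim 6 might be realized is a category-to-tag mapping consulted against a tagged music library. This is a sketch only; the mapping contents, function name, and library shape are assumptions.

```python
TAG_MATCHING = {  # preset tag matching relationship (assumed contents)
    "seaside": "relaxing",
    "mountain": "epic",
    "city_night": "electronic",
}

def pick_background_music(scene_category, library):
    """library maps track name -> list of tags; returns the first track
    whose tag matches the scene category, or None if nothing matches."""
    wanted = TAG_MATCHING.get(scene_category)
    for track, tags in library.items():
        if wanted is not None and wanted in tags:
            return track
    return None
```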
7. A vehicle-mounted image processing apparatus, characterized in that the apparatus comprises:
an acquisition module configured to acquire each frame of a vehicle-mounted video within a first preset time period when a video synthesis condition is met, wherein each frame carries a video timestamp;
a classification module configured to classify each frame through a trained scene classification model to obtain the scene category and scene matching degree of each frame;
a sorting module configured to divide the first preset time period into a plurality of second preset time periods according to the video timestamps and determine, for each second preset time period, the number of frames whose scene matching degree is greater than or equal to the preset matching degree threshold; to retain all images in a second preset time period as images whose scene matching degree meets the preset matching degree condition if the determined number of frames is greater than the first frame number threshold; to determine target images of a target scene category from the retained images according to their scene categories; and to sort the target images of the target scene category whose scene matching degree meets the preset matching degree condition according to their video timestamps;
and a synthesis module configured to obtain a score for each target image through the trained aesthetic degree evaluation model and synthesize consecutive target frames that meet the preset scoring condition into the target beautiful scene video.
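The four modules of claim 7 could be mirrored in software as a single class, sketched below with hypothetical names; the apparatus could equally be realized in hardware or firmware, and the frame objects are assumed to carry `timestamp` and `category` attributes as in the earlier `Frame` sketch.

```python
class VehicleImageProcessor:
    """Schematic software counterpart of the four modules of claim 7."""

    def __init__(self, classifier, scorer):
        self.classifier = classifier  # trained scene classification model
        self.scorer = scorer          # trained aesthetic degree evaluation model

    def acquire(self, video, t0, t1):
        """Acquisition module: frames within the first preset time period."""
        return [f for f in video if t0 <= f.timestamp <= t1]

    def classify(self, frames):
        """Classification module: (scene category, matching degree) per frame."""
        return [self.classifier(f) for f in frames]

    def sort_targets(self, retained, target_category):
        """Sorting module: timestamp-ordered images of the target category."""
        targets = [f for f in retained if f.category == target_category]
        return sorted(targets, key=lambda f: f.timestamp)

    def synthesize(self, targets, score_threshold):
        """Synthesis module: keep frames the scorer rates highly enough;
        encoding the survivors into a video file is left out of the sketch."""
        return [f for f in targets if self.scorer(f) >= score_threshold]
```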
8. An electronic device, comprising a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate over the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the vehicle-mounted image processing method according to any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that a computer program is stored thereon which, when executed by a processor, performs the steps of the vehicle-mounted image processing method according to any one of claims 1 to 6.
CN202211712672.6A 2022-12-30 2022-12-30 Vehicle-mounted image processing method and device, electronic equipment and medium Active CN115695944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211712672.6A CN115695944B (en) 2022-12-30 2022-12-30 Vehicle-mounted image processing method and device, electronic equipment and medium


Publications (2)

Publication Number Publication Date
CN115695944A CN115695944A (en) 2023-02-03
CN115695944B (en) 2023-03-28

Family

ID=85055099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211712672.6A Active CN115695944B (en) 2022-12-30 2022-12-30 Vehicle-mounted image processing method and device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN115695944B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118013096A (en) * 2024-01-16 2024-05-10 重庆市信息通信咨询设计院有限公司 Industrial big data intelligent visualization platform and method thereof

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325711A (en) * 2020-01-16 2020-06-23 杭州德适生物科技有限公司 Chromosome split-phase image quality evaluation method based on deep learning
CN112634369A (en) * 2020-12-26 2021-04-09 西安科锐盛创新科技有限公司 Space and or graph model generation method and device, electronic equipment and storage medium
CN112634368A (en) * 2020-12-26 2021-04-09 西安科锐盛创新科技有限公司 Method and device for generating space and OR graph model of scene target and electronic equipment
WO2021232978A1 (en) * 2020-05-18 2021-11-25 Oppo广东移动通信有限公司 Video processing method and apparatus, electronic device and computer readable medium
CN114302225A (en) * 2021-12-23 2022-04-08 阿里巴巴(中国)有限公司 Video dubbing method, data processing method, device and storage medium
CN114731458A (en) * 2020-12-31 2022-07-08 深圳市大疆创新科技有限公司 Video processing method, video processing apparatus, terminal device, and storage medium
CN114943964A (en) * 2022-05-18 2022-08-26 岚图汽车科技有限公司 Vehicle-mounted short video generation method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160140157A1 (en) * 2014-11-14 2016-05-19 International Business Machines Corporation Aesthetics data identification and evaluation


Also Published As

Publication number Publication date
CN115695944A (en) 2023-02-03

Similar Documents

Publication Title
US11180250B2 (en) Drone device
CN111274881B (en) Driving safety monitoring method and device, computer equipment and storage medium
CN108388888B (en) Vehicle identification method and device and storage medium
CN108028969B (en) System and method for video processing
CN111131902B (en) Method for determining target object information and video playing equipment
EP3970060A1 (en) Driver attention detection using heat maps
US20150264296A1 (en) System and method for selection and viewing of processed video
JP4550116B2 (en) Landscape monotonicity calculation device and method
US20040151374A1 (en) Video segmentation using statistical pixel modeling
CN112422804B (en) Video special effect generation method and terminal
US8897603B2 (en) Image processing apparatus that selects a plurality of video frames and creates an image based on a plurality of images extracted and selected from the frames
CN115695944B (en) Vehicle-mounted image processing method and device, electronic equipment and medium
KR20180056655A (en) Systems and methods for video processing
CN105868309A (en) Image quick finding and self-service printing method based on facial image clustering and recognizing techniques
CN105292124A (en) Driving monitoring method and driving monitoring device
CN111783729A (en) Video classification method, device, equipment and storage medium
JP2019186689A (en) Information processing apparatus, system, analysis method, computer program, and storage medium
CN108156406A (en) Information processing method and device of automobile data recorder
EP4198772A1 (en) Method and device for making music recommendation
CN116095363A (en) Mobile terminal short video highlight moment editing method based on key behavior recognition
CN114584839A (en) Clipping method and device for shooting vehicle-mounted video, electronic equipment and storage medium
CN112685583B (en) Travel record album generating method and device
CN114038044A (en) Face gender and age identification method and device, electronic equipment and storage medium
CN111696200A (en) Method, system, device and storage medium for displaying alarm situation
JP7448629B2 (en) Image generation device, image generation method, image generation program, and recording medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant