CN114444718A

CN114444718A - Training method of machine learning model, signal control method and device

Info

Publication number: CN114444718A
Application number: CN202210097077.XA
Authority: CN
Inventors: 王泽隆; 曾宏生; 周波; 王凡; 陈永锋; 何径舟
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-01-26
Filing date: 2022-01-26
Publication date: 2022-05-06
Anticipated expiration: 2042-01-26
Also published as: CN114444718B

Abstract

The disclosure provides a training method of a machine learning model, a signal control method, an apparatus, a device, a medium and a product, and relates to the technical field of artificial intelligence, in particular to the technical fields of intelligent transportation, reinforcement learning and deep learning. The training method of the machine learning model comprises the following steps: adjusting model parameters of an initial machine learning model based on a parameter noise value to obtain an updated initial machine learning model, wherein the initial machine learning model is obtained by training with first traffic sample data; processing second traffic sample data by using the updated initial machine learning model to obtain a first adjustment strategy for a traffic signal lamp; determining a first reference traffic state for an intersection in response to completion of adjusting the signal of the traffic signal lamp based on the first adjustment strategy; and adjusting the model parameters of the updated initial machine learning model based on the first reference traffic state and the parameter noise value to obtain a trained machine learning model.

Description

Training method of machine learning model, signal control method and device

Technical Field

The present disclosure relates to the field of artificial intelligence technologies, specifically to the field of intelligent transportation, reinforcement learning, and deep learning technologies, and more specifically, to a training method, a signal control method, an apparatus, an electronic device, a medium, and a program product for a machine learning model.

Background

In the related art, signals of traffic lights are usually adjusted in real time according to the traffic state of an intersection to alleviate congestion at the intersection. However, the adjustment is often not effective, so that congestion at the intersection is difficult to alleviate.

Disclosure of Invention

The present disclosure provides a training method of a machine learning model, a signal control method, an apparatus, an electronic device, a storage medium, and a program product.

According to an aspect of the present disclosure, there is provided a training method of a machine learning model, including: adjusting model parameters of an initial machine learning model based on a parameter noise value to obtain an updated initial machine learning model, wherein the initial machine learning model is obtained by training with first traffic sample data, and the first traffic sample data represents the traffic state of an intersection; processing second traffic sample data by using the updated initial machine learning model to obtain a first adjustment strategy for a traffic signal lamp, wherein the second traffic sample data represents the traffic state of the intersection; determining a first reference traffic state for the intersection in response to completion of adjusting the signal of the traffic signal lamp based on the first adjustment strategy; and adjusting the model parameters of the updated initial machine learning model based on the first reference traffic state and the parameter noise value to obtain a trained machine learning model.

According to another aspect of the present disclosure, there is provided a signal control method including: acquiring target traffic data representing the state of the intersection; processing the target traffic data by using a trained machine learning model to obtain an adjusting strategy for the traffic signal lamp, wherein the trained machine learning model is obtained by training by using the method; and adjusting the signal of the traffic signal lamp based on the adjusting strategy.

According to another aspect of the present disclosure, there is provided a training apparatus of a machine learning model, including a first adjusting module, a processing module, a first determining module and a second adjusting module. The first adjusting module is used for adjusting model parameters of an initial machine learning model based on a parameter noise value to obtain an updated initial machine learning model, wherein the initial machine learning model is obtained by training with first traffic sample data, and the first traffic sample data represents the traffic state of an intersection; the processing module is used for processing second traffic sample data by using the updated initial machine learning model to obtain a first adjustment strategy for a traffic signal lamp, wherein the second traffic sample data represents the traffic state of the intersection; the first determining module is used for determining a first reference traffic state for the intersection in response to completion of adjusting the signal of the traffic signal lamp based on the first adjustment strategy; and the second adjusting module is used for adjusting the model parameters of the updated initial machine learning model based on the first reference traffic state and the parameter noise value to obtain a trained machine learning model.

According to another aspect of the present disclosure, there is provided a signal control apparatus including an acquisition module, a processing module and an adjustment module. The acquisition module is used for acquiring target traffic data representing the state of an intersection; the processing module is used for processing the target traffic data by using a trained machine learning model to obtain an adjustment strategy for a traffic signal lamp, wherein the trained machine learning model is obtained by using the training apparatus described above; and the adjustment module is used for adjusting the signal of the traffic signal lamp based on the adjustment strategy.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor and a memory communicatively coupled to the at least one processor. Wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above-described training method and/or signal control method of the machine learning model.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the above-described training method and/or signal control method of the machine learning model.

According to another aspect of the present disclosure, a computer program product is provided, comprising computer program instructions to implement the steps of the above-described training method of machine learning models and/or the steps of the signal control method when executed by a processor.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 schematically illustrates an example application scenario for training and signal control of a machine learning model;

FIG. 2 schematically illustrates a flow diagram of a method of training a machine learning model according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates a schematic view of a traffic direction according to an embodiment of the present disclosure;

FIG. 4 schematically illustrates a training diagram of an initial machine learning model according to an embodiment of the present disclosure;

FIG. 5 schematically illustrates a training schematic of a machine learning model according to an embodiment of the present disclosure;

FIG. 6 schematically illustrates a flow chart of a signal control method according to an embodiment of the present disclosure;

FIG. 7 schematically illustrates a block diagram of a training apparatus for a machine learning model according to an embodiment of the present disclosure;

FIG. 8 schematically illustrates a block diagram of a signal control apparatus according to an embodiment of the present disclosure; and

FIG. 9 is a block diagram of an electronic device for performing training and/or signal control of a machine learning model used to implement embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.

Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.).

Fig. 1 schematically illustrates an application scenario of training and signal control of an example machine learning model.

As shown in fig. 1, an application scenario 100 of an embodiment of the present disclosure includes, for example, traffic data 101 and an electronic device 102.

The traffic data 101 includes, for example, traffic data for an intersection, e.g., the traffic data 101 includes the number of vehicles passing through the intersection, vehicle congestion data, the number of vehicles queued, and the like.

The electronic device 102 may also be referred to as an agent, which has data processing functionality. Illustratively, the electronic device 102 may include a computer, server, smartphone, or the like.

The traffic data 101 is input into the electronic device 102, the electronic device 102 processes the traffic data 101 to obtain a signal adjustment strategy, and the signal of a traffic signal lamp at the intersection can be adjusted based on the signal adjustment strategy, for example, a green light state or a red light state of a certain traffic direction or a traffic phase at the intersection is adjusted, so as to reduce the traffic congestion degree.

Illustratively, the electronic device 102 may, for example, train a machine learning model and utilize the trained machine learning model to derive a signal adjustment strategy based on the traffic data 101.

The embodiment of the disclosure provides a training method and a signal control method of an optimized machine learning model. A training method and a signal control method of a machine learning model according to an exemplary embodiment of the present disclosure are described below with reference to fig. 2 to 6.

FIG. 2 schematically shows a flow diagram of a method of training a machine learning model according to an embodiment of the present disclosure.

As shown in fig. 2, the training method 200 of the machine learning model of the embodiment of the present disclosure may include operations S210 to S240, for example.

In operation S210, model parameters of the initial machine learning model are adjusted based on the parameter noise value, resulting in an updated initial machine learning model.

In operation S220, the updated initial machine learning model is used to process the second traffic sample data, so as to obtain a first adjustment strategy for the traffic signal lamp.

In operation S230, a first reference traffic state for the intersection is determined in response to completion of adjusting the signal of the traffic signal lamp based on the first adjustment strategy.

In operation S240, model parameters of the updated initial machine learning model are adjusted based on the first reference traffic state and the parameter noise value, resulting in a trained machine learning model.

Illustratively, the trained machine learning model includes a model trained using a reinforcement learning algorithm; for example, the model may be trained based on an Evolution Strategy (ES) to obtain the trained machine learning model. If training based on the evolution strategy starts directly from a randomly initialized model, the time needed for the model to learn and optimize increases and a large number of samples is required, so the training efficiency is low.

Therefore, the embodiment of the disclosure may first train with the first traffic sample data to obtain the initial machine learning model, and then update and optimize the initial machine learning model with the second traffic sample data based on the evolution strategy, thereby obtaining the final trained machine learning model. The first traffic sample data represents the traffic state of the intersection, and the second traffic sample data, for example, also represents the traffic state of the intersection. Illustratively, the first or second traffic sample data includes the number of vehicles near the intersection, for example the number of queued vehicles.

After the initial machine learning model is obtained, the model parameters of the initial machine learning model are adjusted based on the parameter noise value, so that an updated initial machine learning model is obtained. For example, the parameter noise values are added to the model parameters of the initial machine learning model.
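The perturbation described above can be sketched as follows. This is a minimal illustration only: the function names `sample_noise` and `perturb_parameters`, the flat list representation of the model parameters, the Gaussian distribution and the noise scale `sigma` are all assumptions, since the disclosure states only that parameter noise values are added to the model parameters.

```python
import random


def sample_noise(num_params, sigma=0.1, rng=None):
    """Draw one noise value per model parameter.

    Gaussian noise and the scale sigma are assumptions for illustration.
    """
    rng = rng or random.Random()
    return [rng.gauss(0.0, sigma) for _ in range(num_params)]


def perturb_parameters(params, noise):
    """Return new parameters with one noise value added to each parameter."""
    if len(params) != len(noise):
        raise ValueError("one noise value is needed per model parameter")
    return [p + n for p, n in zip(params, noise)]


# Hypothetical parameters of the initial machine learning model.
initial_params = [0.5, -1.0, 2.0]
noise = sample_noise(len(initial_params), rng=random.Random(0))
updated_params = perturb_parameters(initial_params, noise)
```

The noise values are kept alongside the updated parameters because they are needed again when the model parameters are adjusted in operation S240.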

After the updated initial machine learning model is obtained, the second traffic sample data is processed by using the updated initial machine learning model, so as to obtain a first adjustment strategy for a traffic signal, where the first adjustment strategy indicates, for example, a state of the traffic signal for adjusting a certain traffic direction or a traffic phase, and the state of the traffic signal includes, for example, a red light state, a green light state, and the like.

After the first adjustment strategy is obtained, the signal of the traffic signal lamp is adjusted based on the first adjustment strategy, for example, the traffic signal lamp for a certain traffic direction or traffic phase is adjusted from a red light state to a green light state. In an example, adjusting the signal of the traffic signal lamp based on the first adjustment strategy includes performing a simulation with a traffic simulator. The traffic simulator models the dynamics of vehicles in a road network; it can simulate the adjustment of the signals of traffic signal lamps, and the simulated vehicles travel according to the adjusted signals.

After adjusting the signal of the traffic light based on the first adjustment strategy, a first reference traffic state for the intersection is determined, the first reference traffic state characterizing, for example, a vehicle congestion condition at the intersection after adjusting the signal of the traffic light. Then, based on the first reference traffic state and the parameter noise value, model parameters of the updated initial machine learning model are adjusted, so that the trained machine learning model is obtained through training.
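To make the role of the reference traffic state concrete, the following toy sketch stands in for a traffic simulator. Everything in it is hypothetical: the class `ToyIntersection`, the queue-per-group representation, the fixed arrival counts and the discharge rate are invented for illustration and do not describe the simulator used in the disclosure.

```python
from dataclasses import dataclass


@dataclass
class ToyIntersection:
    """Hypothetical stand-in for a traffic simulator: one queue per group."""

    queues: list              # vehicles currently waiting in each group
    arrivals: list            # vehicles arriving per step in each group
    discharge_rate: int = 5   # assumed vehicles released per green step

    def step(self, green_group):
        """Set one group to green, all others to red, and advance one step."""
        for g in range(len(self.queues)):
            self.queues[g] += self.arrivals[g]
            if g == green_group:
                self.queues[g] = max(0, self.queues[g] - self.discharge_rate)

    def reference_traffic_state(self):
        """Congestion summary after the adjustment: total queued vehicles."""
        return sum(self.queues)


sim = ToyIntersection(queues=[8, 2, 0], arrivals=[1, 1, 0])
sim.step(green_group=0)   # apply a first adjustment strategy: green for group 0
congestion = sim.reference_traffic_state()
```

In this sketch a lower `congestion` value corresponds to a better adjustment strategy, so its negative could serve as the reward of the perturbed model.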

It will be appreciated that the machine learning model may be trained over multiple cycles, with the model parameters updated in each cycle based on a different parameter noise value. For example, the updated initial machine learning model is adjusted based on the first reference traffic state and the parameter noise value, resulting in the initial machine learning model of the first cycle. Then, the model parameters of the initial machine learning model of the first cycle are adjusted based on another parameter noise value to obtain a second updated initial machine learning model; second traffic sample data is processed by using the second updated initial machine learning model to obtain a first adjustment strategy for the traffic signal lamp; a first reference traffic state for the intersection is determined in response to completion of adjusting the signal of the traffic signal lamp based on the first adjustment strategy; and the model parameters of the second updated initial machine learning model are adjusted based on the first reference traffic state and the other parameter noise value to obtain the initial machine learning model of the second cycle. These steps are repeated until the model converges, so as to obtain the trained machine learning model.

According to the embodiment of the disclosure, the initial machine learning model is obtained through training, and then the initial machine learning model is optimized based on the evolution strategy to obtain the final machine learning model, so that the learning optimization time of the model is reduced, and the training efficiency of the model is improved. The trained machine learning model has a better effect in the process of processing traffic data to obtain an adjustment strategy, so that a more accurate adjustment strategy is obtained, and the traffic signal lamp is adjusted based on the accurate adjustment strategy to reduce the traffic jam degree.

Fig. 3 schematically illustrates a schematic view of a traffic direction according to an embodiment of the present disclosure.

As shown in fig. 3, the intersections include a first intersection. The first or second traffic sample data includes traffic data for the first intersection and traffic data for a second intersection, where the distance between the second intersection and the first intersection satisfies a distance condition. For example, there are a plurality of second intersections, which are adjacent to the first intersection. The distance includes the Euclidean distance, the Manhattan distance, and the like. An attention mechanism may be used to aggregate the traffic data features of adjacent intersections.

Because the traffic data between adjacent intersections are usually associated, for example, the congestion condition of the current intersection is usually influenced when the adjacent intersections are congested, when a model is trained based on first traffic sample data or second traffic sample data for a first intersection, the traffic data of the adjacent second intersection is considered at the same time, so that the accuracy of obtaining an adjustment strategy based on the model is improved, and the traffic congestion degree is reduced.
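One way such an aggregation could look is sketched below. The disclosure states only that an attention mechanism may be used; the scaled dot-product form, the absence of learned projection matrices, and the names `attend`, `first` and `neighbours` are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, neighbours):
    """Aggregate neighbour feature vectors with scaled dot-product attention.

    query: feature vector of the first intersection.
    neighbours: feature vectors of the candidate intersections.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, nb)) / scale for nb in neighbours]
    weights = softmax(scores)
    aggregated = [
        sum(w * nb[i] for w, nb in zip(weights, neighbours))
        for i in range(len(query))
    ]
    return aggregated, weights


first = [3.0, 1.0]                      # e.g. queue lengths at the first intersection
neighbours = [[3.0, 1.0], [0.0, 0.0]]   # hypothetical features of two intersections
aggregated, weights = attend(first, neighbours)
```

The aggregated vector could then be concatenated with the features of the first intersection before being passed to the model.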

As shown in fig. 3, the intersection has, for example, a plurality of traffic directions or traffic phases. Non-conflicting traffic directions or traffic phases are divided into one group, so that 8 groups are obtained, each group including, for example, a plurality of traffic directions or traffic phases. The first adjustment strategy is the same for the traffic signal lamps of all traffic directions or traffic phases within a group. Taking the case where each group includes a plurality of traffic directions as an example, the second traffic sample data may be processed with the updated initial machine learning model to determine the traffic directions of a target group from the grouped traffic directions, and to determine the first adjustment strategy for the traffic signal lamps of the traffic directions of the target group.

For example, the second group includes two traffic directions, namely north to south and south to north. When the updated initial machine learning model processes the second traffic sample data and determines that the two traffic directions are congested, the second group is determined as the target group, and the updated initial machine learning model outputs a first adjustment strategy for the traffic signal lamps of the two traffic directions in the target group. The first adjustment strategy includes, for example, adjusting the traffic signal lamps of the two traffic directions to a green light state, so as to alleviate the traffic congestion.

If a certain intersection does not have the traffic direction of a certain group, when the machine learning model outputs the adjustment strategy, the strategy aiming at the group can be shielded so as to improve the processing speed.
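The masking of unavailable groups can be sketched as follows, assuming the model outputs one score per group and that a boolean availability mask is known for each intersection. The function name `select_target_group` and the example scores are hypothetical.

```python
def select_target_group(scores, available):
    """Pick the highest-scoring group among those present at the intersection.

    scores: one model output per group (8 groups in the example of fig. 3).
    available: True where the intersection actually has that group.
    """
    if len(scores) != len(available):
        raise ValueError("scores and availability mask must have the same length")
    candidates = [i for i, ok in enumerate(available) if ok]
    if not candidates:
        raise ValueError("the intersection has no available group")
    return max(candidates, key=lambda i: scores[i])


scores = [0.2, 0.9, 0.4, 0.1, 0.0, 0.3, 0.8, 0.5]
# The group at index 1 does not exist at this intersection, so it is masked.
available = [True, False, True, True, True, True, True, True]
target_group = select_target_group(scores, available)
```

Here the group at index 1 has the highest raw score but is skipped, so the group at index 6 is selected.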

According to the embodiment of the disclosure, the traffic direction is divided into a plurality of groups, and the adjustment strategy for the groups is output by using the machine learning model, so that the adjustment efficiency of the traffic signal lamp is improved, and the traffic jam alleviation effect is improved.

FIG. 4 schematically shows a training diagram of an initial machine learning model according to an embodiment of the present disclosure.

As shown in fig. 4, the initial machine learning model may be trained by imitation learning. For example, first traffic sample data 410 is obtained and input into the initial machine learning model 420 to be trained, so as to obtain a second adjustment strategy 430 for the traffic signal lamp. The second adjustment strategy 430 is, for example, the same as or similar to the first adjustment strategy mentioned above.

In addition, the first traffic sample data 410 is input into the expert system 440, resulting in a reference adjustment strategy 450 for the traffic signal light, and the reference adjustment strategy 450 may be used as a label for the first traffic sample data for supervised training of the initial machine learning model 420. The reference adjustment strategy 450 is, for example, a strategy for adjusting a green light state or a red light state of a certain traffic direction, which is output by the expert system 440.

Then, based on the second adjustment strategy 430 and the reference adjustment strategy 450, the model parameters of the initial machine learning model 420 to be trained are adjusted, resulting in a preliminarily trained initial machine learning model. For example, a loss value 460 is obtained based on the difference between the second adjustment strategy 430 and the reference adjustment strategy 450, and gradient backpropagation is performed based on the loss value 460 to adjust the model parameters of the initial machine learning model 420 to be trained. The initial machine learning model 420 may include a deep learning model.

According to the embodiment of the present disclosure, the initial machine learning model can be trained in a supervised manner, which improves the training efficiency and the training effect of the initial machine learning model.
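A minimal sketch of one supervised step of this kind is given below. It assumes that the adjustment strategy is a choice among groups, that the expert system's reference strategy is the index of the chosen group, and that the loss is a cross-entropy; the disclosure specifies only a loss based on the difference between the two strategies. For brevity the gradient step is applied to the output scores (logits) directly rather than to the parameters of a deep network.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, expert_group):
    """Loss between the model's strategy and the expert's reference strategy."""
    return -math.log(softmax(logits)[expert_group])


def logits_gradient(logits, expert_group):
    """Gradient of the cross-entropy w.r.t. the logits: softmax minus one-hot."""
    probs = softmax(logits)
    return [p - (1.0 if i == expert_group else 0.0) for i, p in enumerate(probs)]


logits = [0.0, 0.0, 0.0, 0.0]   # hypothetical model output for four groups
expert_group = 2                # hypothetical label produced by the expert system
loss_before = cross_entropy(logits, expert_group)

# One gradient-descent step with an assumed learning rate of 1.0.
grad = logits_gradient(logits, expert_group)
logits = [x - 1.0 * g for x, g in zip(logits, grad)]
loss_after = cross_entropy(logits, expert_group)
```

After the step, the score of the expert's group rises relative to the others, so the loss decreases.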

In another example, the initial machine learning model may also be trained in other ways. For example, first traffic sample data is obtained, and the first traffic sample data is input into an initial machine learning model to be trained, so that a second adjustment strategy for a traffic signal lamp is obtained. The traffic signal light signal is then adjusted with the traffic simulator based on the second adjustment strategy.

After the signal of the traffic signal lamp is adjusted based on the second adjustment strategy, the initial machine learning model to be trained determines a second reference traffic state for the intersection. The second reference traffic state represents, for example, the congestion condition of the intersection. For example, the second reference traffic state includes the number of downstream lanes minus the number of upstream lanes (that is, the traffic pressure difference), the number of passing vehicles, the vehicle delay coefficient, and the like.

The initial machine learning model to be trained may be a reinforcement learning model, and the model parameters of the initial machine learning model to be trained may be adjusted by using the second reference traffic state as a reward to obtain the initial machine learning model. The reinforcement learning model may include DQN (Deep Q-Network), SAC (Soft Actor-Critic), PPO (Proximal Policy Optimization), and the like.

According to the embodiment of the disclosure, the initial machine learning model can be trained in a reinforcement learning mode, and the training efficiency and the training effect of the initial machine learning model are improved.

FIG. 5 schematically illustrates a training schematic of a machine learning model according to an embodiment of the present disclosure.

As shown in fig. 5, N/2 groups of parameter noise values are randomly generated for the initial machine learning model, N being an integer greater than 1. For example, the N/2 groups of parameter noise values include group A1, group B1 and group C1. Then, the N/2 groups of parameter noise values are adjusted in a numerically symmetric manner to obtain another N/2 groups of parameter noise values, which include group A2, group B2 and group C2.

For example, negating the parameter noise values in group A1 yields group A2, negating the parameter noise values in group B1 yields group B2, and negating the parameter noise values in group C1 yields group C2. The numerical symmetry is also referred to as mirror symmetry.

The resulting N groups of parameter noise values include group A1, group B1, group C1, group A2, group B2 and group C2. The number of parameter noise values in each group corresponds, for example, to the number of model parameters. The parameter noise values are denoted "noise" in fig. 5.
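The mirrored generation of noise groups can be sketched as follows. The function name `mirrored_noise`, the Gaussian distribution and the scale `sigma` are assumptions; the disclosure specifies only that half of the groups are generated randomly and the other half are obtained by negation.

```python
import random


def mirrored_noise(num_groups, num_params, sigma=0.1, rng=None):
    """Generate N noise groups: N/2 random groups followed by their negations."""
    if num_groups < 2 or num_groups % 2 != 0:
        raise ValueError("num_groups must be an even integer of at least 2")
    rng = rng or random.Random()
    half = [
        [rng.gauss(0.0, sigma) for _ in range(num_params)]
        for _ in range(num_groups // 2)
    ]
    # Mirror symmetry: A2 = -A1, B2 = -B1, C2 = -C1, and so on.
    mirrored = [[-value for value in group] for group in half]
    return half + mirrored


# N = 6 groups (A1, B1, C1, A2, B2, C2) for a model with 4 parameters.
noise_groups = mirrored_noise(num_groups=6, num_params=4, rng=random.Random(42))
```

Because each group has an exact mirror image, the groups sum to zero per parameter, which tends to reduce the variance of the resulting parameter update.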

The network in fig. 5 represents the model parameters. Model parameters of the initial machine learning model are respectively adjusted based on the N groups of parameter noise values, and N updated initial machine learning models are obtained. For example, a first updated initial machine learning model worker 1 is obtained by adding a first set of parameter noise values to model parameters of the initial machine learning model, a second updated initial machine learning model worker 2 is obtained by adding a second set of parameter noise values to model parameters of the initial machine learning model, and an nth updated initial machine learning model worker N is obtained by adding an nth set of parameter noise values to model parameters of the initial machine learning model.

The N second traffic sample data D1, D2, ..., DN are respectively input into the N updated initial machine learning models (worker 1 to worker N) to obtain N first adjustment strategies. After the signals of the traffic signal lamps are adjusted based on the N first adjustment strategies, a plurality of first reference traffic states are determined.

The first traffic sample data or the second traffic sample data is, for example, data for all lanes near the intersection. Because the number of lanes differs from intersection to intersection, configuring a separate intersection model for each intersection in order to obtain the first traffic sample data or the second traffic sample data of that intersection is costly. Therefore, different intersections can share one intersection model. When the number of lanes near a certain intersection is small, virtual lanes can be added for the intersection, and the value of the first traffic sample data or the second traffic sample data for each virtual lane is set to zero. Here, the first traffic sample data or the second traffic sample data includes, for example, the number of queued vehicles on a lane within a certain distance of the intersection.
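The virtual-lane padding can be sketched as follows, assuming the shared intersection model expects a fixed number of lanes and that each lane is described by its number of queued vehicles. The function name `pad_with_virtual_lanes` and the lane count of 8 are hypothetical.

```python
def pad_with_virtual_lanes(queue_counts, num_model_lanes):
    """Pad per-lane queue counts with zero-valued virtual lanes.

    queue_counts: queued vehicles on each real lane of the intersection.
    num_model_lanes: number of lanes the shared intersection model expects.
    """
    if len(queue_counts) > num_model_lanes:
        raise ValueError("intersection has more lanes than the shared model supports")
    num_virtual = num_model_lanes - len(queue_counts)
    return list(queue_counts) + [0] * num_virtual


# An intersection with only three lanes, fed to a model built for eight.
three_lane_counts = [4, 7, 2]
observation = pad_with_virtual_lanes(three_lane_counts, num_model_lanes=8)
```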

The plurality of first reference traffic states include, for example, N traffic congestion data for the N updated initial machine learning models. The N traffic congestion data serve as the rewards of the N updated initial machine learning models respectively, that is, the N rewards include reward 1, reward 2, ..., reward N.

Next, the Learner mechanism in the model sorts the N traffic congestion data (the N rewards) to obtain a sorting result, for example by severity of traffic congestion. The Learner mechanism then adjusts the model parameters of the N updated initial machine learning models based on the sorting result and the N groups of parameter noise values, to obtain the initial machine learning model of the first cycle. For example, when the sorting result shows that the reward of a certain worker indicates that traffic congestion is not serious, the model parameters may be adjusted, based on the N groups of parameter noise values, in the direction that increases the reward; a higher reward represents a lower degree of traffic congestion.

Then, another N groups of parameter noise values are randomly generated, and the initial machine learning model obtained in the first cycle is adjusted based on these N groups of parameter noise values. The above operations are repeated over multiple cycles until the model converges, and the trained machine learning model is obtained based on the optimal model parameters.
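One possible form of this sort-then-update step is sketched below. It assumes a centered-rank weighting of the rewards, a common choice in evolution strategies, whereas the disclosure states only that the update uses the sorting result and the N groups of noise values. The names `centered_ranks` and `es_update`, the learning rate and `sigma` are hypothetical, and each reward is taken to be the negative of the congestion measure.

```python
def centered_ranks(rewards):
    """Map rewards to evenly spaced weights in [-0.5, 0.5]; the best gets 0.5."""
    n = len(rewards)
    if n < 2:
        raise ValueError("at least two rewards are needed for ranking")
    order = sorted(range(n), key=lambda i: rewards[i])   # worst reward first
    weights = [0.0] * n
    for rank, idx in enumerate(order):
        weights[idx] = rank / (n - 1) - 0.5
    return weights


def es_update(params, noise_groups, rewards, learning_rate=0.1, sigma=0.1):
    """Move the parameters toward the noise directions that ranked best."""
    weights = centered_ranks(rewards)
    n = len(noise_groups)
    new_params = []
    for j, p in enumerate(params):
        step = sum(w * noise[j] for w, noise in zip(weights, noise_groups))
        new_params.append(p + learning_rate * step / (n * sigma))
    return new_params


params = [0.0, 0.0]
noise_groups = [[0.1, 0.0], [-0.1, 0.0], [0.0, 0.1], [0.0, -0.1]]  # mirrored pairs
rewards = [-10.0, -30.0, -20.0, -25.0]   # negative congestion of each worker
new_params = es_update(params, noise_groups, rewards)
```

In this example the worker perturbed by `[0.1, 0.0]` had the least congestion, so the first parameter moves furthest in the positive direction.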

After the initial machine learning model has been learned through a baseline algorithm such as imitation learning or DQN, the evolution strategy continues to be used to optimize the model. The index (reward) of the evolution strategy may be defined as a final metric such as the average waiting time of all vehicles and/or a congestion coefficient. To obtain a more stable index, one worker can run multiple times based on second traffic sample data of multiple time periods, and the average over the multiple runs is taken as the final metric (reward). Meanwhile, a plurality of workers run simultaneously based on distributed parallel capability, so as to enlarge the search range and speed up convergence.

According to the embodiment of the disclosure, when the model is trained, the reinforcement learning or evolution strategy takes the long-term return into account, so the model can better converge to an optimal solution, and in practical application the average waiting time of vehicles can be shortened without manually debugging and setting many parameters.

The final metric of the evolution strategy algorithm is the average waiting time of all vehicles at all intersections in the whole environment. The global final reward is used directly as the optimization direction of the model, which removes the need to define a reward manually and can achieve a better effect. By further optimizing the model, the model learns a better strategy, and traffic signal control is performed through the model, so that traffic congestion is reduced.
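The averaging over time periods and the parallel evaluation of workers can be sketched as follows. The evaluation function `toy_evaluate`, its waiting-time figures and the period names are invented for illustration, and a local thread pool stands in for whatever distributed infrastructure an actual system would use.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

PERIODS = ["morning", "noon", "evening"]


def toy_evaluate(params, period):
    """Hypothetical evaluation: negative average waiting time in seconds."""
    base_wait = {"morning": 60.0, "noon": 30.0, "evening": 75.0}[period]
    return -(base_wait - 10.0 * params[0])


def averaged_reward(params):
    """Run one worker on every time period and average the rewards."""
    return mean(toy_evaluate(params, period) for period in PERIODS)


# Parameters of three perturbed models (workers), evaluated in parallel.
worker_params = [[0.0], [1.0], [2.0]]
with ThreadPoolExecutor(max_workers=3) as pool:
    rewards = list(pool.map(averaged_reward, worker_params))
```

Averaging over several periods keeps a single unusually quiet or busy period from dominating the ranking of the workers.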

Fig. 6 schematically shows a flow chart of a signal control method according to an embodiment of the present disclosure.

As shown in fig. 6, the signal control method 600 of the embodiment of the present disclosure may include, for example, operations S610 to S630.

In operation S610, target traffic data characterizing an intersection state is acquired.

In operation S620, the target traffic data is processed using the trained machine learning model, resulting in an adjustment strategy for traffic lights.

In operation S630, the signal of the traffic signal lamp is adjusted based on the adjustment strategy.

Illustratively, the trained machine learning model is trained using the above-mentioned method.

After the final machine learning model is obtained through training, it can be used for signal control: real-time target traffic data of the intersection is input, and the machine learning model outputs an adjustment strategy for the traffic signal lamps, which is used to control the light states of the traffic signal lamps.

The intersection has a plurality of traffic directions. For example, non-conflicting traffic directions are placed in the same group, yielding at least one group, and the adjustment strategy for the traffic lights of the traffic directions within each group is consistent. The traffic direction of a target group is determined from the traffic directions of the at least one group based on the adjustment strategy, and the signal of the traffic signal lamps for the traffic direction of the target group is adjusted.
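The grouping of non-conflicting directions can be illustrated with a short sketch. The group contents, the direction names, and the rule "turn the target group green and all other groups red" are assumptions chosen for illustration; the disclosure only requires that the lights within a group share one adjustment strategy.

```python
# Hypothetical phase groups: each group holds non-conflicting traffic directions.
PHASE_GROUPS = {
    0: ["north_straight", "south_straight"],
    1: ["north_left", "south_left"],
    2: ["east_straight", "west_straight"],
    3: ["east_left", "west_left"],
}

def select_target_group(scores):
    """Pick the group with the highest score (assumed form of the model output)."""
    return max(range(len(scores)), key=lambda i: scores[i])

def apply_strategy(target_group):
    """Set every direction in the target group to green and all others to red."""
    lights = {}
    for group_id, directions in PHASE_GROUPS.items():
        colour = "green" if group_id == target_group else "red"
        for direction in directions:
            lights[direction] = colour
    return lights

# Example: the model scores group 2 highest, so east/west straight turn green.
scores = [0.1, 0.2, 0.6, 0.1]
target = select_target_group(scores)
lights = apply_strategy(target)
```

Because the model chooses among groups rather than individual lights, conflicting directions cannot be given green at the same time in this sketch.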

FIG. 7 schematically illustrates a block diagram of a training apparatus for a machine learning model according to an embodiment of the present disclosure.

As shown in fig. 7, the training apparatus 700 of the machine learning model of the embodiment of the present disclosure includes, for example, a first adjusting module 710, a processing module 720, a first determining module 730, and a second adjusting module 740.

The first adjusting module 710 may be configured to adjust model parameters of the initial machine learning model based on the parameter noise value to obtain an updated initial machine learning model, where the initial machine learning model is obtained by training first traffic sample data, and the first traffic sample data represents a traffic state of the intersection. According to the embodiment of the present disclosure, the first adjusting module 710 may, for example, perform operation S210 described above with reference to fig. 2, which is not described herein again.

The processing module 720 may be configured to process second traffic sample data by using the updated initial machine learning model to obtain a first adjustment strategy for a traffic signal lamp, where the second traffic sample data represents a traffic state of an intersection. According to the embodiment of the present disclosure, the processing module 720 may, for example, perform operation S220 described above with reference to fig. 2, which is not described herein again.

The first determining module 730 may be used to determine a first reference traffic state for the intersection in response to completion of adjusting the signal of the traffic signal lamp based on the first adjustment strategy. According to an embodiment of the present disclosure, the first determining module 730 may perform, for example, the operation S230 described above with reference to fig. 2, which is not described herein again.

The second adjusting module 740 may be configured to adjust model parameters of the updated initial machine learning model based on the first reference traffic state and the parameter noise value, resulting in a trained machine learning model. According to the embodiment of the present disclosure, the second adjusting module 740 may perform, for example, the operation S240 described above with reference to fig. 2, which is not described herein again.

According to an embodiment of the present disclosure, the first adjusting module 710 includes a generating submodule, a first adjusting submodule, and a second adjusting submodule. The generating submodule is used for randomly generating N/2 groups of parameter noise values, where N is an integer greater than 1. The first adjusting submodule is used for adjusting the N/2 groups of parameter noise values in a numerically symmetric manner to obtain another N/2 groups of parameter noise values. The second adjusting submodule is used for adjusting the model parameters of the initial machine learning model based on each of the N groups of parameter noise values, to obtain N updated initial machine learning models.
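A minimal sketch of the symmetric noise generation follows. It assumes that "numerical symmetry" means negating each sampled noise vector, that the noise is Gaussian, and that N is even; the function names, `sigma`, and `seed` are hypothetical.

```python
import random

def mirrored_noise(n, dim, sigma=0.1, seed=0):
    """Sample n/2 noise groups, then mirror them by negation to get n groups."""
    rng = random.Random(seed)
    half = [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(n // 2)]
    mirrored = [[-e for e in eps] for eps in half]
    return half + mirrored

def perturb(params, noise_groups):
    """Create one updated parameter set per noise group."""
    return [[p + e for p, e in zip(params, eps)] for eps in noise_groups]

params = [0.5, -1.0, 2.0]
noise = mirrored_noise(n=4, dim=len(params))
models = perturb(params, noise)  # N = 4 updated parameter sets
```

Under these assumptions each perturbed model has a counterpart on the opposite side of the current parameters, so the pair's rewards can be compared directly.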

According to an embodiment of the present disclosure, the first reference traffic state includes N pieces of traffic congestion data for the N updated initial machine learning models, and the second adjusting module 740 includes a sorting submodule and a third adjusting submodule. The sorting submodule is used for sorting the N pieces of traffic congestion data to obtain a sorting result. The third adjusting submodule is used for adjusting the model parameters of the N updated initial machine learning models based on the sorting result and the N groups of parameter noise values, to obtain the trained machine learning model.
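One common way to combine a sorting result with the noise values is a rank-weighted update, sketched below. This is an assumption about how the sorting result might be used, not the formula of the disclosure; `centered_ranks`, `es_update`, `lr`, and `sigma` are hypothetical names and values.

```python
def centered_ranks(values):
    """Map values to evenly spaced ranks in [-0.5, 0.5]; the largest gets 0.5."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position / (len(values) - 1) - 0.5
    return ranks

def es_update(params, noise_groups, congestion, lr=0.1, sigma=0.1):
    """Move the parameters toward noise groups that produced less congestion."""
    # Lower congestion is better, so rank the negated congestion values.
    weights = centered_ranks([-c for c in congestion])
    n = len(noise_groups)
    step = [
        sum(w * eps[j] for w, eps in zip(weights, noise_groups)) / (n * sigma)
        for j in range(len(params))
    ]
    return [p + lr * s for p, s in zip(params, step)]

# Two mirrored noise groups; the first one led to less congestion.
new_params = es_update([0.0], [[0.1], [-0.1]], congestion=[5.0, 9.0])
```

Using ranks instead of raw congestion values makes the update insensitive to the scale of the metric and to outlier runs.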

According to an embodiment of the present disclosure, the processing module 720 includes a processing submodule and a determining submodule. The processing submodule is used for processing the second traffic sample data using the updated initial machine learning model to determine the traffic direction of a target group from the traffic directions of at least one group, where the first adjustment strategy for the traffic lights of the plurality of traffic directions in each group is consistent. The determining submodule is used for determining the first adjustment strategy for the traffic lights of the traffic direction of the target group.

According to an embodiment of the present disclosure, the apparatus 700 may further include a first acquisition module, a first input module, and a third adjusting module. The first acquisition module is used for acquiring a first traffic sample. The first input module is used for inputting the first traffic sample into an initial machine learning model to be trained to obtain a second adjustment strategy for the traffic signal lamp. The third adjusting module is used for adjusting the model parameters of the initial machine learning model to be trained based on the second adjustment strategy and a reference adjustment strategy, to obtain the initial machine learning model.
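Comparing the second adjustment strategy with a reference adjustment strategy can be sketched as a supervised loss. The sketch assumes the model outputs one logit per direction group and that the reference strategy is a single group index; cross-entropy is an illustrative choice, not one named in the disclosure.

```python
import math

def softmax(logits):
    """Convert logits into a probability distribution over groups."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def imitation_loss(logits, reference_group):
    """Cross-entropy between the predicted strategy and the reference group."""
    return -math.log(softmax(logits)[reference_group])

def loss_gradient(logits, reference_group):
    """Gradient of the cross-entropy loss with respect to the logits."""
    probs = softmax(logits)
    return [p - (1.0 if i == reference_group else 0.0) for i, p in enumerate(probs)]

# One gradient step moves the prediction toward the reference group 2.
logits = [0.0, 0.0, 0.0, 0.0]
before = imitation_loss(logits, reference_group=2)
grad = loss_gradient(logits, reference_group=2)
logits = [x - 0.5 * g for x, g in zip(logits, grad)]
after = imitation_loss(logits, reference_group=2)
```

In a full model the gradient would be propagated back to the model parameters; here the logits are updated directly to keep the sketch short.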

According to an embodiment of the present disclosure, the apparatus 700 may further include a second acquisition module, a second input module, a second determining module, and a fourth adjusting module. The second acquisition module is used for acquiring a first traffic sample. The second input module is used for inputting the first traffic sample into the initial machine learning model to be trained to obtain a second adjustment strategy for the traffic signal lamp. The second determining module is used for determining a second reference traffic state for the intersection in response to completion of adjusting the signal of the traffic signal lamp based on the second adjustment strategy. The fourth adjusting module is used for adjusting the model parameters of the initial machine learning model to be trained based on the second reference traffic state, to obtain the initial machine learning model.

According to the embodiment of the present disclosure, the intersection includes a first intersection, the first traffic sample data or the second traffic sample data includes traffic data for the first intersection and traffic data for the second intersection, and a distance between the second intersection and the first intersection satisfies a distance condition.
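A sample that combines the first intersection with nearby intersections might be assembled as below. The distance condition is assumed here to be a simple threshold on straight-line distance, and the field names `xy` and `queue` and the 500-unit threshold are hypothetical.

```python
import math

def within_distance(first, other, max_distance):
    """Assumed distance condition: straight-line distance at most max_distance."""
    return math.dist(first["xy"], other["xy"]) <= max_distance

def build_sample(first, others, max_distance=500.0):
    """Concatenate the first intersection's features with those of its neighbours."""
    features = list(first["queue"])
    for other in others:
        if within_distance(first, other, max_distance):
            features.extend(other["queue"])
    return features

first = {"xy": (0.0, 0.0), "queue": [3, 1]}
others = [
    {"xy": (300.0, 0.0), "queue": [2, 2]},   # close enough, included
    {"xy": (1000.0, 0.0), "queue": [9, 9]},  # too far, excluded
]
sample = build_sample(first, others)
```

Including neighbouring intersections lets the model see traffic that is about to arrive, which a single-intersection sample cannot express.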

Fig. 8 schematically illustrates a block diagram of a signal control device according to an embodiment of the present disclosure.

As shown in fig. 8, the signal control apparatus 800 of the embodiment of the present disclosure includes, for example, an obtaining module 810, a processing module 820, and an adjusting module 830.

The acquisition module 810 can be configured to acquire target traffic data characterizing an intersection state. According to an embodiment of the present disclosure, the obtaining module 810 may perform, for example, the operation S610 described above with reference to fig. 6, which is not described herein again.

The processing module 820 may be used to process the target traffic data using the trained machine learning model, resulting in an adjustment strategy for the traffic signal light. According to the embodiment of the present disclosure, the processing module 820 may perform, for example, operation S620 described above with reference to fig. 6, which is not described herein again.

The adjustment module 830 may be used to adjust the signal of the traffic signal based on an adjustment strategy. According to the embodiment of the present disclosure, the adjusting module 830 may perform the operation S630 described above with reference to fig. 6, for example, and is not described herein again.

According to an embodiment of the present disclosure, the adjusting module 830 includes a determining submodule and an adjusting submodule. The determining submodule is used for determining the traffic direction of a target group from the traffic directions of at least one group based on the adjustment strategy, where the adjustment strategies for the traffic lights of the plurality of traffic directions in each group are consistent. The adjusting submodule is used for adjusting the signal of the traffic signal lamp for the traffic direction of the target group.

In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of users' personal information all comply with relevant laws and regulations, necessary confidentiality measures are taken, and public order and good customs are not violated.

In the technical scheme of the disclosure, before the personal information of the user is acquired or collected, the authorization or the consent of the user is acquired.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. The electronic device 900 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.

A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the various methods and processes described above, such as the training method of the machine learning model and/or the signal control method. For example, in some embodiments, the training method of the machine learning model and/or the signal control method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the training method of the machine learning model and/or the signal control method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured by any other suitable means (e.g., by means of firmware) to perform the training method of the machine learning model and/or the signal control method.

Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, a special purpose computer, or another programmable training apparatus for a machine learning model and/or signal control apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A method of training a machine learning model, comprising:

adjusting model parameters of the initial machine learning model based on the parameter noise value to obtain an updated initial machine learning model, wherein the initial machine learning model is obtained by training first traffic sample data, and the first traffic sample data represents the traffic state of the intersection;

processing second traffic sample data by using the updated initial machine learning model to obtain a first adjustment strategy for a traffic signal lamp, wherein the second traffic sample data represents the traffic state of the intersection;

determining a first reference traffic state for the intersection in response to completion of adjusting the signal of the traffic signal lamp based on the first adjustment strategy; and

adjusting model parameters of the updated initial machine learning model based on the first reference traffic state and the parameter noise value to obtain a trained machine learning model.

2. The method of claim 1, wherein the adjusting model parameters of the initial machine learning model based on the parametric noise values, resulting in an updated initial machine learning model comprises:

randomly generating N/2 groups of parameter noise values, wherein N is an integer greater than 1;

adjusting the N/2 groups of parameter noise values based on a numerical symmetry mode to obtain another N/2 groups of parameter noise values; and

adjusting model parameters of the initial machine learning model based on each of the N groups of parameter noise values to obtain N updated initial machine learning models.

3. The method of claim 2, wherein the first reference traffic state includes N traffic congestion data for the N updated initial machine learning models; adjusting model parameters of the updated initial machine learning model based on the first reference traffic state and the parameter noise value to obtain a trained machine learning model comprises:

sorting the N traffic congestion data to obtain a sorting result; and

adjusting model parameters of the N updated initial machine learning models based on the sorting result and the N groups of parameter noise values to obtain a trained machine learning model.

4. The method of claim 1, wherein the processing second traffic sample data with the updated initial machine learning model resulting in a first adjustment strategy for traffic signal lights comprises:

processing second traffic sample data using the updated initial machine learning model to determine a traffic direction of a target group from the traffic directions of at least one group, wherein a first adjustment strategy for traffic lights of a plurality of traffic directions in each group is consistent; and

determining a first adjustment strategy for the traffic lights of the traffic direction of the target group.

5. The method of claim 1, further comprising:

acquiring the first traffic sample data;

inputting the first traffic sample data into an initial machine learning model to be trained to obtain a second adjustment strategy for the traffic signal lamp; and

adjusting the model parameters of the initial machine learning model to be trained based on the second adjustment strategy and a reference adjustment strategy to obtain the initial machine learning model.

6. The method of claim 1, further comprising:

acquiring the first traffic sample data;

inputting the first traffic sample data into an initial machine learning model to be trained to obtain a second adjustment strategy for the traffic signal lamp;

determining a second reference traffic state for the intersection in response to completion of adjusting the signal of the traffic signal lamp based on the second adjustment strategy; and

adjusting the model parameters of the initial machine learning model to be trained based on the second reference traffic state to obtain the initial machine learning model.

7. The method of claim 1, wherein the intersection comprises a first intersection, the first or second traffic sample data comprises traffic data for the first intersection and traffic data for a second intersection, a distance between the second intersection and the first intersection satisfies a distance condition.

8. A signal control method, comprising:

acquiring target traffic data representing the state of the intersection;

processing the target traffic data with a trained machine learning model to obtain an adjustment strategy for traffic signal lamps, wherein the trained machine learning model is trained with the method according to any one of claims 1-7; and

adjusting the signal of the traffic signal lamp based on the adjustment strategy.

9. The method of claim 8, wherein the adjusting the signal of the traffic signal based on the adjustment strategy comprises:

determining the traffic direction of a target group from the traffic directions of at least one group based on the adjustment strategy, wherein the adjustment strategies of the traffic signal lamps of the plurality of traffic directions in each group are consistent; and

adjusting the signal of the traffic signal lamp for the traffic direction of the target group.

10. A training apparatus for a machine learning model, comprising:

the first adjusting module is used for adjusting model parameters of the initial machine learning model based on the parameter noise value to obtain an updated initial machine learning model, wherein the initial machine learning model is obtained by training first traffic sample data, and the first traffic sample data represents the traffic state of the intersection;

the processing module is used for processing second traffic sample data by using the updated initial machine learning model to obtain a first adjustment strategy aiming at a traffic signal lamp, wherein the second traffic sample data represents the traffic state of the intersection;

a first determining module used for determining a first reference traffic state for the intersection in response to completion of adjusting the signal of the traffic signal lamp based on the first adjustment strategy; and

the second adjusting module is used for adjusting the model parameters of the updated initial machine learning model based on the first reference traffic state and the parameter noise value to obtain a trained machine learning model.

11. The apparatus of claim 10, wherein the first adjustment module comprises:

the generating submodule is used for randomly generating N/2 groups of parameter noise values, wherein N is an integer larger than 1;

the first adjusting submodule is used for adjusting the N/2 groups of parameter noise values in a numerically symmetric manner to obtain another N/2 groups of parameter noise values; and

the second adjusting submodule is used for adjusting the model parameters of the initial machine learning model based on each of the N groups of parameter noise values to obtain N updated initial machine learning models.

12. The apparatus of claim 11, wherein the first reference traffic state comprises N traffic congestion data for the N updated initial machine learning models; the second adjustment module includes:

the sorting submodule is used for sorting the N pieces of traffic congestion data to obtain a sorting result; and

the third adjusting submodule is used for adjusting the model parameters of the N updated initial machine learning models based on the sorting result and the N groups of parameter noise values to obtain the trained machine learning model.

13. The apparatus of claim 10, wherein the processing module comprises:

a processing submodule, configured to process second traffic sample data using the updated initial machine learning model to determine a traffic direction of a target group from traffic directions of at least one group, wherein a first adjustment strategy for traffic lights of a plurality of traffic directions in each group is consistent; and

a determination submodule for determining a first adjustment strategy for the traffic lights of the traffic direction for the target group.

14. The apparatus of claim 10, further comprising:

a first obtaining module for obtaining the first traffic sample;

the first input module is used for inputting the first traffic sample into an initial machine learning model to be trained to obtain a second adjustment strategy for the traffic signal lamp; and

the third adjusting module is used for adjusting the model parameters of the initial machine learning model to be trained based on the second adjustment strategy and a reference adjustment strategy to obtain the initial machine learning model.

15. The apparatus of claim 10, further comprising:

the second acquisition module is used for acquiring the first traffic sample;

the second input module is used for inputting the first traffic sample into an initial machine learning model to be trained to obtain a second adjustment strategy for the traffic signal lamp;

a second determination module to determine a second reference traffic state for the intersection in response to completion of adjusting the signal of the traffic signal based on the second adjustment strategy; and

the fourth adjusting module is used for adjusting the model parameters of the initial machine learning model to be trained based on the second reference traffic state to obtain the initial machine learning model.

16. The apparatus of claim 10, wherein the intersection comprises a first intersection, the first or second traffic sample data comprises traffic data for the first intersection and traffic data for a second intersection, a distance between the second intersection and the first intersection satisfies a distance condition.

17. A signal control apparatus comprising:

the acquisition module is used for acquiring target traffic data representing the state of the intersection;

a processing module, configured to process the target traffic data using a trained machine learning model, so as to obtain an adjustment strategy for a traffic signal, where the trained machine learning model is trained using the apparatus according to any one of claims 10-16; and

the adjusting module is used for adjusting the signal of the traffic signal lamp based on the adjustment strategy.

18. The apparatus of claim 17, wherein the adjustment module comprises:

a determining submodule for determining a traffic direction of a target group from the traffic directions of at least one group based on the adjustment strategy, wherein the adjustment strategies for the traffic lights of a plurality of traffic directions in each group are consistent; and

the adjusting submodule is used for adjusting the signal of the traffic signal lamp for the traffic direction of the target group.

19. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.

20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-9.

21. A computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor, implement the steps of the method according to any of claims 1-9.