WO2019173972A1 - Method and system for training non-linear model - Google Patents
- Publication number: WO2019173972A1 (PCT/CN2018/078866)
- Authority
- WO
- WIPO (PCT)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0283—Price estimation or determination
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
Definitions
- An aspect of the disclosure is directed to a system for training a non-linear model.
- FIG. 2 illustrates an exemplary process for training a model, according to embodiments of the disclosure.
- a training system 200 acquires training data 202 for training a model 204.
- Training data 202 can include historical data associated with model 204.
- in some embodiments, the training data can include request information of historical transportation services and result information regarding whether a historical transportation service was shared. Therefore, the request information of historical transportation services can be the sample data for training (referred to as the “training data” ) , and the result information regarding whether the historical transportation service was shared can be the supervised signal for training.
- when the historical transportation service is shared by more than one passenger, the supervised signal can be assigned “1” . Otherwise, when the historical transportation service fails to be shared, the supervised signal can be assigned “0” .
- the sample data (e.g., request information of a historical transportation service) can include at least one of an origin, a destination, a departure time, a number of passengers, and the like. Therefore, in this example, the sample data is multidimensional, and the supervised signal is one-dimensional.
- the training data can include one-dimensional data or multidimensional data, and is not limited to the above example.
- the trained model can be one-dimensional or multidimensional.
- the trained model can reflect the correspondence relationship between trip parameters (including the origin, the destination, the departure time, the number of passengers, and the like) and the success of sharing, and can be configured to generate the probability of a transportation service being shared by more than one passenger and further generate a price for the transportation service. It is contemplated that, non-linear models for a variety of other applications can be trained and used according to the systems and methods disclosed in the present disclosure.
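As a concrete illustration of the training-data layout described above, each example pairs a request's trip parameters (the multidimensional sample data) with a 0/1 supervised signal indicating whether the historical trip was shared. The field names and values below are hypothetical placeholders, not drawn from the disclosure:

```python
# Each example: (sample data: trip parameters) -> (supervised signal: shared?)
training_data = [
    # (origin, destination, departure_hour, passengers) -> 1 = shared, 0 = not shared
    (("station_a", "airport", 8, 1), 1),
    (("suburb_b", "downtown", 23, 3), 0),
]
samples = [features for features, _ in training_data]  # multidimensional sample data
signals = [label for _, label in training_data]        # one-dimensional supervised signal
print(signals)  # [1, 0]
```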
- FIG. 3 illustrates a schematic diagram of an exemplary training system 200, according to embodiments of the disclosure.
- training system 200 can be a separate system (e.g., a server) or an integrated component of a server. Because training a model may require significant computation resources, in some embodiments, training system 200 may be preferably implemented as a separate system. In some embodiments, training system 200 may include sub-systems, some of which may be remote.
- Training system 200 can be a general-purpose server or a proprietary device specially designed for training a model.
- training system 200 may include a communication interface 302, a processor 304, and a memory 316.
- Processor 304 may further include multiple functional modules, such as a breakline placing unit 306, a breakline position selection unit 308, a segmentation unit 310, a model generation unit 312, and the like.
- Breakline position selection unit 308 may further include an entropy change determination unit 314.
- These modules (and any corresponding sub-modules or sub-units) can be hardware units (e.g., portions of an integrated circuit) of processor 304 designed for use with other components or to execute a part of a program.
- the program may be stored on a computer-readable medium, and when executed by processor 304, it may perform one or more functions.
- although FIG. 3 shows units 306-314 all within one processor 304, it is contemplated that these units may be distributed among multiple processors located near or remote from each other.
- training system 200 may be implemented in the cloud, or on a separate computer/server.
- Communication interface 302 may be configured to receive training data.
- the training data can include sample input data and corresponding supervised signal.
- the sample input data can be historical data
- the supervised signal can be an output value (e.g., “0” or “1” ) for indicating a result associated with the sample data.
- the training data may be used to train a model for determining a probability of a vehicle being shared by more than one passenger, and further determining a price for a transportation service.
- Communication interface 302 can be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection.
- communication interface 302 can be a local area network (LAN) card to provide a data communication connection to a compatible LAN.
- Wireless links can also be implemented by communication interface 302.
- communication interface 302 can send and receive electrical, electromagnetic or optical signals that carry digital data streams representing various types of information via a network.
- the network can typically include a cellular communication network, a Wireless Local Area Network (WLAN) , a Wide Area Network (WAN) , or the like.
- Memory 316 may be implemented as any type of volatile or non-volatile memory devices, or a combination thereof, such as a static random access memory (SRAM) , an electrically erasable programmable read-only memory (EEPROM) , an erasable programmable read-only memory (EPROM) , a programmable read-only memory (PROM) , a read-only memory (ROM) , a magnetic memory, a flash memory, or a magnetic or optical disk.
- Memory 316 may store instructions executable by processor 304 to cause system 200 to perform functions for training a model.
- Breakline placing unit 306 can place a plurality of breaklines at a plurality of breakline positions in the training data. Each breakline at a corresponding breakline position can indicate a particular data structure of the training data. In some embodiments, for training data “1222211111” , a breakline can be placed between any two digits. For example, a first breakline can be placed between “1” and “222211111” , so that the data is divided as “1 | 222211111” .
- the above training data can also be divided into four segments by three breaklines.
- although one-dimensional data is used as an example here, it is contemplated that the training data may be high dimensional and include several nodes (in one dimension, a node is a digit) .
- Breakline position selection unit 308 can select at least one breakline position.
- breakline position selection unit 308 can further include entropy change determination unit 314, which is configured to determine an entropy change for each breakline position.
- An entropy is a measure of unpredictability of the state of data. For data U (u_1, u_2, u_3, ..., u_n) , an entropy value E can be determined according to the formula below (Formula 1) : E (U) = -∑_i p (u_i) log p (u_i) , where p (u_i) is the probability of the value u_i appearing in U.
- the entropy value E can measure unpredictability of data U. Besides being a measure of unpredictability, E (U) can also indicate the amount of information content. Data with a higher entropy value contains a greater amount of information content and is thus harder to predict. Data with a smaller entropy value contains a smaller amount of information content and is thus easier to predict. It is contemplated that, when the data is well-ordered (e.g., linear) , the entropy of the data will be relatively small. That is, the more linear the data is, the smaller its entropy.
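The entropy computation can be sketched in Python; the standard Shannon entropy is assumed (the disclosure does not specify the log base, so base 2 is used here as a convention):

```python
import math
from collections import Counter

def entropy(data, base=2):
    """Shannon entropy E(U) = -sum(p(u) * log(p(u))) over the distinct values u."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())

# Well-ordered data is easy to predict and has low entropy:
print(abs(entropy("1111111111")))       # 0.0 -- perfectly ordered
print(round(entropy("1222111111"), 4))  # 0.8813 -- mixed values, harder to predict
```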
- entropy change determination unit 314 can determine an original entropy based on the training data, determine a breaking entropy based on entropies of hypothetical segments divided by the breakline, and determine the entropy change associated with the breakline position based on the original entropy and the breaking entropy.
- entropy change determination unit 314 can determine a first sub-entropy (e.g., 0.9769) for the first hypothetical segment “1222” , and a second sub-entropy (e.g., 0) for the second hypothetical segment “111111” .
- Entropy change determination unit 314 can determine the breaking entropy of the training data based on the first and second sub-entropies.
- the breaking entropy can be determined based on a weighted sum of the first and second sub-entropies.
- the breaking entropy of the training data can be determined according to the formula below (Formula 2) : E_break (S) = ∑_v ( |S_v| / |S| ) · E (S_v) , where S indicates the whole original training data, S_v indicates a hypothetical segment, and |S_v| / |S| indicates the weight for the entropy E (S_v) of the corresponding segment.
- for example, the original training data (e.g., “1222111111” ) includes 10 digits, and can be divided by two breaklines into three hypothetical segments, as in “12 | 221 | 11111” . The first hypothetical segment “12” includes two digits, the second hypothetical segment “221” includes three digits, and the third hypothetical segment “11111” includes five digits. Accordingly, the weight for the first sub-entropy can be 2/10, the weight for the second sub-entropy can be 3/10, and the weight for the third sub-entropy can be 5/10.
- in this example, the first hypothetical segment has a first sub-entropy of 0.1505, the second hypothetical segment has a second sub-entropy of 0.3938, and the third hypothetical segment has a third sub-entropy of 0.
- entropy change determination unit 314 can determine the entropy change for the corresponding breakline position as the difference between the original entropy and the breaking entropy: ΔE = E (S) − E_break (S) .
- the first hypothetical segment (e.g., “12” ) is between the first breakline position and the first digit of the training data, and the first digit can be considered as a neighboring breakline position to the first breakline position. It is contemplated that, both the first digit and last digit of the training data can be considered as breakline positions.
- the second hypothetical segment (e.g., “221” ) is between the first breakline position and the second breakline position, and the second breakline position is the neighboring breakline position of the first breakline position.
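The breaking-entropy walkthrough above can be sketched as follows. This is an illustrative implementation, not the patent's code; base-10 logarithms are an assumption (the disclosure's example values 0.1505 and 0.3938 suggest an unstated convention, and the numbers produced below need not reproduce them):

```python
import math
from collections import Counter

def entropy(data, base=10):
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())

data = "1222111111"
# Two breaklines split the data into three hypothetical segments.
segments = ["12", "221", "11111"]
assert "".join(segments) == data

original = entropy(data)
# Breaking entropy: weighted sum of sub-entropies, with weights |Sv|/|S| (Formula 2).
breaking = sum(len(s) / len(data) * entropy(s) for s in segments)
change = original - breaking  # entropy change, i.e. "information enhancement"
print(round(breaking, 4), round(change, 4))  # 0.1431 0.1222
```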
- the method for determining the breaking entropy can be different from the above exemplary method. For example, the above exemplary method determines the breaking entropy based on the sub-entropies of hypothetical segments between two neighboring breaklines. The breaking entropy can also be determined based on a hypothetical segment and the rest of the training data. The weights for the sub-entropies should also be adapted accordingly.
- the entropy change between the original entropy and the breaking entropy can also be referred to as information enhancement, which indicates the change of linearity of the training data. It is contemplated that training system 200 may determine the entropy change for each of the breakline positions. Based on the entropy changes, training system 200 can select at least one breakline position, each associated with an entropy change greater than a predetermined threshold.
- the predetermined threshold can be an empirical value or may be adjusted according to the model to be trained. For example, the predetermined threshold can be associated with a depth of the model. In machine learning, a neural network including several layers may be used, and the number of layers (i.e., the depth) can be associated with the predetermined threshold.
- when the entropy change associated with a breakline is greater than the predetermined threshold, it indicates that the linearity of the training data has been improved.
- for example, if the predetermined threshold is set as 4.2, the second breakline will be selected, as its associated entropy change is greater than the threshold.
- Segmentation unit 310 can segment the training data into a plurality of segments according to the selected breakline position, and the segments are used as training data for modeling.
- Model generation unit 312 can generate the non-linear model based on the segments.
- model generation unit 312 can generate linear sub-models for the plurality of segments, and generate the non-linear model by aggregating the sub-models.
- machine learning can be used to train the sub-models.
- the linear sub-models can fit the plurality of segments well, as these segments each have better linearity.
- Model generation unit 312 can connect the sub-models in an order of the respective segments in the training data in order to generate the non-linear model.
- FIG. 4 illustrates a schematic diagram of an exemplary segmented training data, according to embodiments of the disclosure.
- the entropy changes for breakline positions #1, #2, #3, and #4 are greater than the predetermined threshold.
- segments (O-x1) , (x1-x2) , (x2-x3) , (x3-x4) , and (x4-E) are linear, and thus can be fit by linear models respectively. These linear models can then be aggregated into a non-linear model, to more accurately fit the original, unsegmented, training data.
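The aggregation step can be sketched as follows: fit an ordinary least-squares line to each segment, then dispatch on the segment boundaries at prediction time. The function names and the |x|-shaped toy data are hypothetical illustrations, not the patent's implementation:

```python
def fit_segment(xs, ys):
    """Least-squares slope/intercept for one segment (a linear sub-model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    k = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return k, my - k * mx

def piecewise_model(boundaries, segments_xy):
    """Aggregate linear sub-models, in segment order, into one non-linear model."""
    params = [fit_segment(xs, ys) for xs, ys in segments_xy]
    def model(x):
        for hi, (k, b) in zip(boundaries, params):
            if x <= hi:          # x falls in the segment ending at this boundary
                return k * x + b
        k, b = params[-1]        # x lies beyond the last boundary
        return k * x + b
    return model

# Toy non-linear data y = |x|, segmented at the breakline x = 0:
left = ([-3.0, -2.0, -1.0], [3.0, 2.0, 1.0])
right = ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
model = piecewise_model([0.0], [left, right])
print(model(-2.0), model(2.5))  # 2.0 2.5
```

Each sub-model fits its segment exactly here because the segments are individually linear, which is the point of segmenting at high-entropy-change positions.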
- Another aspect of the disclosure is directed to a method for training a non-linear model.
- FIG. 5 illustrates a flowchart of an exemplary method 500 for training a non-linear model, according to embodiments of the disclosure.
- method 500 may be performed by training system 200, and may include steps S502-S510, discussed below.
- training system 200 can acquire training data.
- the training data can be historical data including sample input data and a corresponding supervised signal, and the supervised signal can be an output value (e.g., “0” or “1” ) for indicating a result associated with the sample data.
- the training data may be used to train a model for determining a probability of a vehicle being shared by more than one passenger, and further determining a price for a transportation service.
- training system 200 can place a plurality of breaklines at a plurality of breakline positions in the training data.
- Each breakline at a corresponding breakline position can indicate a particular data structure of the training data.
- for training data “1222211111” , there can be a breakline between any two digits.
- for example, a first breakline can be placed between “1” and “222211111” , so that the data is divided as “1 | 222211111” .
- training system 200 can select at least one breakline position.
- training system 200 can further determine an entropy change for each breakline position.
- An entropy is a measure of unpredictability of the state of data. As discussed above, when data is well-ordered (e.g., linear) , the entropy of the data will be relatively small. That is, the more linear the data is, the smaller its entropy.
- training system 200 can determine an original entropy based on the training data, determine a breaking entropy based on entropies of hypothetical segments divided by the breakline position, and determine the entropy change associated with the breakline position based on the original entropy and the breaking entropy.
- the original entropy can be determined based on the training data as a whole. Then, training system 200 can divide the original training data into more than one hypothetical segment using one or more breaklines, and determine a sub-entropy for each hypothetical segment. For example, a breakline can generate a first sub-set and a second sub-set of the training data, and thus a first sub-entropy and a second sub-entropy corresponding to the first and second sub-sets can be determined. Training system 200 can determine the breaking entropy associated with the breakline position based on, for example, the first and second sub-entropies. In some embodiments, the breaking entropy can be determined based on a weighted sum of the first and second sub-entropies. For example, the breaking entropy can be determined according to Formula 2 discussed above. Details regarding Formula 2 are provided in the description above and will not be repeated here.
- training system 200 can determine the entropy changes for the corresponding breakline position by determining a difference between the original and breaking entropies.
- the first hypothetical segment can be between the first breakline position and the first node (e.g., the first digit when data is one-dimensional) of the training data, and the first node can be considered as a neighboring breakline position to the first breakline position. It is contemplated that, both the first node and the last node of the training data can be considered as breakline positions.
- the second hypothetical segment (e.g., “221” ) is between the first breakline position and the second breakline position, and the second breakline position is the neighboring breakline position of the first breakline position.
- the method for determining the breaking entropy can be different from the above exemplary method. The above exemplary method determines the breaking entropy based on the sub-entropies of hypothetical segments between two neighboring breaklines; the breaking entropy can also be determined based on a hypothetical segment and the rest of the training data. For example, the first breaking entropy can be determined based on the entropies of “12” and “22111111” , and the second breaking entropy can be determined based on the entropies of “221” and “11111” . The weights for the sub-entropies should also be changed accordingly. Therefore, the determination of the breaking entropy is not limited to the above examples.
- the entropy change can also be referred to as information enhancement, which indicates the change of linearity of the training data. It is contemplated that, training system 200 may determine the entropy change for each breakline position. Based on the entropy changes, training system 200 can select at least one breakline position, each associated with an entropy change greater than a predetermined threshold.
- the predetermined threshold can be an empirical value or may be adjusted according to the model to be trained. As discussed above, a greater entropy indicates less linearity. Therefore, when the entropy change associated with a breakline is greater than the predetermined threshold, it indicates that the linearity of the training data has been greatly improved.
- each segment of the training data can be more linear than the original training data. Therefore, linear models are more likely to accurately fit the segments generated according to the at least one selected breakline position.
- training system 200 can segment the training data into a plurality of segments according to the selected breakline position, and the segments can be used as training data for modeling.
- training system 200 can generate the non-linear model.
- training system 200 can generate linear sub-models for the plurality of segments, and further generate the non-linear model by aggregating the sub-models.
- machine learning can be used to train the sub-models.
- the linear sub-models can fit the plurality of segments well, as these segments each have better linearity.
- Training system 200 can connect the sub-models in an order of the respective segments in the training data in order to generate the non-linear model.
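Steps S502-S510 can be put together as the sketch below: score every interior position by the entropy change of splitting there (using the "segment versus rest of the data" variant mentioned above), keep the positions whose change exceeds the threshold, and cut the data accordingly. Function names, the toy data, and the threshold value are illustrative assumptions:

```python
import math
from collections import Counter

def entropy(data, base=2):
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())

def select_breaklines(data, threshold):
    """S504-S506: score each candidate breakline position by its entropy change."""
    original = entropy(data)
    selected = []
    for pos in range(1, len(data)):
        left, right = data[:pos], data[pos:]
        breaking = (len(left) / len(data)) * entropy(left) \
                 + (len(right) / len(data)) * entropy(right)
        if original - breaking > threshold:
            selected.append(pos)
    return selected

def segment(data, positions):
    """S508: cut the training data at the selected breakline positions."""
    bounds = [0] + sorted(positions) + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]

positions = select_breaklines("1111100000", threshold=0.8)
print(positions, segment("1111100000", positions))  # [5] ['11111', '00000']
```

The split at the midpoint yields two perfectly ordered (zero-entropy) segments, so its entropy change is maximal and it alone clears the threshold; each resulting segment could then be fitted by a linear sub-model (S510).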
- the computer-readable medium may include volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage devices.
- the computer-readable medium may be the storage device or the memory module having the computer instructions stored thereon, as disclosed.
- the computer-readable medium may be a disc or a flash drive having the computer instructions stored thereon.
Abstract
Systems and computer-implemented methods for training a non-linear model. The computer-implemented method can include acquiring, via a communication interface, training data, such as weather, timing, and traffic condition data. The computer-implemented method can further include placing, by a processor, a plurality of breaklines at a plurality of breakline positions in the training data. The computer-implemented method can also include determining, by the processor, an entropy change for each breakline position, and selecting, by the processor, at least one breakline position, each associated with an entropy change greater than a predetermined threshold. The computer-implemented method can further include segmenting, by the processor, the training data into a plurality of segments according to the at least one selected breakline position, and generating, by the processor, the non-linear model based on the segments.
Description
The present disclosure relates to training a non-linear model, and more particularly to, training a non-linear model based on information entropy change of training data.
In machine learning, it is critical to train a model based on training data (e.g., including sample data and supervised signal) . The trained model can reflect a correspondence relationship between the sample data and the supervised signal. The trained model can be subsequently applied to new input data, to provide an estimated outcome according to the trained correspondence relationship. Generally, the actual correspondence relationship is non-linear. For example, providing a service through the Internet relies on estimation of user needs, service capacity, traffic condition, likelihood, etc., and all these outcomes are non-linearly related to multiple factors, such as weather, time during the day, location of service, etc. As a result, the estimation requires use of non-linear models.
FIG. 1A illustrates an exemplary non-linear correspondence relationship associated with training data. However, the trained model reflecting the correspondence relationship can be linear. FIG. 1B illustrates an exemplary trained model corresponding to training data. For example, as shown in FIG. 1A, the actual correspondence relationship can be represented by y=f (x) , wherein f (x) is a non-linear function. If a linear model is used for training, the trained model will reflect a linear relationship of y=kx (e.g., as shown in FIG. 1B) . FIG. 1C illustrates a comparison of an exemplary non-linear correspondence relationship (solid line) and an exemplary trained linear model (dotted line) . Although the linear model tracks the general shape of the non-linear model, it over-generalizes the actual model and, in many parts, loses details that are crucial to future applications. As a result, the linear model oftentimes fails to generate correct results in future applications.
Embodiments of the disclosure address the above problem by training a non-linear model based on information entropy change of the training data, to accurately reflect the actual non-linear correspondence relationship in the training data.
SUMMARY
Embodiments of the disclosure provide a computer-implemented machine learning method for training a non-linear model. The method can include acquiring, via a communication interface, training data. The method can further include placing, by a processor, a plurality of breaklines at a plurality of breakline positions in the training data. The method can also include determining, by the processor, an entropy change for each breakline position. The method can further include selecting, by the processor, at least one breakline position, each associated with an entropy change greater than a predetermined threshold, and segmenting, by the processor, the training data into a plurality of segments according to the at least one selected breakline position. The method can yet further include generating, by the processor, the non-linear model based on the segments.
Embodiments of the disclosure further provide a machine learning system for training a non-linear model. The system can include a communication interface configured to receive training data, and a memory configured to store the training data and the non-linear model. The system can further include at least one processor configured to place a plurality of breaklines at a plurality of breakline positions in the training data. The at least one processor can be further configured to determine an entropy change for each breakline position. The at least one processor can be also configured to select at least one breakline position, each associated with an entropy change greater than a predetermined threshold, and segment the training data into a plurality of segments according to the at least one selected breakline position. The at least one processor can be yet further configured to generate the non-linear model based on the segments.
Embodiments of the disclosure further provide a non-transitory computer-readable medium that stores a set of instructions. When the instructions are executed by at least one processor of an electronic device, the instructions cause the electronic device to perform a method for training a non-linear model. The method can include acquiring training data, and placing a plurality of breaklines at a plurality of breakline positions in the training data. The method can further include determining an entropy change for each breakline position. The method can also include selecting at least one breakline position, each associated with an entropy change greater than a predetermined threshold. The method can further include segmenting the training data into a plurality of segments according to the at least one selected breakline position, and generating the non-linear model based on the segments.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
FIG. 1A illustrates an exemplary non-linear correspondence relationship associated with training data.
FIG. 1B illustrates an exemplary trained linear model corresponding to training data.
FIG. 1C illustrates a comparison of an exemplary non-linear correspondence relationship associated with training data and an exemplary trained linear model.
FIG. 2 illustrates an exemplary process for training a model, according to embodiments of the disclosure.
FIG. 3 illustrates a schematic diagram of an exemplary training system, according to embodiments of the disclosure.
FIG. 4 illustrates a schematic diagram of an exemplary segmented training data, according to embodiments of the disclosure.
FIG. 5 illustrates a flowchart of an exemplary method for training a non-linear model, according to embodiments of the disclosure.
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
An aspect of the disclosure is directed to a system for training a non-linear model.
FIG. 2 illustrates an exemplary process for training a model, according to embodiments of the disclosure. As shown in FIG. 2, a training system 200 acquires training data 202 for training a model 204. Training data 202 can include historical data associated with model 204. In some embodiments, when the model is to be used for estimating a probability of a transportation service being shared by more than one passenger (e.g., a probability of car-pooling), the training data can include request information of historical transportation services and result information regarding whether a historical transportation service was shared. Therefore, the request information of historical transportation services can be the sample data for training (referred to as the "training data"), and the result information regarding whether the historical transportation service was shared can be the supervised signal for training. For example, when the historical transportation service was shared by more than one passenger, the supervised signal can be assigned "1". Otherwise, when the historical transportation service failed to be shared, the supervised signal can be assigned "0". The sample data (e.g., request information of a historical transportation service) can include at least one of an origin, a destination, a departure time, a number of passengers, and the like. Therefore, in this example, the sample data is multidimensional, and the supervised signal is one-dimensional. It is contemplated that the training data can include one-dimensional data or multidimensional data, and is not limited to the above example. Similarly, the trained model can be one-dimensional or multidimensional.
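The pairing of sample data and supervised signal described above can be sketched as follows. This is a minimal illustration only: the field names and values are hypothetical, since the disclosure lists the trip parameters but does not prescribe a concrete encoding.

```python
from dataclasses import dataclass

# Hypothetical field names for a historical trip request; the
# disclosure lists origin, destination, departure time, and number
# of passengers but does not prescribe a concrete encoding.
@dataclass
class TripRequest:
    origin: str
    destination: str
    departure_time: str
    passengers: int

# Each training example pairs multidimensional sample data with a
# one-dimensional supervised signal: 1 if the historical service was
# shared by more than one passenger, 0 otherwise.
training_data = [
    (TripRequest("A", "B", "2018-03-13T08:00", 1), 1),
    (TripRequest("C", "D", "2018-03-13T23:30", 3), 0),
]
labels = [signal for _, signal in training_data]
```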
When properly trained, the trained model can reflect the correspondence relationship between trip parameters (including the origin, the destination, the departure time, the number of passengers, and the like) and the success of sharing, and can be configured to generate the probability of a transportation service being shared by more than one passenger and further generate a price for the transportation service. It is contemplated that, non-linear models for a variety of other applications can be trained and used according to the systems and methods disclosed in the present disclosure.
FIG. 3 illustrates a schematic diagram of an exemplary training system 200, according to embodiments of the disclosure. It is contemplated that, training system 200 can be a separate system (e.g., a server) or an integrated component of a server. Because training a model may require significant computation resources, in some embodiments, training system 200 may be preferably implemented as a separate system. In some embodiments, training system 200 may include sub-systems, some of which may be remote.
Breakline position selection unit 308 can select at least one breakline position. In some embodiments, as shown in FIG. 3, breakline position selection unit 308 can further include entropy change determination unit 314, which is configured to determine an entropy change for each breakline position. An entropy is a measure of the unpredictability of the state of data. For data U = (u_1, u_2, u_3, …, u_n), an entropy value E can be determined according to the formula below:

E(U) = −Σ_{i=1}^{n} p_i · log(p_i)    (Formula 1)

where p_i indicates the probability of u_i in the data U, and i = 1, 2, 3, …, n.
Based on the probabilities, the entropy value E can measure the unpredictability of data U. Besides being a measure of unpredictability, E(U) can also indicate the amount of information content. Data with a higher entropy value contains a greater amount of information content and is thus hard to predict. Data with a smaller entropy value contains a smaller amount of information content and is thus easy to predict. It is contemplated that when the data is well-ordered (e.g., linear), the entropy of the data will be relatively small. That is, the more linear the data is, the smaller the entropy.
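As a sketch of the entropy measure described above, the snippet below computes Shannon entropy with log base 2. The base is an assumption: the disclosure's numeric example values appear to use a different base or scaling, but the qualitative behavior (well-ordered data yields low entropy) is the same.

```python
from collections import Counter
from math import log2

def entropy(data):
    # Shannon entropy (Formula 1): E(U) = -sum(p_i * log(p_i)),
    # with p_i the probability of each symbol in the data.
    n = len(data)
    return -sum((c / n) * log2(c / n) for c in Counter(data).values())

# A perfectly ordered segment has zero entropy; mixed data has more.
ordered = entropy("111111")      # equals 0.0
mixed = entropy("1222111111")    # about 0.881 in base 2
```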
To determine the entropy change, entropy change determination unit 314 can determine an original entropy based on the training data, determine a breaking entropy based on entropies of hypothetical segments divided by the breakline, and determine the entropy change associated with the breakline position based on the original entropy and the breaking entropy.
The original entropy Eo can be determined for the training data (e.g., "1222111111") as a whole, according to Formula 1 above. For example, for "1222111111", Eo = 4.3417. Then, entropy change determination unit 314 can divide the original data into more than one hypothetical segment using one or more breaklines, and determine a sub-entropy for each hypothetical segment. For example, considering the training data with one breakline (e.g., "1222|111111"), entropy change determination unit 314 can determine a first sub-entropy (e.g., 0.9769) for the first hypothetical segment "1222", and a second sub-entropy (e.g., 0) for the second hypothetical segment "111111". Entropy change determination unit 314 can determine the breaking entropy of the training data based on the first and second sub-entropies. In some embodiments, the breaking entropy can be determined as a weighted sum of the first and second sub-entropies. For example, the breaking entropy of the training data can be determined according to the formula below.
E_break(S) = Σ_v (|S_v| / |S|) · E(S_v)    (Formula 2)

where S indicates the whole original training data, S_v indicates a hypothetical segment, and |S_v|/|S| indicates the weight for the entropy E(S_v) of the corresponding segment.
For example, the original training data (e.g., "1222111111") includes 10 digits. The data can be divided by two breaklines (e.g., as in "12|221|11111"). The first hypothetical segment "12" includes two digits, the second hypothetical segment "221" includes three digits, and the third hypothetical segment "11111" includes five digits. Accordingly, the weight for the first sub-entropy can be 2/10, the weight for the second sub-entropy can be 3/10, and the weight for the third sub-entropy can be 5/10. The first hypothetical segment has a first sub-entropy of 0.1505, the second hypothetical segment has a second sub-entropy of 0.3938, and the third hypothetical segment has a third sub-entropy of 0. In some embodiments, the breaking entropy associated with the first breakline, as a weighted sum, can be determined as 0.1505×0.2+0.3938×0.3=0.14824, and the breaking entropy associated with the second breakline can be determined as 0.3938×0.3+0×0.5=0.11814.
Based on the original entropy and the breaking entropy, entropy change determination unit 314 can determine the entropy changes for the corresponding breakline position as:
First entropy change: 4.3417-0.14824=4.19346
Second entropy change: 4.3417-0.11814=4.22356
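The weighted-sum computation of Formula 2 can be sketched as below. This sketch uses base-2 Shannon entropy, so the resulting numbers differ from the disclosure's example values (whose base and scaling are not stated); the |Sv|/|S| weighting structure is the same.

```python
from collections import Counter
from math import log2

def entropy(data):
    # Shannon entropy of one segment, log base 2 (an assumption; the
    # disclosure does not state the base used for its example values).
    n = len(data)
    return -sum((c / n) * log2(c / n) for c in Counter(data).values())

def breaking_entropy(segments):
    # Formula 2: weighted sum of sub-entropies, each weighted by |Sv|/|S|.
    total = sum(len(s) for s in segments)
    return sum((len(s) / total) * entropy(s) for s in segments)

data = "1222111111"
segments = ["12", "221", "11111"]    # the "12|221|11111" split
eb = breaking_entropy(segments)
delta = entropy(data) - eb           # entropy change for this split
```

A positive `delta` means the hypothetical segments are, on average, more predictable than the unsegmented data.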
In the above method for determining the breaking entropy, the first hypothetical segment (e.g., “12” ) is between the first breakline position and the first digit of the training data, and the first digit can be considered as a neighboring breakline position to the first breakline position. It is contemplated that, both the first digit and last digit of the training data can be considered as breakline positions. The second hypothetical segment (e.g., “221” ) is between the first breakline position and the second breakline position, and the second breakline position is the neighboring breakline position of the first breakline position.
It is contemplated that the method for determining the breaking entropy can differ from the above exemplary method. For example, the above exemplary method determines the breaking entropy based on the sub-entropies of hypothetical segments between two neighboring breaklines. The breaking entropy can also be determined based on a hypothetical segment and the rest of the training data. The weights for the sub-entropies should be adapted accordingly.
The entropy change between the original entropy and the breaking entropy can also be referred to as information enhancement, which indicates the change of linearity of the training data. It is contemplated that training system 200 may determine the entropy change for each of the breakline positions. Based on the entropy changes, training system 200 can select at least one breakline position, each associated with an entropy change greater than a predetermined threshold. The predetermined threshold can be an empirical value or may be adjusted according to the model to be trained. For example, the predetermined threshold can be associated with a depth of the model. In machine learning, a neural network including several layers may be used, and the number of layers (i.e., the depth) can be associated with the predetermined threshold. As discussed above, a greater entropy indicates less linearity. Therefore, when the entropy change associated with a breakline is greater than the predetermined threshold, it indicates the linearity of the training data has been improved. For example, when the predetermined threshold is set as 4.2, the second breakline will be selected, as its associated entropy change is greater than the threshold. With reference to the above example, by dividing the training data having less linearity into two segments each having better linearity, the linearity of the training data can be improved. In other words, each segment of the training data can be more linear than the original training data. Therefore, linear models are more likely to fit the segments generated according to the at least one selected breakline position.
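A minimal selection sketch follows, assuming the rest-of-data variant (each candidate position hypothetically splits the data into two halves) and base-2 entropy; the threshold value is illustrative only and is not the 4.2 from the disclosure's own scaling.

```python
from collections import Counter
from math import log2

def entropy(data):
    n = len(data)
    return -sum((c / n) * log2(c / n) for c in Counter(data).values())

def select_breaklines(data, threshold):
    # A candidate breakline at position p splits data into data[:p]
    # and data[p:]; keep positions whose entropy change (original
    # entropy minus weighted sub-entropies) exceeds the threshold.
    base = entropy(data)
    n = len(data)
    selected = []
    for p in range(1, n):
        left, right = data[:p], data[p:]
        broken = (p / n) * entropy(left) + ((n - p) / n) * entropy(right)
        if base - broken > threshold:
            selected.append(p)
    return selected

# Only the split "1222" | "111111" improves predictability enough here.
selected = select_breaklines("1222111111", threshold=0.5)
```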
FIG. 4 illustrates a schematic diagram of an exemplary segmented training data, according to embodiments of the disclosure. For example, among a plurality of breakline positions (e.g., more than 4 breakline positions) , the entropy changes for breakline positions #1, #2, #3, and #4 are greater than the predetermined threshold. In the exemplary training data segmented according to embodiments of the disclosure, segments (O-x1) , (x1-x2) , (x2-x3) , (x3-x4) , and (x4-E) are linear, and thus can be fit by linear models respectively. These linear models can then be aggregated into a non-linear model, to more accurately fit the original, unsegmented, training data.
Another aspect of the disclosure is directed to a method for training a non-linear model.
FIG. 5 illustrates a flowchart of an exemplary method 500 for training a non-linear model, according to embodiments of the disclosure. For example, method 500 may be performed by training system 200, and may include steps S502-S510 discussed as below.
In step S502, training system 200 can acquire training data. The training data can be historical data and include sample input data and a corresponding supervised signal, where the supervised signal can be an output value (e.g., "0" or "1") indicating a result associated with the sample data. For example, the training data may be used to train a model for determining a probability of a vehicle being shared by more than one passenger, and further determining a price for a transportation service.
In step S504, training system 200 can place a plurality of breaklines at a plurality of breakline positions in the training data. Each breakline at a corresponding breakline position can indicate a particular data structure of the training data. In some embodiments, for training data "1222211111", a breakline can be placed between any two numbers. For example, a first breakline can be placed between "1" and "222211111", so that the data is divided as "1|222211111". It is contemplated that the data can be divided into multiple hypothetical segments by more than one breakline placed at different breakline positions.
In step S506, training system 200 can select at least one breakline position. In some embodiments, training system 200 can further determine an entropy change for each breakline position. An entropy is a measure of the unpredictability of the state of data. As discussed above, when data is well-ordered (e.g., linear), the entropy of the data will be relatively small. That is, the more linear the data is, the smaller the entropy.
To determine the entropy change, training system 200 can determine an original entropy based on the training data, determine a breaking entropy based on entropies of hypothetical segments divided by the breakline position, and determine the entropy change associated with the breakline position based on the original entropy and the breaking entropy.
The original entropy can be determined based on the training data as a whole. Then, training system 200 can divide the original training data into more than one hypothetical segment using one or more breaklines, and determine a sub-entropy for each hypothetical segment. For example, a breakline can generate a first sub-set and a second sub-set of the training data, and thus a first sub-entropy and a second sub-entropy corresponding to the first and second sub-sets can be determined. Training system 200 can determine the breaking entropy associated with the breakline position based on, for example, the first and second sub-entropies. In some embodiments, the breaking entropy can be determined based on a weighted sum of the first and second sub-entropies. For example, the breaking entropy can be determined according to Formula 2 discussed above. Details regarding Formula 2 can be found in the above description and will not be repeated for clarity.
Based on the original entropy and the breaking entropy, training system 200 can determine the entropy changes for the corresponding breakline position by determining a difference between the original and breaking entropies.
In the above method for determining the breaking entropy, the first hypothetical segment can be between the first breakline position and the first node (e.g., the first digit when data is one-dimensional) of the training data, and the first node can be considered as a neighboring breakline position to the first breakline position. It is contemplated that, both the first node and the last node of the training data can be considered as breakline positions. The second hypothetical segment (e.g., “221” ) is between the first breakline position and the second breakline position, and the second breakline position is the neighboring breakline position of the first breakline position.
It is contemplated that the method for determining the breaking entropy can differ from the above exemplary method. For example, the above exemplary method determines the breaking entropy based on the sub-entropies of hypothetical segments between two neighboring breaklines. The breaking entropy can also be determined based on a hypothetical segment and the rest of the training data. For example, the first breaking entropy can be determined based on the entropies of "12" and "22111111", and the second breaking entropy can be determined based on the entropies of "221" and "11111". The weights for the sub-entropies should be changed accordingly. Therefore, the determination of the breaking entropy is not limited to the above examples.
The entropy change can also be referred to as information enhancement, which indicates the change of linearity of the training data. It is contemplated that training system 200 may determine the entropy change for each breakline position. Based on the entropy changes, training system 200 can select at least one breakline position, each associated with an entropy change greater than a predetermined threshold. The predetermined threshold can be an empirical value or may be adjusted according to the model to be trained. As discussed above, a greater entropy indicates less linearity. Therefore, when the entropy change associated with a breakline is greater than the predetermined threshold, it indicates the linearity of the training data has been greatly improved. By breaking the training data having less linearity into two hypothetical segments each having higher linearity, the linearity of the overall training data can be improved. In other words, each segment of the training data can be more linear than the original training data. Therefore, it is more likely to generate linear models that accurately fit the segments generated according to the at least one selected breakline position.
In step S508, training system 200 can segment the training data into a plurality of segments according to the selected breakline position, and the segments can be used as training data for modeling.
In step S510, based on the segments, training system 200 can generate the non-linear model. In some embodiments, training system 200 can generate linear sub-models for the plurality of segments, and further generate the non-linear model by aggregating the sub-models. As discussed above, machine learning can be used to train the sub-models. The linear sub-models can fit the plurality of segments well, as these segments each have better linearity. Training system 200 can connect the sub-models in the order of the respective segments in the training data to generate the non-linear model.
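The per-segment fitting and aggregation in steps S508 to S510 can be sketched as follows, using ordinary least squares for each linear sub-model and connecting the sub-models in segment order. The boundary value and the toy V-shaped data are illustrative assumptions, not taken from the disclosure.

```python
def linear_fit(xs, ys):
    # Ordinary least squares for one segment: y = a*x + b.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx

def fit_piecewise(xs, ys, boundaries):
    # Fit one linear sub-model per segment (step S508), then connect
    # the sub-models in segment order (step S510).
    edges = [float("-inf")] + sorted(boundaries) + [float("inf")]
    models = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = [(x, y) for x, y in zip(xs, ys) if lo <= x < hi]
        a, b = linear_fit([p[0] for p in seg], [p[1] for p in seg])
        models.append((lo, hi, a, b))

    def predict(x):
        for lo, hi, a, b in models:
            if lo <= x < hi:
                return a * x + b

    return predict

# Toy V-shaped data: non-linear overall, but linear on each side of x = 5.
xs = list(range(10))
ys = [abs(x - 5) for x in xs]
model = fit_piecewise(xs, ys, boundaries=[5])
```

The aggregated `model` is non-linear even though each sub-model is linear, which is the essence of the segment-and-aggregate approach described above.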
Another aspect of the disclosure is directed to a non-transitory computer-readable medium storing instructions which, when executed, cause one or more processors to perform the methods, as discussed above. The computer-readable medium may include volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage devices. For example, the computer-readable medium may be the storage device or the memory module having the computer instructions stored thereon, as disclosed. In some embodiments, the computer-readable medium may be a disc or a flash drive having the computer instructions stored thereon.
It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed training system and related methods. Other embodiments will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed system and related methods. Although the above embodiments are described using one-dimensional data as an example, the described training data can include data having more than one dimension.
It is intended that the specification and examples be considered as exemplary only, with a true scope being indicated by the following claims and their equivalents.
Claims (20)
- A computer-implemented machine learning method for training a non-linear model, comprising:
acquiring, via a communication interface, training data;
placing, by a processor, a plurality of breaklines at a plurality of breakline positions in the training data;
determining, by the processor, an entropy change for each breakline position;
selecting, by the processor, at least one breakline position, each associated with an entropy change greater than a predetermined threshold;
segmenting, by the processor, the training data into a plurality of segments according to the at least one selected breakline position; and
generating, by the processor, the non-linear model based on the segments.
- The method of claim 1, wherein the predetermined threshold is determined according to a depth of the non-linear model.
- The method of claim 1, wherein determining an entropy change for each breakline position further comprises:
determining an original entropy based on the unsegmented training data;
determining a breaking entropy based on hypothetical segments of the training data associated with the breakline position; and
determining the entropy change of the breakline position based on the original entropy and the breaking entropy.
- The method of claim 1, wherein generating the non-linear model further comprises:
generating sub-models for the plurality of segments; and
generating the non-linear model by aggregating the sub-models.
- The method of claim 3, wherein the hypothetical segments include a first hypothetical segment and a second hypothetical segment, wherein determining a breaking entropy further comprises:
determining a first entropy for the first hypothetical segment of the training data;
determining a second entropy for the second hypothetical segment of the training data; and
determining the breaking entropy associated with the breakline position based on the first entropy and the second entropy.
- The method of claim 5, wherein the breaking entropy is determined based on a weighted sum of the first entropy and the second entropy.
- The method of claim 5, wherein the first hypothetical segment is between the breakline position and a first neighboring breakline position of the breakline position, and the second hypothetical segment is between the breakline position and a second neighboring breakline position of the breakline position.
- The method of claim 4, wherein each of the sub-models is a linear model.
- The method of claim 4, wherein aggregating the sub-models includes connecting the sub-models in an order of the respective segments in the training data.
- The method of claim 1, wherein the non-linear model is configured to generate a price for a transportation service.
- A machine learning system for training a non-linear model, comprising:
a communication interface configured to receive training data;
a memory configured to store the training data and the non-linear model; and
at least one processor configured to:
place a plurality of breaklines at a plurality of breakline positions in the training data;
determine an entropy change for each breakline position;
select at least one breakline position, each associated with an entropy change greater than a predetermined threshold;
segment the training data into a plurality of segments according to the at least one selected breakline position; and
generate the non-linear model based on the segments.
- The system of claim 11, wherein the predetermined threshold is determined according to a depth of the non-linear model.
- The system of claim 11, wherein the processor is further configured to:
determine an original entropy based on the unsegmented training data;
determine a breaking entropy based on hypothetical segments of the training data associated with the breakline position; and
determine the entropy change of the breakline position based on the original entropy and the breaking entropy.
- The system of claim 11, wherein the processor is further configured to:
generate sub-models for the plurality of segments; and
generate the non-linear model by aggregating the sub-models.
- The system of claim 13, wherein the segments include a first hypothetical segment and a second hypothetical segment, wherein the processor is further configured to:
determine a first entropy for the first hypothetical segment of the training data;
determine a second entropy for the second hypothetical segment of the training data; and
determine the breaking entropy associated with the breakline position based on the first entropy and the second entropy.
- The system of claim 15, wherein the breaking entropy is determined based on a weighted sum of the first entropy and the second entropy.
- The system of claim 14, wherein the first hypothetical segment is between the breakline position and a first neighboring breakline position of the breakline position, and the second hypothetical segment is between the breakline position and a second neighboring breakline position of the breakline position.
- The system of claim 14, wherein each of the sub-models is a linear model.
- The system of claim 11, wherein the non-linear model is configured to generate a price for a transportation service.
- A non-transitory computer-readable medium that stores a set of instructions which, when executed by at least one processor of an electronic device, cause the electronic device to perform a method for training a non-linear model, the method comprising:
acquiring training data;
placing a plurality of breaklines at a plurality of breakline positions in the training data;
determining an entropy change for each breakline position;
selecting at least one breakline position, each associated with an entropy change greater than a predetermined threshold;
segmenting the training data into a plurality of segments according to the at least one selected breakline position; and
generating the non-linear model based on the segments.
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201880037651.4A CN110709861B (en) | 2018-03-13 | 2018-03-13 | Method and system for training nonlinear model |
| JP2019571942A JP2020530607A (en) | 2018-03-13 | 2018-03-13 | Methods and systems for training nonlinear models |
| PCT/CN2018/078866 WO2019173972A1 (en) | 2018-03-13 | 2018-03-13 | Method and system for training non-linear model |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019173972A1 true WO2019173972A1 (en) | 2019-09-19 |