US20120330884A1 - Deciding an optimal action in consideration of risk - Google Patents
Deciding an optimal action in consideration of risk
- Publication number
- US20120330884A1 (application US 13/603,459)
- Authority
- US
- United States
- Prior art keywords
- state
- risk
- states
- value
- risk measure
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Finance (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Technology Law (AREA)
- Economics (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- General Business, Economics & Management (AREA)
- Evolutionary Computation (AREA)
- Mathematical Analysis (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Computational Mathematics (AREA)
- Artificial Intelligence (AREA)
- Algebra (AREA)
- Probability & Statistics with Applications (AREA)
- Complex Calculations (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
A method and system for deciding an optimal action in consideration of risk. The method includes the steps of: generating sequentially, by way of a Markov decision process based on a Monte Carlo method, a series of data having states on a memory of a computer; computing a risk measure of a present data item by tracking the generated data in the opposite order to the generation order, where the risk measure is calculated from a value at risk or an exceedance probability that is derived from risk measures of a plurality of states transitionable from a state of the present data item; and executing the step of computing the risk measure while tracking back to the starting data, where at least one of the steps is carried out using a computer device.
Description
- This application is a continuation of and claims priority from U.S. patent application Ser. No. 13/371,513, filed Feb. 13, 2012, which in turn claims priority under 35 U.S.C. §119 from Japanese Patent Application No. 2011-029660, filed Feb. 15, 2011; the entire contents of both of which are incorporated herein by reference.
- 1. Field of the Invention
- The present invention relates to a technique of deciding an optimal action in consideration of risk. More specifically, the present invention relates to a technique of deciding an action using a Markov decision process (MDP).
- 2. Description of Related Art
- A simulation system and simulation method of integrally evaluating interest risk and credit risk of a portfolio are described in Japanese Unexamined Patent Publication No. 2002-230280. The technique provides that: (1) a large number of scenarios from a present time to a risk horizon are generated based on a default-free interest process model and a default process model; (2) a price of a portfolio and a price of an individual asset in the risk horizon are computed for each of the generated scenarios; and (3) a future price distribution of the portfolio and/or a future price distribution of the individual asset are determined based on the computed prices. As a result, the technique integrally evaluates interest risk and credit risk of the portfolio.
- Research is also conducted on a risk computation technique that uses a Markov process. A Markov process has the Markov property: its future state transitions depend only on its present state, not on its past states. Research is further conducted on an action decision technique that uses a Markov decision process, which is an extension of the Markov process. For example, for a target capable of undergoing state transitions, a Markov decision process problem is the problem of finding a rule for deciding the action to be executed in each state so as to maximize the expected cumulative reward obtained from the target.
- To provide a credit portfolio control method used for selecting an optimal policy in credit control, so that situations of external factors such as an economic environment and a set credit line can be reflected in a future credit rating transition probability, Japanese Patent No. 4400837 discloses a technique of creating a graph representing transitions of combinations of each state of an existing credit and each state of an external factor from the first to T-th years. The technique provides: (1) for the first year, a node including the existing credit's initial state and the external factor's initial state; and (2) for the second to T-th years, nodes indicating patterns of combinations of each state of the existing credit and each state of the external factor. The aforementioned technique corresponds to finding an optimal policy that maximizes an expected total gain for T years by solving a Markov decision process problem of T years with dynamic programming (DP), tracking back from a terminal T-th year node.
- In addition, the iterated risk measure has recently been receiving attention as a risk measure on which a financial institution bases the determination of its capital. A conditional value at risk, also called a CTE (conditional tail expectation), has no time consistency, whereas the iterated risk measure does. This is described in M. R. Hardy and J. L. Wirch, “The iterated CTE: A dynamic risk measure”, The North American Actuarial Journal, 62-75, 2004.
- However, a backward-computed iterated CTE (ICTE) is considered difficult to implement, because the ICTE imposes a heavy computational load. Furthermore, a typical Monte Carlo method cannot handle the ICTE.
- The iterated risk measure can represent risk preferences that are rational but cannot be represented by expected utility, discounted expected utility, or the like. Accordingly, Japanese Patent Application No. 2010-211588 discloses a technique of optimizing a Markov decision process so as to minimize the iterated risk measure using dynamic programming.
- However, the technique described in the specification of Japanese Patent Application No. 2010-211588 requires an extremely long computation time when the number of possible states or actions increases. Thus, in practice the technique can solve only problems of limited size.
- Accordingly, one aspect of the present invention provides a method for computing an iterated risk measure, the method including the steps of: generating sequentially, by way of a Markov decision process based on a Monte Carlo method, a series of data having states on a memory of a computer; computing a risk measure of a present data item by tracking the generated data in the opposite order to the generation order, where the risk measure is calculated from a value at risk or an exceedance probability that is derived from risk measures of a plurality of states transitionable from a state of the present data item; and executing the step of computing the risk measure while tracking back to the starting data, where at least one of the steps is carried out using a computer device.
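- As a rough illustration of these three steps, the skeleton below condenses the forward sampling and the backward risk-measure sweep into one loop. It is a hypothetical sketch in Java: the Model interface and its method names are assumptions standing in for the claimed generating and risk measure modules, not an API defined by the patent.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Random;

// Assumed stand-in for the transition model and the per-state risk update.
interface Model {
    int initialState();                  // the starting data
    int sampleNext(int s, Random rng);   // one Markov transition, Monte Carlo
    double updateValue(int s);           // risk measure from successor values
}

static double approximateIteratedRiskMeasure(Model m, int stages, int sweeps) {
    Random rng = new Random();
    double rootValue = 0.0;
    for (int k = 0; k < sweeps; k++) {
        // Generate a series of data having states, remembering them on a stack.
        Deque<Integer> stack = new ArrayDeque<>();
        int s = m.initialState();
        stack.push(s);
        for (int stage = 1; stage < stages; stage++) {
            s = m.sampleNext(s, rng);
            stack.push(s);
        }
        // Track the generated data in the opposite order to the generation
        // order, recomputing each state's risk measure from its successors.
        while (!stack.isEmpty()) {
            rootValue = m.updateValue(stack.pop());
        }
    }
    return rootValue;  // approximate iterated risk measure of the starting data
}
```

The last value written in each sweep is that of the starting data, since the backward sweep ends there.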
- Another aspect of the present invention provides a method for computing an action that minimizes an iterated risk measure, the method including the steps of: generating, during postdecision, data including combinations of a predetermined state and a possible action on a memory of the computer; selecting state-action combination data from the generated data of the combinations of the state and the action, based on a value associated with each of the combinations; generating, during predecision, a state from the selected state-action combination data, by way of a Markov decision process based on a Monte Carlo method; generating a state data sequence by iterating the step of generating a state and the step of generating data including combinations; computing, based on risk measures of a plurality of states transitionable from a present predecision state, a risk measure of an immediately preceding postdecision state by tracking the generated states in the opposite order to the order of their generation, where the risk measure is calculated from a value at risk or an exceedance probability; and setting the value of a state having a minimum value in a present postdecision state to an immediately preceding predecision state, by tracking the generated states in the opposite order to the order of their generation, where at least one of the steps is carried out using a computer device.
- Another aspect of the present invention provides a system for computing an iterated risk measure, the system including: a generating module for generating sequentially, by way of a Markov decision process based on a Monte Carlo method, a series of data having states on a memory of a computer; a risk measure module for computing a risk measure of a present object by tracking the generated data in the opposite order to the generation order, where the risk measure is calculated from a value at risk or an exceedance probability that is derived from risk measures of a plurality of states transitionable from a state of the present object; and an executing module for executing the risk measure module while tracking back to the starting object.
- Another aspect of the present invention provides a system for computing an action that minimizes an iterated risk measure, the system including: a postdecision module for generating, during postdecision, data including combinations of a predetermined state and a possible action on a memory of the computer; a selecting module for selecting state-action combination data from the generated data of the combinations of the state and the action, based on a value associated with each of the combinations; a predecision module for generating, during predecision, a state from the selected state-action combination data, by way of a Markov decision process based on a Monte Carlo method; a state data sequence module for generating a state data sequence by iterating the step of generating a state and the step of generating data including combinations; a risk measure module for computing, based on risk measures of a plurality of states transitionable from a present predecision state, a risk measure of an immediately preceding postdecision state by tracking the generated states in the opposite order to the order of their generation, where the risk measure is calculated from a value at risk or an exceedance probability; and a value module for setting the value of a state having a minimum value in a present postdecision state to an immediately preceding predecision state, by tracking the generated states in the opposite order to the order of their generation.
- According to the present invention, it becomes possible to provide a technique of approximately obtaining an iterated risk measure at high speed using a probabilistic method such as a Monte Carlo method, whereas precisely computing an iterated risk measure normally requires considerable time.
- It is also possible to provide a technique of obtaining an action sequence that minimizes an iterated risk measure at high speed, using the above-mentioned technique of approximately obtaining an iterated risk measure at high speed.
- FIG. 1 is a block diagram showing a hardware structure as an example for implementing the present invention.
- FIG. 2 is a functional block diagram of a logical structure for a process of computing an iterated risk measure according to an embodiment of the present invention.
- FIG. 3 is a flowchart of the process of computing an iterated risk measure according to an embodiment of the present invention.
- FIG. 4 is a flowchart of a process of a SAMPLE_POLICY routine in the process of computing an iterated risk measure.
- FIG. 5 is a diagram showing correspondence between states and reaching probabilities referenced in the SAMPLE_POLICY routine.
- FIG. 6 is a flowchart of a process of an UPDATE_VALUE routine in the process of computing an iterated risk measure.
- FIG. 7 is a diagram showing correspondence between states, values, and reaching probabilities referenced in the UPDATE_VALUE routine.
- FIG. 8 is a diagram schematically showing the process of computing an iterated risk measure.
- FIG. 9 is a functional block diagram of a logical structure for a process of deciding an action that minimizes an iterated risk measure according to the present invention.
- FIG. 10 is a flowchart of the process of deciding an action that minimizes an iterated risk measure according to the present invention.
- FIG. 11 is a flowchart of a process of an EXPLORATION_POLICY routine in the process of deciding an action that minimizes an iterated risk measure.
- FIG. 12 is a diagram showing correspondence between postdecision states, values, and counters referenced in the EXPLORATION_POLICY routine.
- FIG. 13 is a flowchart of a process of an UPDATE_VALUE_MIN routine in the process of deciding an action that minimizes an iterated risk measure.
- FIG. 14 is a diagram showing correspondence between postdecision states and values referenced in the UPDATE_VALUE_MIN routine.
- FIG. 15 is a diagram schematically showing the process of deciding an action that minimizes an iterated risk measure.
- The following describes an embodiment of the present invention based on the drawings. The same reference numerals designate the same elements throughout the drawings, unless otherwise stated. Note that the following merely describes one embodiment of the present invention, and the scope of the present invention is not limited to this embodiment.
- It is an object of the present invention to provide a technique of computing an iterated risk measure at high speed using a Monte Carlo method, in a Markov process.
- It is another object of the present invention to provide a technique of approximately deciding an action that minimizes an iterated risk measure by applying the above-mentioned high-speed computation technique, in a Markov decision process of such a size that it cannot be precisely optimized.
- In a first aspect of the present invention, a state sequence (S1, S2, . . . , Sn) is generated by sequential sampling based on a Markov process, by processing of a computer. An (iterated risk measure provisional) value (V(Sn), . . . , V(S2), V(S1)) of each state is then updated in the opposite order (Sn, . . . , S2, S1) to the generation order.
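- A minimal sketch of one such sampling step, assuming for concreteness that the one-step transition probabilities of the Markov process are available as an array p (this representation is an assumption made here, not the patent's data layout):

```java
import java.util.Random;

// Draw the index of the next state so that index i occurs with probability
// p[i], by inverting the cumulative distribution (roulette-wheel sampling).
static int sampleTransition(double[] p, Random rng) {
    double u = rng.nextDouble();    // uniform random number in [0, 1)
    double cumulative = 0.0;
    for (int i = 0; i < p.length; i++) {
        cumulative += p[i];
        if (u < cumulative) {
            return i;               // next state in the sequence is s_i
        }
    }
    return p.length - 1;            // guard against floating-point round-off
}
```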
- The value V(Si) of each state Si is updated according to a risk measure (in particular, one computed using, or partially using, a value at risk or an exceedance probability) of a random variable defined from the transition probabilities p(Si+1(j)|Si) to the states (Si+1(1), Si+1(2), . . . , Si+1(m)) reachable from that state by one transition and the values V(Si+1(j)) of the transition destinations. Iterating this process for a predetermined duration yields an approximate value of the iterated risk measure. Hereafter, the iterated risk measure provisional value is also simply referred to as a “value”.
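- Concretely, one backward update sets V(Si) ← ρ(Xi), where Xi is the random variable taking the value V(Si+1(j)) with probability p(Si+1(j)|Si). The sketch below illustrates one way to realize ρ with the α% value at risk followed by the conditional tail expectation E[X | X > VaR], as in the embodiment described later; it is an illustration under these assumptions, not code from the patent.

```java
import java.util.Arrays;
import java.util.Comparator;

// One-step risk measure of a discrete random variable X with outcomes v[i]
// occurring with probabilities p[i]: the alpha% value at risk, then
// E[X | X > VaR] over the strict tail.
static double tailRiskMeasure(double[] v, double[] p, double alphaPercent) {
    int n = v.length;
    Integer[] order = new Integer[n];
    for (int i = 0; i < n; i++) order[i] = i;
    Arrays.sort(order, Comparator.comparingDouble(i -> v[i]));

    // VaR = inf { x : sum of p[i] over v[i] > x is at most 1 - alpha/100 }.
    double budget = 1.0 - alphaPercent / 100.0;
    double tail = 1.0;             // P(X > x) for x below the smallest outcome
    double var = v[order[n - 1]];  // fallback: the largest outcome
    for (int k = 0; k < n; k++) {
        tail -= p[order[k]];       // tail is now P(X > v[order[k]]) for distinct outcomes
        if (tail <= budget) { var = v[order[k]]; break; }
    }

    // Conditional tail expectation; if the strict tail is empty, return VaR.
    double mass = 0.0, sum = 0.0;
    for (int i = 0; i < n; i++) {
        if (v[i] > var) { mass += p[i]; sum += p[i] * v[i]; }
    }
    return mass > 0.0 ? sum / mass : var;
}
```

A variant based on the exceedance probability can be computed analogously from the same value-probability pairs.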
- In a second aspect of the present invention, a technique of approximately deciding an action that minimizes an iterated risk measure in a specific state is provided, through the use of the above-mentioned technique of approximately computing a risk measure. In this technique, states are generated so that a predecision state and a postdecision state appear alternately in the above-mentioned technique of the first aspect.
- An (iterated risk measure provisional) value of a postdecision state is computed using the values of the next reachable predecision states, as in the above-mentioned technique of the first aspect. An (iterated risk measure provisional) value of a predecision state is updated using the minimum iterated risk measure provisional value among the next reachable postdecision states. As a result of iteration, an action sequence that minimizes an iterated risk measure is selected.
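- The choice among the next reachable postdecision states is made in the embodiment described later (FIG. 11) by minimizing a function f(v, c) = v + α(β/c)^0.6 of a state's provisional value v and visit count c. The following is a sketch under stated assumptions (in particular, the guard for a zero visit count is added here and is not specified by the patent):

```java
// Pick the postdecision state s'_i minimizing f(v, c) = v + alpha*(beta/c)^0.6
// and increment its visit counter; alpha and beta are positive constants.
static int selectPostdecisionState(double[] value, int[] visits,
                                   double alpha, double beta) {
    int iStar = 0;
    double bestF = Double.POSITIVE_INFINITY;
    for (int i = 0; i < value.length; i++) {
        double c = Math.max(visits[i], 1);    // assumed guard for c = 0
        double f = value[i] + alpha * Math.pow(beta / c, 0.6);
        if (f < bestF) { bestF = f; iStar = i; }
    }
    visits[iStar]++;    // record the visit for balanced exploration
    return iStar;       // index of the selected postdecision state
}
```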
-
FIG. 1 is a block diagram of computer hardware for realizing a system structure and process according to an embodiment of the present invention. InFIG. 1 , aCPU 104, a main memory (RAM) 106, a hard disk drive (HDD) 108, a keyboard 110, amouse 112, and adisplay 114 are connected to asystem path 102. TheCPU 104 is preferably based on a 32-bit or 64-bit architecture. For example,Pentium™ 4,Core™ 2 Duo, or Xeon™ by Intel Corporation, Athlon™ by AMD, or the like can be used for theCPU 104. Themain memory 106 preferably has a capacity of 4 GB or more. Thehard disk drive 108 desirably has a capacity of, for example, 500 GB or more, to allow a large amount of data to be stored. - The
hard disk drive 108 stores an operating system beforehand, though not shown. The operating system can be an arbitrary operating system compatible with theCPU 104, such as Linux™, Windows XP™ orWindows™ 7 by Microsoft Corporation, or Mac OS™ by Apple Inc. - The
hard disk drive 108 also stores data and parameters for probability computation of a Markov decision process, processing routines for the process according to the present invention, and so on. These parameters and processing routines will be described in detail later, with reference toFIG. 2 . - The keyboard 110 and the
mouse 112 are used to activate the operating system or a program (not shown) which is loaded from thehard disk drive 108 into themain memory 106 and displayed on thedisplay 114, or enter characters. - The
display 114 is preferably a liquid crystal display, and can have, for example, an arbitrary resolution such as XGA (1024×768 in resolution) or UXGA (1600×1200 in resolution). Thedisplay 114 is used to display an operation window for starting the process according to the present invention, a computation result of a selected action, risk, and the like, though not shown. - The following describes processing routines for executing especially a process of approximately computing an iterated risk measure according to the present invention, with reference to a functional block diagram in
FIG. 2 . These processing routines are generated in an existing programming language such as C, C++, or Java® beforehand, held in thehard disk drive 108 in an executable form, and loaded into themain memory 106 and executed according to the operating system. - In this embodiment, a process of selecting a stock in which a user is to invest in each term with predetermined money in possession is described as the process according to the present invention, though the present invention is not limited to such. The following scenario is assumed. A stock in which the user is to invest in each term is selected, starting from predetermined money in possession. A state is represented by a combination of (money in possession, stock in which the user invests, time). There are action candidates as many as stock types. In each term, there are action candidates as many as stock types, and which stock the user is to invest in is decided. A return as a result of taking an action in each state is determined by a return for a period of a corresponding stock.
- A
main routine 202 is a program for an overall operation according to the present invention, and has a function of displaying an operation window on thedisplay 114, receiving a user operation and starting a process, and the like, though not shown. - A
parameter 204 includes parameters and data for computing probability of a Markov decision process indicating performance of various stocks, and the like. - A
SAMPLE_POLICY routine 206 is a routine for performing a process of generating a state with a predetermined probability by a generated random number, according to a Monte Carlo method. - An
UPDATE_VALUE routine 208 is a routine for computing a risk measure by referencing to a set of directly transitionable states. - An
output routine 210 is a routine for outputting a risk value as a computation result. The computation result is displayed on thedisplay 114 according to need. - The following describes the process of approximately computing an iterated risk measure according to the present invention, with reference to a flowchart in
FIG. 3 . For example, this process is started by an operator operating a menu of a window screen displayed on thedisplay 114 using the keyboard 110 or themouse 112. - In this embodiment, it is assumed that a series of data objects for storing states are already loaded into the
main memory 106 prior to the process described below. The data objects are, for example, instances of a class in Java® or C++.FIG. 8 schematically shows such data objects. A series of data objects 802, 804, 806, and 808 is shown inFIG. 8 . - In
step 302, the main routine 202 sets an initial value of a variable s indicating a state, from theparameter 204. The variable s is set to an attribute value of the data object 802 which is the first data object inFIG. 8 . For example, the state is represented by a combination of (money in possession, stock in which the user invests, time). - In
step 304, themain routine 202 pushes the state s onto a stack. Such pushing the state s onto the stack is performed for later popping and backtracking of the state. - Next, in
step 306, the main routine 202 calls the SAMPLE_POLICY routine 206 by SAMPLE_POLICY(s) using s as an argument. -
FIG. 4 shows a detailed process of theSAMPLE_POLICY routine 206. InFIG. 4 , the SAMPLE_POLICY routine 206 generates, for i=1, . . . , n, a random number so that i occurs with a probability pi, instep 402. The generated random number is denoted by m (1≦m≦n). The probability pi mentioned here is a probability of transiting from the state s to a state si, in a Markov process context. -
FIG. 5 shows a transition probability of each state si from s. Such correspondence information is prepared beforehand for each different s, in theparameter 204. - The SAMPLE_POLICY routine 206 outputs a state sm corresponding to the random number m in
step 404. - Returning to step 306 in
FIG. 3 , such a returned value sm is assigned to s. This corresponds to a situation where a transition is made to a state S2 of the data object 804 inFIG. 8 . - In
step 308, themain routine 202 pushes the state s onto the stack. Instep 310, themain routine 202 determines whether or not to stop forward sampling. A criterion for stopping forward sampling is, for example, whether or not states are generated for a predetermined number of stages. In the example inFIG. 8 , thestate 802 is the first stage, thestate 804 is the second stage, thestate 806 is the third stage, and thestate 808 is the fourth stage. Alternatively, the criterion for stopping forward sampling can be whether or not a predetermined time elapses from the start of the process. - In the case where the
main routine 202 determines that the criterion for stopping forward sampling is not met, the main routine 202 returns to step 306 and calls theSAMPLE_POLICY routine 206. - On the other hand, in the case where the
main routine 202 determines that the criterion for stopping forward sampling is met instep 310, themain routine 202 goes to step 312, and pops the state s from the stack. - Next, in
step 314, the main routine 202 calls the UPDATE_VALUE routine 208 by UPDATE_VALUE(s). -
FIG. 6 shows a detailed process of theUPDATE_VALUE routine 208. Step 602 is a definition block. Instep 602, the UPDATE_VALUE routine 208 sets {s1, s2, . . . , sn} as a set of states directly transitionable from s (i.e. having a transition probability more than 0), where n is the number of directly transitionable states from s. In the example inFIG. 8 , states 806 a, 806 b, and 806 c are directly transitionable states from the state S2 designated byreference numeral 804 c. The UPDATE_VALUE routine 208 also sets, for i=1, . . . , n, pi as a probability of transitioning from s to si, and v i as a value (iterated risk measure provisional value) of si.FIG. 7 shows this correspondence. InFIG. 7 , the fields of the state and the reaching probability from s are based on values stored in theparameter 204 beforehand, but the field of the value can initially store 0. This being the case, values are sequentially stored as a result of computation. Alternatively, the value can be initially set as the money in possession in the state. The present invention can be realized with other initial value settings. - In
step 604, the UPDATE_VALUE routine 208 computes, for i=1, . . . , n, a α % value at risk of a random variable X that takes the value vi with the probability pi, according to the following expression. The computation result is denoted by Vα. -
- An exceedance probability can be computed instead of Vα, according to the following expression.
-
- In
step 606, the UPDATE_VALUE routine 208 computes a risk measure v of X, using Vα or the exceedance probability. For example, this computation is performed as v=E[X|X>Vα]. As an alternative, the risk measure v can be computed by the following expression partially using the exceedance probability. -
ρn [Y]=E[Y]−c(Pr(Y≦0)−α)I{Pr(Y≦0)≧α} [Math. 3] - In this expression, I{ } is a function that returns 1 when the expression in { } is true, and 0 when the expression in { } is false.
- In
step 608, the UPDATE_VALUE routine 208 stores the computed v as a value corresponding to the state s. In the example inFIG. 8 , the risk measure v computed based on the 806 a, 806 b, and 806 c is stored in association with the state S2.states - Returning to the process of the flowchart in
FIG. 3 , afterstep 314 of calling UPDATE_VALUE(s), themain routine 202 determines whether or not the stack is empty instep 316. In the case where the stack is not empty, the main routine 202 returns to step 312. - In the case where the
main routine 202 determines that the stack is empty instep 316, themain routine 202 determines whether or not a stopping condition is met instep 318. The stopping condition mentioned here is any of whether or not a predetermined time elapses from the start of the process shown in the flowchart inFIG. 3 , whether or not a loop ofsteps 302 to 318 is performed a predetermined number of times, or whether or not a risk measure computed value at the starting point designated by S1 inFIG. 8 eventually has only a change of a threshold or below from a value computed in the immediately preceding loop ofsteps 302 to 318, though the present invention is not limited to such. - In the case where the
main routine 202 determines that the stopping condition is not met instep 318, the main routine 202 returns to step 302, to resume computation from the first state S1 inFIG. 8 . At this time, the previously computed risk measure values (the values in the value field inFIG. 7 ) are maintained, so that the intermediate risk measure values which were initially mostly 0 are gradually changed to nonzero values as the loop ofsteps 302 to 318 is repeated. - In the case where the
main routine 202 determines that the stopping condition is met instep 318, the process ends, and theoutput routine 210 outputs a risk measure value corresponding to the first state S1 inFIG. 8 . - The following describes processing routines for executing a process of approximately deciding an action that minimizes an iterated risk measure in a specific state according to the present invention, with reference to a functional block diagram in
FIG. 9 . These processing routines are also generated in an existing programming language such as C, C++, or Java® beforehand, held in thehard disk drive 108 in an executable form, and loaded into themain memory 106 and executed according to the operating system. - The process of approximately deciding an action that minimizes an iterated risk measure uses the routine for approximately computing an iterated risk measure shown in
FIG. 2 , and so there are some common processing routines. However, the processing routines inFIG. 9 are given different reference numerals from those inFIG. 2 . - In this embodiment, too, a process of selecting a stock in which the user is to invest in each term with predetermined money in possession is described as the process according to the present invention. The following scenario is assumed. A stock in which the user is to invest in each term is selected, starting from predetermined money in possession. A state is represented by a combination of (money in possession, stock in which the user invests, time). There are action candidates as many as stock types. In each term, there are action candidates as many as stock types, and which stock the user is to invest in is decided. A return as a result of taking an action in each state is determined by a return for a period of a corresponding stock.
- A
main routine 902 is a program for an overall operation according to the present invention, and has a function of displaying an operation window on thedisplay 114, receiving a user operation and starting a process, and the like, though not shown. - A parameter 904 includes parameters and data for computing probability of a Markov decision process indicating performance of various stocks, and the like.
- A SAMPLE_POLICY routine 906 is a routine for generating a state with a predetermined probability from a generated random number, according to a Monte Carlo method. The SAMPLE_POLICY routine 906 can be the same as the SAMPLE_POLICY routine 206 in FIG. 2.
- An EXPLORATION_POLICY routine 908 is a routine for selecting a postdecision state.
- An UPDATE_VALUE routine 910 is a routine for computing a risk measure by referring to a set of directly transitionable states. The UPDATE_VALUE routine 910 can be the same as the UPDATE_VALUE routine 208 in FIG. 2.
- An UPDATE_VALUE_MIN routine 912 is a routine for returning a minimum value in the set of directly transitionable states.
- An output routine 914 is a routine for outputting an action sequence as a computation result. The computation result is displayed on the display 114 as needed.
- The following describes the process of approximately deciding an action that minimizes an iterated risk measure according to the present invention, with reference to a flowchart in FIG. 10. For example, this process is started by an operator operating a menu of a window screen displayed on the display 114, using the keyboard 110 or the mouse 112.
- In this embodiment, it is assumed that a series of data objects for storing states is already loaded into the main memory 106 prior to the process described below. The data objects are, for example, instances of a class in Java® or C++. FIG. 15 schematically shows such data objects: a series of data objects 1502, 1504, 1506, 1508, 1510, 1512, 1514, and 1516. In this embodiment, two kinds of states, namely a predecision state and a postdecision state, are used. In FIG. 15, the data objects 1502, 1506, 1510, and 1514 correspond to predecision states, and the data objects 1504, 1508, 1512, and 1516 correspond to postdecision states.
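- By way of illustration only, such data objects can be sketched in Java® as follows. The class and field names are hypothetical; the specification only requires that each object hold a state such as (money in possession, stock in which the user invests, time), a value, and, for postdecision states, a visit counter.

```java
// Minimal sketch of the state data objects of FIG. 15, with hypothetical
// class and field names; not the actual classes from the specification.
class PredecisionState {
    final double money;   // money in possession
    final int stock;      // index of the stock currently invested in
    final int time;       // term index
    double value;         // risk measure value, updated while backtracking

    PredecisionState(double money, int stock, int time) {
        this.money = money;
        this.stock = stock;
        this.time = time;
    }
}

class PostdecisionState {
    final PredecisionState s; // predecision state the action was taken in
    final int action;         // chosen stock, one candidate per stock type
    double value;             // risk measure value of the pair (s, a)
    int visits = 1;           // assumed initial count; keeps the exploration function f finite

    PostdecisionState(PredecisionState s, int action) {
        this.s = s;
        this.action = action;
    }
}
```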
- In step 1002, the main routine 902 sets an initial value of a variable s indicating a state, from the held parameter 904. The variable s is set to an attribute value of the data object 1502, which is the first data object in FIG. 15 and corresponds to a predecision state. For example, the state is represented by a combination of (money in possession, stock in which the user invests, time).
- In step 1004, the main routine 902 pushes the state s onto a stack. The state s is pushed onto the stack so that it can later be popped when backtracking over the states.
- In step 1006, the main routine 902 calls the EXPLORATION_POLICY routine 908 as EXPLORATION_POLICY(s), using s as an argument.
- FIG. 11 shows the detailed process of the EXPLORATION_POLICY routine 908. As shown in definition step 1102, the EXPLORATION_POLICY routine 908 sets {a1, a2, . . . , an} as the set of actions that can be taken in the state s. The EXPLORATION_POLICY routine 908 also sets, for i=1, . . . , n, s′i=(s, ai) as the postdecision state reached when ai is taken in s, vi as the value of s′i, and ci as the number of visits to s′i. The value mentioned here is the same as that described with reference to FIG. 7. The number of visits ci is recorded in order to select a balanced action sequence by avoiding postdecision states with a large number of visits as much as possible. FIG. 12 shows an example of the correspondence between postdecision states, values, and counters.
- In step 1104, the EXPLORATION_POLICY routine 908 computes the i that minimizes a function f(vi, ci).
- For example, the function f is an expression such as f(v, c) ≡ v + α(β/c)^0.6, though the present invention is not limited to such. That is, the function f is required to be monotonically increasing with v and monotonically decreasing with c. α and β are positive constants, and are parameters that can be set arbitrarily.
- The EXPLORATION_POLICY routine 908 sets the computed i as i* in step 1104, increments ci* as ci* = ci* + 1 in step 1106, and outputs s′i* in step 1108.
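- A minimal Java® sketch of this selection, reusing the hypothetical PostdecisionState objects above and treating α and β as constructor parameters, might read as follows.

```java
// Minimal sketch of EXPLORATION_POLICY (FIG. 11); names are hypothetical.
class ExplorationPolicy {
    private final double alpha;
    private final double beta;

    ExplorationPolicy(double alpha, double beta) {
        this.alpha = alpha;
        this.beta = beta;
    }

    // f(v, c) = v + alpha * (beta / c)^0.6, as in the example above; the
    // visit counter is assumed to start at 1 so that f stays finite.
    private double f(double v, int c) {
        return v + alpha * Math.pow(beta / c, 0.6);
    }

    /** Step 1104: select the postdecision state minimizing f(v_i, c_i), then bump its counter. */
    PostdecisionState select(PostdecisionState[] candidates) {
        PostdecisionState best = candidates[0];
        for (PostdecisionState candidate : candidates) {
            if (f(candidate.value, candidate.visits) < f(best.value, best.visits)) {
                best = candidate;
            }
        }
        best.visits++;  // step 1106: c_i* = c_i* + 1
        return best;    // step 1108: output s'_i*
    }
}
```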
- The output of s′i* can be understood more easily with reference to FIG. 15. Though the postdecision states 1504 a, 1504 b, and 1504 c can be reached from the predecision state 1502 by different possible actions, the postdecision state 1504 c is selected according to the computation in step 1104.
- Returning to step 1006 in FIG. 10, after the EXPLORATION_POLICY routine 908 is completed and s′ is output in step 1006, the main routine 902 pushes s′ onto the stack in step 1008.
- Next, in step 1010, the main routine 902 calls the SAMPLE_POLICY routine 906 as SAMPLE_POLICY(s′), using s′ as an argument.
- The SAMPLE_POLICY routine 906 performs a process of selecting one transitionable state based on the combination of a Monte Carlo method and a Markov decision process, in the same manner as the SAMPLE_POLICY routine 206. Since this process is the same as that shown in the flowchart in FIG. 4, its description is omitted here. This state selection corresponds to selecting the state 1506 b in the predecision states 1506 from the state (S1) 1504 c in the postdecision states 1504 in FIG. 15.
- After the SAMPLE_POLICY routine 906 selects s from s′ in step 1010, the main routine 902 pushes s onto the stack in step 1012.
- Next, in step 1014, the main routine 902 determines whether or not to stop forward sampling. A criterion for stopping forward sampling is, for example, whether or not states have been generated for a predetermined number of stages. Alternatively, the criterion can be whether or not a predetermined time has elapsed from the start of the process.
- In the case where the main routine 902 determines that the criterion for stopping forward sampling is not met, the main routine 902 returns to step 1006 to call the EXPLORATION_POLICY routine 908.
- On the other hand, in the case where the main routine 902 determines that the criterion for stopping forward sampling is met in step 1014, the main routine 902 goes to step 1016 and pops the state s from the stack.
- Next, the main routine 902 calls the UPDATE_VALUE_MIN routine 912 as UPDATE_VALUE_MIN(s), using the popped state s. The following describes the process of the UPDATE_VALUE_MIN routine 912, with reference to a flowchart in FIG. 13.
- In FIG. 13, step 1302 is a definition step. In step 1302, the UPDATE_VALUE_MIN routine 912 sets {s′1, s′2, . . . , s′n} as the set of postdecision states directly reachable from s, and also sets, for i=1, . . . , n, vi as the value of s′i. FIG. 14 shows the correspondence between postdecision states and values.
- In the next step 1304, the UPDATE_VALUE_MIN routine 912 computes the minimum of the values vi as v, according to v = min(v1, . . . , vn). In step 1306, the UPDATE_VALUE_MIN routine 912 stores the computed v as the value of s. In the example in FIG. 15, supposing that the popped state s is a predecision state (S4) 1514 c, actions 1516 a, 1516 b, and 1516 c are the actions that can be taken in the state 1514 c. When the minimum value among the values associated with the actions 1516 a, 1516 b, and 1516 c is the value associated with the action 1516 c, this value is stored in the state 1514 c.
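- A minimal Java® sketch of this minimum computation, reusing the hypothetical state objects above, might read as follows.

```java
// Minimal sketch of UPDATE_VALUE_MIN (FIG. 13); names are hypothetical.
class UpdateValueMin {
    /** Sets the value of s to min_i v_i over the postdecision states reachable from s. */
    static void update(PredecisionState s, PostdecisionState[] reachable) {
        double v = Double.POSITIVE_INFINITY;
        for (PostdecisionState sPrime : reachable) { // step 1304: v = min(v_1, ..., v_n)
            v = Math.min(v, sPrime.value);
        }
        s.value = v;                                 // step 1306: store v as the value of s
    }
}
```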
- Returning to the flowchart in FIG. 10, the main routine 902 determines whether or not the stack is empty in step 1020. In the case where the stack is empty, the main routine 902 determines whether or not a stopping condition is met in step 1026. The stopping condition mentioned here is any of the following: whether or not a predetermined time has elapsed from the start of the process shown in the flowchart in FIG. 10, whether or not the loop of steps 1002 to 1014 has been performed a predetermined number of times, or whether or not the value at the starting point designated by S1 in FIG. 15 changes by no more than a threshold from the value computed in the immediately preceding loop of steps 1002 to 1014.
- In the case where the main routine 902 determines that the stopping condition is met, the process ends. Otherwise, the main routine 902 returns to step 1002 to resume the process from the first step. At this time, the values set in the states in the previous loop are maintained and put to use in the next computation.
- In the case where the main routine 902 determines that the stack is not empty in step 1020, the main routine 902 pops the state s′ from the stack in step 1022. The main routine 902 then calls the UPDATE_VALUE routine 910 as UPDATE_VALUE(s′) in step 1024, to update the value of s′. The process of the UPDATE_VALUE routine 910 is substantially the same as the process of the UPDATE_VALUE routine 208, which is shown in detail in the flowchart in FIG. 6. In the example in FIG. 15, the value of a postdecision state (S3) 1512 b is computed from the predecision states 1514 a, 1514 b, and 1514 c.
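- By way of illustration only, one plausible reading of the risk measure computation in UPDATE_VALUE, namely a value at risk over the values vi and transition probabilities pi of the directly transitionable states, can be sketched as follows in Java®. The confidence level alpha and the names are assumptions; the actual expression is the one shown in FIG. 6 and recited in the claims.

```java
import java.util.Arrays;
import java.util.Comparator;

// Minimal sketch of a discrete value-at-risk computation, as one plausible
// reading of UPDATE_VALUE; FIG. 6 is not reproduced here.
class UpdateValue {
    /** Smallest v with Pr[X <= v] >= alpha, for X taking value v[i] with probability p[i]. */
    static double valueAtRisk(double[] v, double[] p, double alpha) {
        Integer[] order = new Integer[v.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(i -> v[i])); // ascending by value
        double cumulative = 0.0;
        for (int i : order) {
            cumulative += p[i];
            if (cumulative >= alpha) return v[i];
        }
        return v[order[order.length - 1]]; // guard against floating-point rounding
    }
}
```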
- After step 1024, the main routine 902 returns to step 1016. As a result, an action sequence of actions 1504 c, 1508 a, 1512 b, and 1516 c is obtained. The main routine 902 calls the output routine 914 to output, for each predecision state, the action a associated with the postdecision state (s, a) having the minimum value among the postdecision states directly transitionable from that predecision state. The output result is preferably displayed on the display 114 or written to a file.
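- Putting the pieces together, the forward sampling and backtracking of FIG. 10 can be sketched as follows in Java®, reusing the hypothetical classes above. The Model interface, the fixed stage count, and the confidence level 0.95 are assumptions, and the Model is assumed to return the same state objects across calls so that values and counters persist between loops.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal sketch of the overall loop of FIG. 10; not the actual
// implementation from the specification.
class MainRoutine {
    interface Model {
        PostdecisionState[] actions(PredecisionState s);      // action candidates in s
        PredecisionState[] successors(PostdecisionState sp);  // directly transitionable states
        double[] probabilities(PostdecisionState sp);         // their transition probabilities
    }

    static void run(Model model, PredecisionState initial, int stages,
                    ExplorationPolicy explore, SamplePolicy sample) {
        Deque<Object> stack = new ArrayDeque<>();
        PredecisionState s = initial;                             // step 1002
        stack.push(s);                                            // step 1004
        for (int stage = 0; stage < stages; stage++) {            // step 1014 as a stage count
            PostdecisionState sp = explore.select(model.actions(s)); // step 1006
            stack.push(sp);                                       // step 1008
            PredecisionState[] next = model.successors(sp);
            s = next[sample.sample(model.probabilities(sp))];     // step 1010
            stack.push(s);                                        // step 1012
        }
        while (!stack.isEmpty()) {                                // backtracking phase
            PredecisionState popped = (PredecisionState) stack.pop();     // step 1016
            UpdateValueMin.update(popped, model.actions(popped));         // UPDATE_VALUE_MIN(s)
            if (stack.isEmpty()) break;                                   // step 1020
            PostdecisionState spPopped = (PostdecisionState) stack.pop(); // step 1022
            PredecisionState[] successors = model.successors(spPopped);
            double[] values = new double[successors.length];
            for (int i = 0; i < values.length; i++) values[i] = successors[i].value;
            spPopped.value = UpdateValue.valueAtRisk(                     // step 1024: UPDATE_VALUE(s')
                    values, model.probabilities(spPopped), 0.95);
        }
    }
}
```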
- Though the above embodiment of the present invention is described using an example of application to a process of selecting a stock in which the user is to invest in each term with predetermined money in possession, the present invention is not limited to this, and is applicable to any decision making process that involves probabilistic risk computation performed sequentially in time series.
- The present invention is not limited to a specific hardware and software platform of a computer, and can be implemented on any platform.
- The above and other features of the present invention will become more apparent from the following detailed description of embodiments, taken in combination with the attached drawings. Identical reference numbers represent the same or similar parts in the attached drawings of the invention.
- As will be appreciated by one skilled in the art, aspects of the present invention can be embodied as a system, method or computer program product. Accordingly, aspects of the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that can all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention can take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
- Any combination of one or more computer readable medium(s) can be utilized. A computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium can include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium can be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- Computer program code for carrying out operations for aspects of the present invention can be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code can execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
- Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- These computer program instructions can also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
- The computer program instructions can also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams can represent a module, segment, or portion of code, which includes one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block can occur out of the order noted in the figures. For example, two blocks shown in succession can, in fact, be executed substantially concurrently, or the blocks can sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
- The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “includes” and/or “including,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
- The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Claims (7)
1. A system for computing an iterated risk measure, the system comprising:
a generating module for generating sequentially, by way of a Markov decision process based on a Monte Carlo method, a series of data having states on a memory of a computer;
a risk measure module for computing a risk measure of a present object by tracking generated data in opposite order to generation order, wherein said risk measure is calculated from a value at risk or an exceedance probability that is derived from risk measures of a plurality of states transitionable from a state of said present object; and
an executing module for executing said risk measure module while tracking back to a starting object.
2. The system according to claim 1 , wherein said risk measure module computes said risk measure using the following expression:
wherein:
X is a random variable;
vi (i=1, . . . , n) is a value of each of said plurality of states transitionable from said state of said present object; and
pi (i=1, . . . , n) is a transition probability of each of said plurality of states transitionable from said state of said present object.
3. The system according to claim 1 , wherein said risk measure module computes said risk measure using the following expression:
wherein:
Pr is a probability;
X is a random variable;
vi (i=1, . . . , n) is a value of each of said plurality of states transitionable from said state of said present object; and
pi (i=1, . . . , n) is a transition probability of each of said plurality of states transitionable from said state of said present object.
4. A system for computing an action that minimizes an iterated risk measure, the system comprising:
a postdecision module for generating, during postdecision, data comprising combinations of a predetermined state and a possible action on a memory of a computer;
a selecting module for selecting a state-action combination data from generated data of said combinations of said state and said action, based on a value associated with each of said combinations;
a predecision module for generating, during predecision, a state from selected state-action combination data, by way of a Markov decision process based on a Monte Carlo method;
a state data sequence module for generating a state data sequence by iterating said generating of a state and said generating of data comprising combinations;
a risk measure module for computing, based on risk measures of a plurality of states transitionable from a present predecision state, a risk measure of an immediately preceding postdecision state by tracking generated states in opposite order to the order of generation, wherein said risk measure is calculated from a value at risk or an exceedance probability; and
a value module for setting a value of a state having a minimum value in a present postdecision state to an immediately preceding predecision state, by tracking said generated states in the opposite order to the order of the generation.
5. The system according to claim 4 , wherein said risk measure module computes said risk measure using the following expression:
wherein:
X is a random variable;
vi (i=1, . . . , n) is a value of each of said plurality of states transitionable from said present predecision state; and
pi (i=1, . . . , n) is a transition probability of each of said plurality of states transitionable from said present predecision state.
6. The system according to claim 4 , wherein said risk measure module computes said risk measure using the following expression:
wherein:
Pr is a probability;
X is a random variable;
vi (i=1, . . . , n) is a value of each of said plurality of states transitionable from said present predecision state; and
pi (i=1, . . . , n) is a transition probability of each of said plurality of states transitionable from said present predecision state.
7. The system according to claim 4 , wherein said selecting module uses an evaluation function which is a monotonically decreasing function with respect to a frequency of visiting said state.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/603,459 US20120330884A1 (en) | 2011-02-15 | 2012-09-05 | Deciding an optimal action in consideration of risk |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2011-029660 | 2011-02-15 | ||
| JP2011029660A JP5759200B2 (en) | 2011-02-15 | 2011-02-15 | Action decision method, program and system |
| US13/371,513 US8972331B2 (en) | 2011-02-15 | 2012-02-13 | Deciding an optimal action in consideration of risk |
| US13/603,459 US20120330884A1 (en) | 2011-02-15 | 2012-09-05 | Deciding an optimal action in consideration of risk |
Related Parent Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US13/371,513 Continuation US8972331B2 (en) | 2011-02-15 | 2012-02-13 | Deciding an optimal action in consideration of risk |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20120330884A1 (en) | 2012-12-27 |
Family
ID=46637676
Family Applications (3)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US13/371,513 Expired - Fee Related US8972331B2 (en) | 2011-02-15 | 2012-02-13 | Deciding an optimal action in consideration of risk |
| US13/603,459 Abandoned US20120330884A1 (en) | 2011-02-15 | 2012-09-05 | Deciding an optimal action in consideration of risk |
| US14/635,316 Expired - Fee Related US9430740B2 (en) | 2011-02-15 | 2015-03-02 | Deciding an optimal action in consideration of risk |
Family Applications Before (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US13/371,513 Expired - Fee Related US8972331B2 (en) | 2011-02-15 | 2012-02-13 | Deciding an optimal action in consideration of risk |
Family Applications After (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US14/635,316 Expired - Fee Related US9430740B2 (en) | 2011-02-15 | 2015-03-02 | Deciding an optimal action in consideration of risk |
Country Status (2)
| Country | Link |
|---|---|
| US (3) | US8972331B2 (en) |
| JP (1) | JP5759200B2 (en) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10360509B2 (en) * | 2015-10-19 | 2019-07-23 | International Business Machines Corporation | Apparatus and method for generating an optimal set of choices |
| US11289075B1 (en) * | 2019-12-13 | 2022-03-29 | Amazon Technologies, Inc. | Routing of natural language inputs to speech processing applications |
| CN112270643B (en) * | 2020-09-04 | 2024-06-14 | 深圳市菲森科技有限公司 | A three-dimensional imaging data splicing method, device, electronic device and storage medium |
| JP7565841B2 (en) * | 2021-03-22 | 2024-10-11 | 三菱重工業株式会社 | Analysis device, analysis method, and analysis program |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20110246386A9 (en) * | 2008-12-16 | 2011-10-06 | Sean Coleman Keenan | Methods and systems for generating transition probability matrices through an optimization framework |
Family Cites Families (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2002063345A (en) | 2000-08-16 | 2002-02-28 | Hitachi Ltd | Credit risk quantification method and system |
| JP2002092311A (en) | 2000-09-19 | 2002-03-29 | Ibj-Dl Financial Technology Co Ltd | Risk quantification system |
| JP2002230280A (en) | 2001-02-01 | 2002-08-16 | Masaaki Kijima | Integrated interest rate risk and credit risk evaluation system and method |
| JP2002334207A (en) | 2001-05-11 | 2002-11-22 | Keio Gijuku | Portfolio optimization system |
| JP2003108759A (en) | 2001-09-30 | 2003-04-11 | Numerical Technologies Inc | Simulation method of market information and system therefor |
| US7720761B2 (en) * | 2002-11-18 | 2010-05-18 | Jpmorgan Chase Bank, N. A. | Method and system for enhancing credit line management, price management and other discretionary levels setting for financial accounts |
| JP2006309571A (en) | 2005-04-28 | 2006-11-09 | Sumitomo Mitsui Banking Corp | Computer calculation processing method and residual risk determination device |
| JP2009134337A (en) | 2007-11-28 | 2009-06-18 | Toshiba Corp | Risk evaluation apparatus and evaluation method thereof |
| JP2009266184A (en) | 2008-04-23 | 2009-11-12 | Junichi Yamazaki | Markov risk calculation |
| JP2010211588A (en) | 2009-03-11 | 2010-09-24 | Aisin Aw Co Ltd | Device, method and program for supporting operation |
| JP4400837B1 (en) | 2009-03-17 | 2010-01-20 | 国立大学法人北見工業大学 | Bond portfolio control device, bond portfolio control program, and bond portfolio control method |
| JP2011081452A (en) * | 2009-10-02 | 2011-04-21 | Tokyo Univ Of Science | Financial asset display data generation device and program |
Non-Patent Citations (6)
| Title |
|---|
| Artzner, P. et al. "Coherent measures of risk." Mathematical Finance vol. 9 no. 3 (1999) pp 203-228. [retrieved from onlinelibrary.wiley.com] *
| Boda, K. et al. "Time consistent dynamic risk measures." Mathematical Methods of Operations Research vol. 63 no. 1 (2006) pp 169-186. [retrieved from link.springer.com] * |
| Chong, E. et al. "Monte-Carlo-based partially observable Markov decision process approximations for adaptive sensing." Discrete Event Systems, 2008. WODES 2008. 9th International Workshop on. IEEE, 2008. [retrieved from ieeexplore.ieee.org] * |
| Hardy, M. et al. "The iterated CTE: a dynamic risk measure." North American Actuarial Journal vol. 8 no. 4 (2004): pp 62-75. [retrieved from tandfonline.com] * |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2012168767A (en) | 2012-09-06 |
| US20150178635A1 (en) | 2015-06-25 |
| US9430740B2 (en) | 2016-08-30 |
| US8972331B2 (en) | 2015-03-03 |
| US20120209801A1 (en) | 2012-08-16 |
| JP5759200B2 (en) | 2015-08-05 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11256990B2 (en) | Memory-efficient backpropagation through time | |
| US9805114B2 (en) | Composable selection model through reusable component | |
| US9430740B2 (en) | Deciding an optimal action in consideration of risk | |
| US8589331B2 (en) | Predicting outcomes of a content driven process instance execution | |
| US9161085B2 (en) | Adaptive timeline views of data | |
| US20040243457A1 (en) | Project estimating system and method | |
| CN110019174B (en) | Data quality determining method and device, electronic equipment and storage medium | |
| US20120072259A1 (en) | Determining optimal action in consideration of risk | |
| US20090172057A1 (en) | Computer system for predicting the evolution of a chronological set of numerical values | |
| US20230206024A1 (en) | 2023-06-29 | Resource allocation method, resource allocation apparatus, device, medium and computer program product | |
| US20180060729A1 (en) | Apparatus and method for learning a model corresponding to time-series input data | |
| US20230039523A1 (en) | Model update device and method and process control system | |
| US9305259B2 (en) | Apparatus, program, and method for solving mathematical programming problem | |
| Li et al. | Leverage, asymmetry, and heavy tails in the high-dimensional factor stochastic volatility model | |
| US11593822B2 (en) | Method and system for time series data prediction based on seasonal lags | |
| US20140046735A1 (en) | Evaluation value calculation device, evaluation value calculation method, and computer program product | |
| Bertucci et al. | Agents' behavior and interest rate model optimization in defi lending | |
| US20220188678A1 (en) | Computer-readable recording medium storing optimization program, optimization method, and information processing apparatus | |
| WO2016205153A1 (en) | Incremental estimation for probabilistic forecaster | |
| US20210125064A1 (en) | Method and apparatus for training neural network | |
| Zhang et al. | Combine deep q-networks with actor-critic | |
| US20160004982A1 (en) | Method and system for estimating the progress and completion of a project based on a bayesian network | |
| US11790032B2 (en) | Generating strategy based on risk measures | |
| US8994730B2 (en) | Optimizing edge crossing computations when creating a drawing of a directed graph having a minimum number of edge crossings | |
| WO2022195658A1 (en) | Basic salary calculation program |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |