Outline

Context and motivations

In a probabilistic model, a rare event is an event having a very small probability. Rare events are important in many areas. For instance, catastrophic failures in transporting systems or in nuclear power plants leading to human losses are represented by rare events. The failure of an information processing system in a bank, or in the communication network of a group of banks, leading to important money losses is also a rare event in the appropriate models. Some low level events in specific communication networks are also rare, or must be. A typical example is the loss of a small unit of information (a packet, or a cell in ATM networks). The loss of such an unit must be rare because the traffic intensities in these transportation systems are extremely high, and thus, if the loss probabilities are not small enough, important losses can occur in a small amount of time. Equivalently, the probability of ruin is an central issue for the overall wealth of a insurance company.

Being able to evaluate the probability of rare events is critical in many applications. Assessing that the probability of failure of control flight systems in aeronautics, for instance, is small enough, is necessary in the design of a new aircraft. The same typically happens in automatic transportation systems. Comparing the order of magnitude of the probability of rare events is needed when the designer must choose among different possible architectures for her future communication network. If the implemented fault tolerant mechanisms must provide very small probability of critical failure, it may happen that system A leads to a probability of failure of around 10^{-8} while system B provides a failure probability of around 10^{-9}; evaluating accurately these numbers is crucial in these decision processes. Similar problems occur for managers of portfolios which have to maintain reserves against rare events such as large losses due to loan defaults; measuring those losses with enough precision and efficiently is thus of high importance.

Forecasting the behaviour of a system where the events of interest are rare is, in general, a formidable task, because different technical problems arise when trying to accurately evaluate very small probabilities. For instance, when using stochastic processes such a discrete state Markov chains to represent the system (either directly or through a high level model representation such a Stochastic Petri Net, a Stochastic Activity Network, etc.), numerical methods when applicable (the size of the state space must be moderate enough) can fail because of the stiffness of the model. From the representation power point of view, the most powerful techniques are simulation ones, and these suffer from a major drawback in rare event analysis: the standard approaches fail because of the rarity of the targets. If the event is rare and the system is simulated, the rarity makes the event difficult or impossible to observe, thus to analyze. This means that more sophisticate techniques must be used. This proposal concerns the development of such methods in the Monte Carlo area. The main idea is to design and evaluate various techniques coming from research work in different areas. This will be in part possible because the participants of the project work in several important domains (communication networks, financial applications, risk analysis in general...) and follow specific approaches in their research work (importance sampling, splitting, particle systems...). Next section describes the planned scientific tasks which follow the preceding lines, and the overall structure of the proposal.

Scientific Objectives -- Activities and Collaborations

The main goal of this proposal is to put together people working on rare event simulation, but using different techniques, and with different application areas. A first step will be to confront/compare the methods. We believe that the project will then allow to come up with a set of new powerful techniques, that will be illustrated on applications in engineering and finance.

For INRIA, the main point is to put in place a collaborative environment in the field and to increase, this way, the visibility of the institute in this important part of the global area of risk analysis, one of our identified strategic goals. Moreover, the interaction with other experts out of INRIA will also increase. The existence of such a group should also have the effect of attracting other colleagues working in the area.

In a nutshell, the different three INRIA groups use different techniques (importance sampling, important splitting, particle systems...) with different types of models (static and dynamic...), for different areas (dependability analysis of computer and communication systems, financial and insurance risk analysis, air traffic management...). The other partners bring still different approaches, while having some methods in common with researchers of INRIA. In several problems many of the different techniques of the partners can be relevant, and a project such as this ARC is a natural environment to compare them, and if possible, to combine the best aspects and possibilities of each of them to derive new approaches.

The activities can be decomposed into two main categories: bringing together the efforts on theoretical methods for simulating rare events, and applying those sets of methods in different domains, including the specific ones of our industrial partners.

Theoretical work

We describe in this subsection the methodological aspects of the work and the links between the different techniques and methods used by the partners.

Importance sampling

Groups involved: Armor, Bamberg, CWI, MathFi, EDF

Importance sampling is a variance reduction technique that basically consists of simulating the system under a change of measure so that rare events occur more often. This results in an estimate that is made unbiased by weighing it with the so-called likelihood ratio (basically the original measure divided by the new one). The choice of a proper measure is a complicated task, since a priori naive changes of measure have been shown to lead to an increase of (and even infinite, in some cases) variance. Several partners have contributed on importance sampling on their own (Armor, Aspi, Bamberg, CWI), without any active collaboration among them up to now. A first objective ois thus to compare our respective previous and current work that follows this approach.

Another objective here is to analyze together other research lines under the same importance sampling umbrella, that have not been explored yet by the different partners. For instance, a new area of research is around adaptive importance sampling, involving updating and learning based on previously obtained samples, simplifying the standard importance sampling approach for a large class of problems, an example being the cross-entropy method. Though, for realistic network models cross-entropy is limited in its current version (it needs to much information to store), and one should also try to combine dynamic/adaptive approaches meaning that the change of measure should not only be updated after some simulation runs but also during each run.

Importance splitting

Groups involved: Armor, Aspi, Bamberg, DGAC

An alternative approach to importance sampling is importance splitting, in which a sequence of increasingly critical events is defined and independent simulations are run: successful trajectories, that is, those that have managed to reach a given critical set, are replicated, in the form of a number of offsprings, which in turn are run independently until they reach a higher critical set or until a prescribed final time has occurred. Splitting can be achieved in many different ways: in fixed splitting for instance, each successful trajectory is given a prescribed deterministic number (possibly depending on the generation number) of offsprings, whereas in fixed effort splitting, there is a prescribed deterministic number of trajectories alive at each generation, which amounts to sample with replacement from the successful trajectories at the current stage of the algorithm, and is therefore very similar to the method described in the next section.

No matter which splitting strategy is used, the probability of the rare event is estimated as the product of the fraction of successful trajectories, that have managed to reach the higher critical set at each generation. The method has become popular under the name of restart algorithm, but very little is known about its actual performance.

Armor group has been active in this field with recently a special focus on the combination of this technique with randomized quasi-Monte Carlo methods applied to Markov chains, while Aspi is looking at adaptive methods to define thresholds, a critical aspects of the approach. The idea here is to compare and combine the different approaches that have been explored by the two groups.

Interacting particle methods

Groups involved: Armor, Aspi

Particle methods, which are Monte Carlo methods based on interacting particle systems, provide efficient numerical solution to many problems in nonlinear estimation. This not only holds in Bayesian filtering and its spectacular applications to localisation, navigation and tracking, but also in simulating rare events for the purpose of risk assessment, with applications to reliability and safety of complex systems. Intuitively speaking, particles explore the state space by mimicking the evolution of an underlying random process, learn the environment by evaluating a fitness function, and interact so that only the most successful particles (in view of the value of the fitness function) are allowed to survive and to get offsprings at the next generation. The effect of this mutation/ selection mechanism is to automatically concentrate particles (i.e. the available computing power) in regions of interest of the state space.

It appears that importance splitting can be interpreted in terms of Feynman--Kac flows, which makes it possible not only to approximate the probability of the rare but critical event, but also to learn which critical trajectories are responsible for the critical event to occur. Following this approach, many asymptotic results as the sample size N goes to infinity have already been obtained. A specific issue in implementing importance splitting schemes is that it could happen that none of the N simulated trajectories is able to reach the next intermediate set, with the result that the particle system dies out and the algorithm stalls. An alternative is to design fixed performance strategies in which a random number of trajectories is simulated, until a prescribed number H of successful trajectories is obtained, as opposed to usual fixed effort strategies, in which a prescribed number N of trajectories is simulated.

Challenging issues to be investigated here include the automatic selection of the intermediate sets and their number: asymptotic results have been obtained in the one--dimensional case, while in the multi--dimensional case a preliminary objective would be the efficient choice of the importance function h, used to define the intermediate sets as level sets B_L = {x: h(x) <= L}.

Another idea would be to use here the concept of adaptive redistribution, routinely used in particle filtering, without any clear mathematical justification: independent trajectories would be simulated, as long as an appropriate criterion (interpreted as the normalisation constant in the Feynman--Kac formulation) remains below a given threshold, and would be replicated/ terminated when the monitored criterion would exceed this threshold.

As stated before, in many problems these techniques can be put in competition with other methods, and the ARC context is the ideal one for comparing and analyzing the associated performances and the possibilities of combining the estimation procedures.

Applications

The areas of application considered in our project fall into two categories: engineering (telecommunications, dependability analysis) and finance. Possible crossed use (methodology of a partner applied to the field of others) will be carefully studied.

Telecommunications

Groups involved: Armor, Aspi, Bamberg, CWI

Telecommunication networks have evolved in the last decades from the traditional circuit-switched telephone network to the datagram-based Internet. From a modelling point of view, the traditional Poisson assumption is not valid anymore, the arrival process for example being showed to be often a heavy-tailed one. This requires a special focus on rare events involving heavy-tailed random variables. While such a work has been initiated when using importance sampling, it is still in its early stages, and thus requires further developments. Applying importance splitting and particle systems analysis in this context is also of interest. For instance, fluid models have recently been designed to represent the traffic of packets transported in a network as continuous flows; Monte Carlo approaches inside this framework pose major theoretical problems that have to be solved. Similarly, research in communication networks deals with new architectures such as DiffServ (for differentiation of services) in the Internet, requiring new multi-class analysis; something similar happens when studying 3G wireless systems using CDMA technology. This results in a need to adapt the methodologies to those specific contexts and to compare their respective performances. For instance, and to be more specific, there is a necessity to extend the methods proposed for single FIFO queues to queues equipped with specific scheduling mechanisms (processor sharing, priority, etc.). The experience of CWI on the corresponding queuing models will be helpful.

Dependability analysis

Groups involved: Armor, Aspi, Bamberg}

In the dependability context, a system designer often needs to configure the system so that over its mission time (0,t) the probability that it is always operational (its reliability at time t) is very close to 1. We remind here that even under Markovian assumptions on failures and repairs, such models are in many cases very difficult (or impossible) to solve using numerical techniques due to the problem of state space explosion. For generally distributed failures and repairs times, simulation may be the only feasible approach to estimate a solution. Observe also that the same combinatorial explosion occurs in static problems where the time variable plays no specific role (the so-called network reliability area). Monte Carlo remains the only method able to evaluate even moderate size models.

The Armor group, the CWI and the University of Bamberg have been active in Monte Carlo techniques, and the Aspi team has also recently been involved in importance splitting, all applied in particular to this context. A first issue will be to confront the respective methods. Another issue will be to combine them with the new quasi-Monte Carlo simulation technique that has been shown to yield spectacular variance reductions (up to more than 10^5) on queuing analysis examples.

Insurance risk

Groups involved: CWI, MathFi

A risk reserve process describes the time evolution of the reserves of an insurance company. There is some initial reserve R_0; the reserve decreases every now and then due to incoming claims (for instance according to a compound Poisson process), while there is a steady increase due to the premiums paid by the customers (for instance linear in time). One of the main questions is to estimate the probability of ruin, i.e., the probability of the risk reserve process hitting 0. This process is closely related to that of a queuing process hitting a high level, but with `the picture turned upside down'. Compared to telecommunications applications, however, different types of input processes play a role; emphasis is on Levy processes, and on Markov-modulated Levy processes (also called Markov additive processes). There are interesting relations to option pricing as well. Expertise at INRIA and CWI will facilitate the development of efficient yet adequate simulation techniques. Extension of change-of-measure results to the setting described above requires both mathematical analysis (proofs of efficiency properties of the estimators proposed), as well as extensive testing.

Value--at--risk

Groups involved: Aspi, EDF, MathFi

To measure the risk in a portfolio of assets, simulation methods are very useful in estimating the profit and loss PL distribution, and in computing various risk measures that summarize the information contained there. For a given portfolio, the PL distribution is the probability distribution of variations in the value of the portfolio between now and a future time instant, resulting from variations in the market prices and rates. A special attention is given to the problem of estimating the probability of large losses, above a given threshold, which implies the simulation of rare but significant events. Indeed, for large and complex portfolios of assets held e.g. by large financial institutions on multiple markets, it is not possible in general to compute explicitly the PL distribution, and a possible if not unique alternative is to design efficient Monte Carlo methods, which is precisely the objective of this proposition. Among the many available risk measures associated with the PL distribution, the most popular two are the value--at--risk (VaR) at level p, i.e. the p--quantile x_p defined by the relation P[L>x_p] = p, and the conditional value--at--risk (CVaR), i.e. the expected loss E[L | L>x_p] conditioned on exceeding the level x_p. Exploring the tails of the PL distribution can be formulated in terms of Feynman--Kac flows, and the objective here is to design efficient Monte Carlo methods based on interacting particle systems, as introduced above, to approximate the probability of exceeding a given threshold (VaR), and to generate a sample (approximately) distributed according to the PL distribution conditioned on exceeding this threshold (CVaR).

Since the deregulation of the electricity sector in Europe, several trading places for electricity markets have appeared, and it is proposed to take advantage of existing scientific collaborations with the OSIRIS department (optimisation, simulation, risk analysis and statistics) at EDF to focus on the electricity area, as a motivating example and a source of actual challenging problems. EDF takes an active part in these trading places for various reasons (hedging, contracts indexed on market prices, etc.), and is naturally exposed to high price fluctuations, resulting from high volatilities and high prices spikes (which are specific features of electricity prices). As a result, EDF has to design procedures to monitor and manage market risks, at the global corporate level and at the subsidiaries and branches levels as well. The procedures are naturally based on the abovementioned risk measures (VaR, CVaR, etc.), and the challenge for EDF is to design fast and accurate numerical methods for these risk measures, together with error estimates and reliable confidence intervals.

Air traffic management

Groups involved: Aspi, DGAC

The study of rare events is an important area in the analysis and prediction of major risks in the field of air traffic management. As an example, assessing the safety of autonomous en--route operation as a function of traffic density above the current average density level, under various weather conditions, is an important issue for the future of air traffic management. An intrinsic difficulty in quantifying the collision risk between aircrafts is that an air traffic management system is highly distributed: it includes automated systems interacting with one another, along with pilots and controllers. To handle this type of uncertainties, stochastic hybrid systems have been proposed within the HYBRIDGE European project, which extend the piecewise--deterministic Markov processes.

To speed up Monte Carlo simulations, a major issue is to extend the interacting particle methods to stochastic hybrid systems. A first attempt in this direction has shown the need to couple importance splitting with importance sampling, because of the occurrence of jumps. Other objectives here are (i) to mix advanced Monte Carlo simulation speed--up techniques with Markov Chain Monte Carlo (MCMC) algorithms, in order to assess the sensitivity of collision risk to multiple aircraft encounter geometries, and (ii) to extend the current existing techniques for pairs of aircrafts to situations with more than two units.