I have had a lot of fun with Monte Carlo simulations today.

I used Python to code the simulation. I plotted the results over time, varied the sample size and the input variables, and observed how the outcomes changed.

I had used Monte Carlo previously for estimating staffing for a project, but I got interested in it again after reading Fooled by Randomness.

I like Taleb’s style of writing and his habits of mind relative to probability rather than determinism.
He described Monte Carlo as a **toy** that is lots of fun. That statement made me take another look at Monte Carlo simulations, and sure enough it is almost as much fun as a trip to Home Depot or Lowe’s.

My next projects on the Monte Carlo simulation list will include:

- Optimal staffing for a customer project (for use during pursuits)
- Testing various ways of organizing the workflow for courseware development to see if there is an optimal flow
- Comparing Agile/Lean with traditional project management approaches to produce experimental data
- Estimating the likely range of Net Present Value (NPV) for a project early in the pursuit, to decide whether to keep pursuing it or walk away

## Why Bother

Many technology-heavy development processes have become too complex for reliable analytical (deterministic) estimates: they involve networks of suppliers and many variables to consider. We can use a Monte Carlo model to evaluate our plan numerically. We can change the initial conditions for various contract options, ask ‘what if’, and see many ‘future scenarios’ that give a sense of the overall results.

Because our success depends on making good forecasts and managing activities that involve uncertainty, we benefit from using Monte Carlo simulations.

Learning something new may seem difficult or costly, and it does take some effort. But it is more costly to have real projects come in later than we expected, to have margins come in below what we estimated as the “worst case,” or to have productivity and staffing forecasts based on averages fail to work out as averaged.

Why would we want to do a Monte Carlo Simulation? There could be hundreds of reasons and thousands of examples, but they all reduce down to one thing: predicting performance without conducting hundreds of experiments or building thousands of samples. ~Alan Nicol

With Monte Carlo Simulations we can model what it will take to execute our solution design, before we finalize the design, and predict if or how often key elements might end up with unacceptable outputs.

Additionally, once the models are set up, we can reuse them with only slight configuration changes for other projects.

## The Model

For the model, we will think of our teams of skilled people and the processes we use as a ‘system’. We’ll need to build a representation of our system’s behavior. For audiences used to building courseware for a software or hardware system or product, it is worth repeating that here the ‘system’ means our people and process, not the product. To start, we need a good understanding of the process steps that convert inputs to outputs in our modeled development process. Rather than using the average time for each step, we will need an estimate of the probability distribution for each input.

This is not new; it is like using Microsoft Project. You list each of the tasks necessary to achieve a particular deliverable or component of a deliverable. Some organizations even go so far as to use PERT analysis, with worst-case, most likely, and best-case estimates.

So if the deterministic project plan says step 2 averages 15 hours, then we have to add our risk component. The time for step 2 may be bounded by 11 hours (the so-called lower limit) and 19 hours (the so-called upper limit), with a most likely time of 15 hours. We might use a triangular probability distribution to represent this, since those three numbers may be all we have for estimating step 2. The computer then draws the input hours for step 2 at random somewhere between 11 and 19 hours, with values near 15 coming up most often. Simple enough.
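Here is a minimal sketch of that step 2 draw in Python, using the standard library’s `random.triangular`. The 11, 15, and 19 hour figures come from the example above; the function name `sample_step_hours`, the seed, and the number of draws are my own assumptions for illustration.

```python
import random

def sample_step_hours(low=11.0, mode=15.0, high=19.0, rng=random):
    """Draw one duration (in hours) for a step from a triangular distribution."""
    # Note the argument order: random.triangular takes (low, high, mode).
    return rng.triangular(low, high, mode)

rng = random.Random(42)  # fixed seed (an arbitrary choice) so the run is repeatable
draws = [sample_step_hours(rng=rng) for _ in range(10_000)]

mean_hours = sum(draws) / len(draws)
print(f"min={min(draws):.1f}  max={max(draws):.1f}  mean={mean_hours:.1f}")
```

Every draw stays inside the 11 to 19 hour bounds. Because this particular triangle is symmetric, the mean of the draws should land close to the most likely value of 15 hours.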

Let’s consider a training example. Suppose step 6 in the courseware development process involves building a 3D immersive environment for learning. Building this involves game engines, motion capture, and modeling of the environment in which the learners will work and train. Perhaps we have lots of historical data on what it takes to create avatar animations by hand with manual keyframing. Let’s say we rented a motion capture studio to gather data on using the motion capture production pipeline to create the same avatar animations. Monte Carlo lets us run many more experiments than our budget allows to check whether mocap is faster than manual keyframing. If it is, we have a case for convincing the company to buy motion capture equipment.

Or, if we already have the mocap studio in place, the data allows us to add in the complexity and uncertainty of the post-capture cleanup steps, which vary depending on how often and in what way the actors obstructed the cameras’ view of the markers on their suits. Marker occlusions can take a lot of time to clean up. We could use Monte Carlo simulation to estimate the best number of cameras for a 4-actor capture volume, for example.

And finally, we need to know the range of acceptable outputs. Here, I mean process cycle time, not the quality of the content or the effectiveness of the intervention.

Note that sometimes it is not possible to cost-effectively get an underlying distribution for an input variable to build our probability-based simulation model. In that event, consider bootstrapping the input parameters. Record all assumptions made.
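As one possible reading of bootstrapping here, the sketch below resamples a small set of observed step times with replacement instead of assuming a named distribution. The `observed_hours` values are made up for illustration, and the function names, seed, and resample count are my own assumptions.

```python
import random

# Hypothetical observed hours for one step on past projects (illustrative only).
observed_hours = [12.5, 14.0, 15.5, 13.0, 18.0, 16.5, 14.5, 15.0]

def draw_from_history(data, rng):
    """Use the observed values themselves as the input distribution."""
    return rng.choice(data)

def bootstrap_means(data, n_resamples=5_000, seed=7):
    """Resample the data with replacement and return the mean of each resample."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=len(data))  # sampling with replacement
        means.append(sum(resample) / len(resample))
    return means

means = sorted(bootstrap_means(observed_hours))
low = means[int(0.025 * len(means))]   # approximate 2.5th percentile
high = means[int(0.975 * len(means))]  # approximate 97.5th percentile
print(f"Approximate 95% interval for the mean step time: {low:.1f} to {high:.1f} hours")
```

`draw_from_history` can stand in for a distribution draw inside the simulation, while the interval from `bootstrap_means` gives a rough sense of how uncertain the average is with so few observations. The sample size and where the data came from are exactly the kind of assumptions worth writing down.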

## Goldilocks - What We’re Trying to Find Out

If I know we want to estimate the staffing needed for project A, and I know what is involved in project A, then I can estimate the project A steps, the staffing required for each step, the uncertainty for each step, and get a sample of simulated cycle times with various staffing levels. We’re looking for the Goldilocks ranges.

- What is too few, making project delivery too slow?
- What is just right?
- What is too many, making our costs eat the margins on project A?
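The Goldilocks search can be sketched as a loop over staffing levels. Everything in this example is an assumption for illustration: the step estimates, the coordination overhead per added person, the hourly rate, the deadline, and the budget are hypothetical numbers, and the overhead formula is a deliberately simple stand-in for how a real team behaves.

```python
import random

# All numbers below are hypothetical, for illustration only.
STEPS = [(30, 40, 60), (60, 80, 120), (20, 25, 45)]  # (low, mode, high) effort in person-hours
OVERHEAD_PER_EXTRA_PERSON = 0.08  # assumed coordination cost for each added person
HOURLY_RATE = 100.0               # assumed cost per person-hour
DEADLINE_HOURS = 60.0             # assumed calendar-hour target
BUDGET = 22_000.0                 # assumed cost ceiling

def simulate_project(staff, rng):
    """Return (calendar_hours, cost) for one simulated run at a staffing level."""
    effort = sum(rng.triangular(low, high, mode) for low, mode, high in STEPS)
    effort *= 1 + OVERHEAD_PER_EXTRA_PERSON * (staff - 1)  # more people, more coordination
    return effort / staff, effort * HOURLY_RATE

def evaluate_staffing(staff, trials=5_000, seed=1):
    """Estimate the chance of finishing on time and on budget at a staffing level."""
    rng = random.Random(seed)
    runs = [simulate_project(staff, rng) for _ in range(trials)]
    on_time = sum(hours <= DEADLINE_HOURS for hours, _ in runs) / trials
    on_budget = sum(cost <= BUDGET for _, cost in runs) / trials
    return on_time, on_budget

for staff in range(1, 8):
    on_time, on_budget = evaluate_staffing(staff)
    print(f"staff={staff}: P(on time)={on_time:.2f}  P(on budget)={on_budget:.2f}")
```

With these made-up inputs, small teams miss the deadline and large teams overrun the budget. The staffing levels where both probabilities are acceptably high are the “just right” range.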

## Target – Testable Prediction

We form a testable prediction based on our deterministic estimation process and historical data. We then use that prediction as the target when designing the Monte Carlo experiments, which we run as a large number of simulations.

## Data Collection

The experiments will produce numeric data. To get statistically valid results we need large sample sizes. These are expensive to get from your teams and customer projects, and cheap to get from Monte Carlo simulation. So we tell the computer to run our simulation over and over automatically. We may repeat the experiment 10,000 to 100,000 times. This simulation helps us inexpensively collect data that reflects our processes under the given inputs. The experiment data either validates our prediction, helping us price the project accurately, or suggests a different staffing profile to hit the intended cost, delivery time, and quality parameters of the contract. In the end we want to price the project accurately so we have a better chance of hitting our financial targets than our previous predictions based on averages have given us.
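To make the “over and over” part concrete, here is a rough sketch that repeats a three-step simulation at several sample sizes and summarizes the totals. The step estimates are hypothetical (only the first matches the step 2 example earlier), and I take the deterministic plan to be the sum of the most likely values, which is my assumption rather than a rule.

```python
import random
import statistics

# Hypothetical (low, most likely, high) hours for each process step.
STEPS = [(11, 15, 19), (6, 8, 14), (20, 24, 40)]
TARGET_HOURS = sum(mode for _, mode, _ in STEPS)  # deterministic "most likely" plan: 47 hours

def run_trials(n_trials, seed=0):
    """Simulate total cycle time n_trials times and return the list of totals."""
    rng = random.Random(seed)
    return [
        sum(rng.triangular(low, high, mode) for low, mode, high in STEPS)
        for _ in range(n_trials)
    ]

for n in (100, 1_000, 10_000, 100_000):
    totals = run_trials(n)
    cuts = statistics.quantiles(totals, n=10)  # nine cut points: P10 through P90
    p_over = sum(t > TARGET_HOURS for t in totals) / n
    print(f"n={n:>7}: P10={cuts[0]:.1f}  P50={cuts[4]:.1f}  P90={cuts[8]:.1f}  "
          f"P(over the {TARGET_HOURS}h plan)={p_over:.2f}")
```

The percentiles should settle down as the number of trials grows, which is one way to judge whether the sample is large enough. Because two of the hypothetical steps are skewed toward the high side, the simulated total exceeds the 47-hour plan more often than not, which is the kind of gap that planning on single-point estimates hides.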

Then we decide how confident we are with the experiment results and adjust our decision making accordingly.

Be careful to record your assumptions so that, when project A is complete, you can compare the one actual outcome (the historical data) to the range of possible outcomes the Monte Carlo simulation provided.
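One lightweight way to do this, sketched under my own assumptions, is to keep the inputs in a plain dictionary that can be saved with the results, then check where the actual outcome falls in the simulated range. The step estimates, the seed, and the 58-hour “actual” figure are all hypothetical.

```python
import json
import random

# Hypothetical assumptions, kept with the results so the run can be reviewed later.
assumptions = {
    "steps_low_mode_high_hours": [[11, 15, 19], [6, 8, 14], [20, 24, 40]],
    "distribution": "triangular",
    "trials": 10_000,
    "seed": 2024,
}

def simulate(assumptions):
    """Re-create the simulated total cycle times from the recorded assumptions."""
    rng = random.Random(assumptions["seed"])
    steps = assumptions["steps_low_mode_high_hours"]
    return [
        sum(rng.triangular(low, high, mode) for low, mode, high in steps)
        for _ in range(assumptions["trials"])
    ]

def percentile_rank(simulated, actual):
    """Share of simulated outcomes at or below the actual outcome."""
    return sum(s <= actual for s in simulated) / len(simulated)

simulated = simulate(assumptions)
actual_hours = 58.0  # hypothetical actual cycle time once the project is finished
rank = percentile_rank(simulated, actual_hours)

print(json.dumps(assumptions, indent=2))
print(f"Actual {actual_hours}h falls at about the {rank:.0%} point of the simulated range")
```

Because the seed is recorded along with the inputs, the same range can be regenerated later. An actual result far out in either tail is a prompt to revisit the recorded assumptions.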