Modelling and Simulations in Data Science

Pier Paolo Ippolito
February 1, 2021 AI & Machine Learning

Using Data Science and Machine Learning even when there is no data available.

Introduction

One of the main limitations of the current state of Machine Learning and Deep Learning is the constant need for new data. How, then, can we make estimates and predictions for situations in which we don’t have any data available? This is in fact more common than we might normally think.

As an example, let’s consider a thought experiment: we are working as a Data Scientist for a local authority and we are asked to optimise the evacuation plan to follow in case of a natural disaster (e.g. volcanic eruption, earthquake, etc…). Because natural disasters don’t tend to happen too frequently (fortunately!), we don’t have any data currently available. At this point, we decide to create a model able to summarise, to a good extent, the key characteristics of our real world and use it to run different simulations (from which we can then get all the data we need). There are two main types of programmable simulation models:

  • Mathematical Models: make use of mathematical symbols and relationships in order to summarise processes. Compartmental Models in Epidemiology are a typical example of mathematical models (e.g. SIR, SEIR, etc…).
  • Process Models: are based on a list of steps handcrafted by the designer in order to represent an environment (e.g. Agent-Based Modelling).

Modelling and Simulations are used in many different fields, such as finance (e.g. Monte Carlo Simulations for Portfolio Optimization), medical/military training, epidemiology and threat modelling [1, 2].

Some of the main uses of simulations are to verify analytical solutions, to experiment with policies before creating any physical implementation, and to understand the connection and relative importance of the different variables composing a system (e.g. by modifying input parameters and examining the results). These properties make the Modelling and Simulations paradigm a white-box approach to predicting future trends.

Having run many different simulations and tested the different possible scenarios, we can then use the generated data to train our Machine Learning model of choice to make predictions in the real world.

In this article, I will introduce you to different approaches you might want to take in order to get started with Modelling and Simulations in Python. All the code used throughout this article can be found on my GitHub account.

Agent-Based Modelling

In Agent-Based Modelling, we use an Object-Oriented Programming (OOP) approach: we create a class for each type of individual we want in our artificial environment and then instantiate as many agents as we want. These agents are finally placed in a virtual environment and left to interact with each other and with the environment, so that we can record their actions and the simulation outcomes. Two possible ways to create Agent-Based Modelling simulations in Python are Mesa and HASH. For non-Python users, AnyLogic and Blender can be two great free alternatives.

Mesa

In order to demonstrate some of the key capabilities of Mesa, we are now going to create a model to simulate a fire spreading in a forest. Additional examples are available in the official Mesa repository.

First of all, we need to import all the necessary dependencies.

https://gist.github.com/pierpaolo28/2be34d745a3db73b1d407b46a1e0709b#file-sim-py

Now we are ready to create a Python class (Tree), which will be used to create the agents in our simulation. In this case, each tree can be in one of three possible states: Healthy, Burning or Dead. We will start our simulation with a small number of Burning trees at the edge of the forest, and this number will then vary depending on how close the other trees are to the ones already burning. Once a tree has made all its neighbouring trees catch fire, it becomes Dead.

https://gist.github.com/pierpaolo28/716aed741d659e9225c12cdecdd4426d#file-sim2-py

We can then move on and design the world our trees will be situated in (ForestModel). Using a probability value (prob), we can additionally vary the likelihood of having a tree on each cell (to regulate how densely populated the forest is).

https://gist.github.com/pierpaolo28/a45d31155eaad9211adcf4219c961e4a#file-sim3-py
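The gists above hold the Mesa implementation. To make the rules themselves easier to follow, here is a minimal, dependency-free sketch of the same idea in plain Python. It is not the Mesa code from the gists: the names State, make_forest, step, count_states and run are hypothetical, and it assumes that the fire starts in the left-most column of the grid and spreads only to the four orthogonal neighbours of a burning tree.

```python
import random
from enum import Enum


class State(Enum):
    HEALTHY = "Healthy"
    BURNING = "Burning"
    DEAD = "Dead"


def make_forest(width, height, prob, rng):
    """Place a tree on each cell with probability `prob`.

    Assumption: trees in the left-most column (x == 0) start out burning.
    """
    forest = {}
    for x in range(width):
        for y in range(height):
            if rng.random() < prob:
                forest[(x, y)] = State.BURNING if x == 0 else State.HEALTHY
    return forest


def step(forest):
    """Advance one tick: every tree that is burning at the start of the tick
    ignites its healthy 4-neighbours and then becomes dead."""
    burning = [pos for pos, state in forest.items() if state is State.BURNING]
    for x, y in burning:
        for neighbour in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if forest.get(neighbour) is State.HEALTHY:
                forest[neighbour] = State.BURNING
        forest[(x, y)] = State.DEAD
    return forest


def count_states(forest):
    """Count how many trees are currently in each state."""
    counts = {state: 0 for state in State}
    for state in forest.values():
        counts[state] += 1
    return counts


def run(width=20, height=20, prob=0.6, seed=0):
    """Run until no tree is burning; return the per-tick state counts."""
    rng = random.Random(seed)
    forest = make_forest(width, height, prob, rng)
    history = [count_states(forest)]
    while history[-1][State.BURNING] > 0:
        step(forest)
        history.append(count_states(forest))
    return history


if __name__ == "__main__":
    for tick, counts in enumerate(run()):
        print(tick, {state.value: n for state, n in counts.items()})
```

Storing the forest as a dictionary keyed by (x, y) keeps empty cells implicit, so a lower prob simply means fewer entries; a grid library such as Mesa takes care of this bookkeeping for you.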

Finally, we need to create two helper functions in order to get statistics from the simulation and create a data-frame for plotting.

https://gist.github.com/pierpaolo28/d3b4f67e7833843f0b95748b0d144587#file-sim4-py

Having designed our classes and functions, we are now ready to run our simulation and store all the generated data.

https://gist.github.com/pierpaolo28/299adb8c7092205a3a79e97a59c71bd6#file-sim5-py

There are now two possible ways to visualize the results of our simulation. We can either create our own plotting utilities in Python (as shown in Figure 1) or make use of Mesa’s visualization capabilities.

All the code used to create Figure 1 is available at this link (feel free to interact with the plot below!).

Forest Fire Simulation
Figure 1: Plotly Time Series Results

Using Mesa’s visualization capabilities, we can then create this same plot (Figure 2) and launch it in a web page at this address: http://127.0.0.1:8521/

https://gist.github.com/pierpaolo28/9b6b7eec63fd754880931bd7bab32d66#file-sim6-py

Figure 2: Mesa Time Series Chart

Furthermore, we can also create a 2D representation of how the fire spreads through our forest (Figure 3). By running multiple simulations with different forest sizes and tree densities, we would then be able to generate enough data to perform a Machine Learning analysis.
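A parameter sweep of this kind can be sketched without any simulation framework. The example below is only an illustration under stated assumptions, not the article's Mesa workflow: burned_fraction and sweep are hypothetical names, the fire is assumed to start in the left-most column, and a burning tree is assumed to always ignite its four orthogonal neighbours. Under that deterministic rule, the trees that end up burnt are exactly those connected to the starting column, so a breadth-first search gives the final outcome directly.

```python
import random
from collections import deque


def burned_fraction(width, height, density, rng):
    """Fraction of trees that end up burnt in one randomly generated forest."""
    trees = {(x, y) for x in range(width) for y in range(height)
             if rng.random() < density}
    if not trees:
        return 0.0
    # Assumption: every tree in the left-most column starts on fire.
    burnt = {pos for pos in trees if pos[0] == 0}
    queue = deque(burnt)
    while queue:
        x, y = queue.popleft()
        for neighbour in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if neighbour in trees and neighbour not in burnt:
                burnt.add(neighbour)
                queue.append(neighbour)
    return len(burnt) / len(trees)


def sweep(sizes, densities, runs_per_setting=5, seed=0):
    """Return one row (a dict) per simulated forest."""
    rng = random.Random(seed)
    rows = []
    for size in sizes:
        for density in densities:
            for run_id in range(runs_per_setting):
                rows.append({
                    "size": size,
                    "density": density,
                    "run": run_id,
                    "burned_fraction": burned_fraction(size, size, density, rng),
                })
    return rows


if __name__ == "__main__":
    for row in sweep(sizes=[20, 40], densities=[0.3, 0.6, 0.9]):
        print(row)
```

Each row pairs the input parameters with an outcome, which is the tabular shape most Machine Learning libraries expect; the list of dictionaries can be written to CSV or loaded into a data frame as it is.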

Figure 3: 2D view of the fire spreading through the forest

HASH

HASH is a free platform which can be used to quickly create highly parallelizable Agent-Based Simulations in either Python or JavaScript. A large number of examples are freely available on the platform, and they can also be forked and used as a base for your own projects.

For instance, a model quite similar to the one we just coded in Mesa is already available (Wildfires — Regrowth, Figure 4). By adjusting the different parameters of the model, it should then be possible to obtain results similar to the ones we had before.

Figure 4: Forest Fire Simulation in HASH

If you are interested in learning more about how to create simulations with HASH, their documentation is a great place to start.

Mathematical Models

This type of model is typically designed using ordinary differential equations, sometimes combined with stochastic elements (e.g. the SIR model in Figure 5).

Figure 5: SIR Model Equations
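For reference, the SIR model is commonly written as the following system of ordinary differential equations. The notation here is the usual textbook one and is an assumption on my part, so it may differ from the symbols used in Figure 5: S, I and R are the numbers of susceptible, infected and recovered individuals, N = S + I + R is the total population, β is the transmission rate and γ is the recovery rate.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta S I}{N} \\
\frac{dI}{dt} &= \frac{\beta S I}{N} - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{aligned}
```

The three right-hand sides sum to zero, which is why the total population N stays constant over time.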

Diagram representations of this type of model can be of great help in understanding how the model equations work and what the possible movements between the different allowed states are (Figure 6).

Figure 6: SIR Diagram Representation

Some well-known Mathematical Models in Epidemiology are the SIR (Susceptible-Infected-Recovered) model and all the other models which can be derived from it (e.g. SEIR, Vaccination, Time-Limited Immunity). If you are interested in finding out more about this type of model, this article by The Washington Post is a great place to start.

It is then possible to code this type of model either in plain Python or by making use of advanced mathematics packages such as SciPy and SimPy. For example, you can find below a summary video of a past project of mine in which I created a dashboard to analyse COVID-19 trends over the last few months. All the code used for this project is available on my GitHub account.
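As a minimal illustration of the plain-Python route, the sketch below integrates the standard SIR equations with the forward Euler method. It is not the code from the dashboard project: simulate_sir is a hypothetical name, the parameter values are arbitrary, and the fixed step size dt is a simplification (a library solver such as those in SciPy would normally choose the step adaptively).

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate the SIR equations with forward Euler steps of size `dt` days.

    beta is the transmission rate and gamma the recovery rate (both per day).
    Returns a list of (day, S, I, R) tuples sampled once per day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    steps_per_day = round(1 / dt)
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((day, s, i, r))
    return history


if __name__ == "__main__":
    # Arbitrary illustrative values, not fitted to any real outbreak.
    history = simulate_sir(population=1000, initial_infected=1,
                           beta=0.3, gamma=0.1, days=160)
    peak_day, _, peak_infected, _ = max(history, key=lambda row: row[2])
    print(f"Peak of {peak_infected:.0f} infected on day {peak_day}")
```

Because every individual leaving one compartment enters another, S + I + R stays equal to the population at each step, which is a quick sanity check when modifying the model (for example, to add an Exposed compartment for SEIR).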

Bibliography

[1] Defence — Advanced multi-domain synthetic environments. Improbable.io. https://improbable.io/defense (accessed August 2020).

[2] Threat Modeling Security Fundamentals. Microsoft Learn. https://docs.microsoft.com/en-us/learn/paths/tm-threat-modeling-fundamentals/ (accessed August 2020).

    © 2021, Experfy Inc. All rights reserved.