By Jupyter: Is This the Future of Open Science?

Glyn Moody
May 14, 2019 Big Data, Cloud & DevOps

Taking the scientific paper to the next level.

In a recent article, I explained why open source is a vital part of open science. As I pointed out, alongside a massive failure on the part of funding bodies to make open source a key aspect of their strategies, there’s also a similar lack of open-source engagement with the needs and challenges of open science. There’s not much that the Free Software world can do to change the priorities of funders. But a lot can be done on the other side of things, by writing good open-source code that supports and enhances open science.

People working in science can potentially benefit from every piece of free software code, from the operating systems and apps to the tools and libraries, so the better those become, the more useful they are for scientists. But there’s one open-source project in particular that has already had a significant impact on how scientists work, namely Project Jupyter:

Project Jupyter is a set of open-source software projects that form the building blocks for interactive and exploratory computing that is reproducible and multi-language. The main application offered by Jupyter is the Jupyter Notebook, a web-based interactive computing platform that allows users to author documents that combine live code, equations, narrative text, interactive dashboards and other rich media.

Project Jupyter was spun off from IPython in 2014 by Fernando Pérez. Although it began as an environment for programming in Python, its ambitions have grown considerably. Today, dozens of Jupyter kernels exist that allow other languages to be used. Indeed, the project itself speaks of supporting “interactive data science and scientific computing across all programming languages”. As well as this broad-based support for programming languages, Jupyter is noteworthy for its power: it enables users to create and share documents that contain live code, equations, visualizations and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization and machine learning.
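To make the idea of a document that mixes narrative and live code a little more concrete: a notebook is saved as an `.ipynb` file, which is plain JSON containing a list of cells, each of type `markdown` or `code`. The sketch below is a minimal illustration only, assuming the nbformat 4 layout at minor version 4 (which does not require per-cell IDs). The helper `make_notebook` and the sample cells are hypothetical; real projects would normally use the official `nbformat` package to create and validate notebooks.

```python
import json


# Hypothetical helper: builds a minimal notebook in the nbformat 4.4 JSON layout.
# It is an illustration, not a substitute for the official nbformat package.
def make_notebook(cells):
    """Turn a list of (kind, text) pairs into a notebook dictionary."""
    nb_cells = []
    for kind, text in cells:
        if kind == "markdown":
            # Narrative text, rendered as Markdown (LaTeX-style math is allowed).
            nb_cells.append({"cell_type": "markdown", "metadata": {}, "source": text})
        elif kind == "code":
            # Live code; outputs stay empty until a kernel executes the cell.
            nb_cells.append({
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": text,
            })
        else:
            raise ValueError(f"unknown cell kind: {kind!r}")
    return {
        "cells": nb_cells,
        # The kernelspec tells Jupyter which language kernel should run the code.
        "metadata": {
            "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
        },
        "nbformat": 4,
        "nbformat_minor": 4,
    }


nb = make_notebook([
    ("markdown", "## Sample mean\nThe mean $\\bar{x}$ of the data is computed below."),
    ("code", "data = [2.0, 4.0, 6.0]\nprint(sum(data) / len(data))"),
])

# Serialise to JSON; saving this string as a .ipynb file gives a notebook Jupyter can open.
notebook_json = json.dumps(nb, indent=1)
```

Because the language lives in the `kernelspec` metadata rather than in the file format itself, pointing it at another installed kernel is what lets the same kind of document carry code in languages other than Python.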

In a way, Project Jupyter is the ultimate scientific tool, since it can be used in any discipline and for multiple purposes. As an article in The Atlantic rightly put it, it can also be thought of as the scientific paper taken to the next level by exploiting the possibilities of digital technology. A key aspect is that it’s interactive: readers can use the embedded code to explore the data and carry out limitless “what ifs”. It’s such an obvious idea that you may wonder why it hasn’t been done before. The answer is that it has, notably in the form of Mathematica from Wolfram Research.

Mathematica is an innovative and powerful program with one huge flaw: it’s proprietary. As such, it suffers from all the usual downsides, one of which more or less disqualifies it for science: you can’t check the code. That means you don’t really know why it produces the results it does; you just have to take it on trust. That’s not science; that’s voodoo.

Its closed-source nature means that Mathematica can’t tap into its community of users in the way open-source projects can. Whatever advantages Mathematica once had, it’s only a matter of time before open-source alternatives like Jupyter surpass it. Indeed, a Google Trends comparison of searches for Mathematica and for Jupyter shows interest in the latter rising while interest in the former falls. It’s an inexact metric, of course, but the overall trends are clear: Mathematica, like Microsoft Windows, is the past, and Jupyter, like GNU/Linux, is the future.

It’s not just about the familiar dynamics of open-source development. There’s a deeper reason why Jupyter is overtaking Mathematica, as the economist Paul Romer explained in a perceptive post:

Mathematica failed, despite technical accomplishments, because the norms of its developers clashed so obviously with the norms of its intended users. Jupyter is succeeding because the norms of the community that is developing it are aligned with the norms of its users.

As well as its culture, there’s another aspect of Jupyter that makes it a perfect fit for open science. On a page listing dozens of Jupyter notebooks, all freely accessible, there’s a section titled “Reproducible academic publications”:

This section contains academic papers that have been published in the peer-reviewed literature or pre-print sites such as the ArXiv that include one or more notebooks that enable (even if only partially) readers to reproduce the results of the publication.

Coupled with the transparency of the underlying code, this ability for anybody to check the logic and results of a publication is a real breakthrough for open science. At the moment, most academic papers can be read only superficially, with their results taken at face value. In theory, anyone could set about reproducing the final conclusions, at least provided the relevant datasets are freely available. Few will take the trouble to do so, though, because there are no academic incentives to expend all that time and energy. When papers are published not as static documents but as dynamic Jupyter notebooks with full open datasets, it becomes possible to check the results properly, as well as to plug in other datasets or tweak the underlying assumptions. In this way, Jupyter notebooks are the perfect marriage of open source, open access and open data. This is exactly how open science should work, but until now it almost never has.

The power and flexibility of the Jupyter environment make it a strong foundation for open-ended experimentation of the kind the Free Software community relishes. Moreover, code written in this domain could have a major impact on the scientists using the notebook format and on the science they produce. That combination of satisfying intellectual challenge and real-world practical benefit makes Jupyter a perfect candidate for open-source coders looking for new and meaningful work.

    © 2021, Experfy Inc. All rights reserved.