Spreadsheets are "the dark matter of business software": they're everywhere, they're invisible, and they hold everything together. Business and finance run on spreadsheets; no other software tool has empowered so many people to build solutions to so many different problems. In this context, you have to understand any assertion that "Jupyter is the new Excel" as intentionally sensational.
Jupyter notebooks do, however, share some key similarities with Excel spreadsheets. Notebooks are ubiquitous in scientific and statistical computing, in the same way that spreadsheets dominate business operations and front-office finance. In this post, we'll explore some philosophical and practical similarities and differences between the two tools in an attempt to explain why both have such passionate fans and critics.
similarities: the positives
- Superficially, both Jupyter notebooks and Excel spreadsheets use "cells" as a visual metaphor for breaking an analysis into discrete steps. Cells in both formats contain code and show results.
- Both are designed for interactive, iterative, exploratory analysis, combining computation with data visualizations.
- Both aim to have a shallow learning curve for beginners.
- Both are designed to be self-contained and easy to share. Online environments like Google Colab and JupyterHub abstract away the often-complex Python setup process.
- Both have a strong hold on higher education in their respective fields. Business schools almost universally teach financial modeling with Excel, and STEM departments usually teach data analysis with Jupyter notebooks[1]. New graduates bring their familiarity with these tools into the workplace.
similarities: the negatives
Both Excel spreadsheets and Jupyter notebooks are criticized by software engineers as not being "real software". Aside from the obvious limitation that both artifacts require another program to run, they also make it difficult to adhere to software engineering best practices:
- As large, monolithic files, they're difficult to version control with developer tools like git. Office OpenXML documents are zipped, which "scrambles" the file contents so that git can't track changes to the underlying data. Jupyter notebooks are really just large JSON files, but cell output and execution count changes introduce superfluous deltas[2].
- Both Excel spreadsheets and Jupyter notebooks are difficult to productionize, although both tools do get used in production in practice. Excel and Jupyter are heavy execution environments that introduce their own dependencies and seem wasteful to engineers used to writing standalone scripts.
- Both are error-prone and difficult to test. The fact that both platforms cater to users with less experience writing code gives them a reputation for creating solutions riddled with bugs. In reality it might just be that, without tools like unit testing or a culture of quality control, bugs in spreadsheets and notebooks are more likely to make it into production.
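The noisy-diff problem mentioned above is tractable because a notebook is just JSON. As a minimal sketch of what tools like nbstripout do before each commit, the function below clears outputs and execution counts from a toy notebook dict (the notebook contents here are a made-up example, not a real file):

```python
import json

def strip_outputs(nb: dict) -> dict:
    """Remove outputs and execution counts so git diffs only show code/prose changes."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

# A minimal notebook structure, following the nbformat v4 layout.
notebook = {
    "nbformat": 4,
    "cells": [
        {
            "cell_type": "code",
            "source": ["1 + 1"],
            "outputs": [{"output_type": "execute_result",
                         "data": {"text/plain": ["2"]}}],
            "execution_count": 7,
        },
        {"cell_type": "markdown", "source": ["# Notes"]},
    ],
}

clean = strip_outputs(notebook)
print(json.dumps(clean["cells"][0]["execution_count"]))  # prints: null
```

In practice you'd read and write the `.ipynb` file with `json.load`/`json.dump` and wire the script into a git filter, which is exactly the layer of extra tooling the footnote on nbdime alludes to.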
differences
- Excel makes it easier for non-programmers to understand how data flows between cells.
- Excel's grid provides a natural way to reference data via cell coordinates, whereas Jupyter relies on named variables, forcing users to confront the reality that naming variables is hard.
- It's easier to inspect intermediate results of multi-step calculations in Excel because every cell's value is right in front of you. In Jupyter, inspecting an intermediate result means adding a print statement or an extra cell and re-running the code.
- Excel is self-contained; Jupyter's value lies in Python's package ecosystem.
- Python's reliance on external libraries makes it easier for IT departments to restrict Jupyter's use.
- Both installing Jupyter locally and running notebooks over a network require more setup than opening Excel.
- Most Excel spreadsheets only use functions that ship with Excel, which means that a business contact can just open your model, modify it, and run it. Notebooks are difficult to share outside an organization, and even within one, because they're so tied to a specific Python environment and Python environments are difficult to set up.
- Excel can function as a "poor man's database", storing tabular data across multiple sheets and providing OLAP-like capabilities via PivotTables. Jupyter notebooks usually load data from an API or shared file location, another reason why they're not as self-contained.
- "Fudging the numbers" is easier in Excel than in Jupyter. Spreadsheets update in real-time without having to re-run code or set up interactive widgets. One-off changes are easier to make, which matters when speed is of the essence.
- Working with code is unavoidable in Jupyter, but Excel can be used entirely through a GUI: there are even menus to select functions in cell formulas.
- Jupyter is more open-ended and flexible, but it requires more technical knowledge to use effectively.
- Jupyter has a stronger emphasis on narrative and storytelling than Excel.
- Jupyter notebooks are designed for literate programming, where code and prose are interspersed to create a narrative flow.
- Reporting and presentation in Excel typically relies on either copy/paste or integrations with PowerPoint.
implications
Microsoft's efforts to integrate Python into Excel won't significantly erode Jupyter's dominance in scientific and technical computing. Spreadsheets lack a natural narrative structure, which makes them less suitable for education and reproducible research. Moreover, the "open science" community will never adopt a closed-source tool built by an American tech giant.
Tools and "best practices" will emerge to mitigate the operational disadvantages of Jupyter notebooks[3], just as they have for spreadsheets. Most front-office users will ignore such guidelines[4], engendering ongoing tension with IT departments. Having witnessed how things turned out with Excel, many IT departments view supporting Jupyter like opening a Pandora's box of security vulnerabilities and maintenance headaches.
Both platforms will survive into the foreseeable future. Neither will supplant the other because they target user bases with fundamentally different skill sets. People working at the intersection of quantitative modeling and business decision-making will continue to need familiarity with both tools.
conclusion
Use the tool that best fits into the culture of the organization in which you're solving problems. There are situations where technical requirements will force you to use one tool over the other, just as there are organizations that will only allow you to use one tool or the other. If you work in an Excel-dominated field and do need Python's capabilities, in my experience it's easier to read and write Excel spreadsheets from Python code than it is to get Excel users to open a Jupyter notebook.
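Reading and writing spreadsheets from Python is a well-worn path. As a hedged sketch (assuming the third-party openpyxl package is installed; the file name and sheet contents are invented for illustration), this builds a tiny workbook with a formula and reads it back:

```python
from openpyxl import Workbook, load_workbook

# Build a small workbook: a header row, one data row, and a formula.
wb = Workbook()
ws = wb.active
ws.title = "Model"
ws.append(["item", "price"])
ws.append(["widget", 9.99])
ws["B3"] = "=SUM(B2:B2)"  # Excel evaluates this when the file is opened
wb.save("model.xlsx")

# Read the file back; by default openpyxl returns formulas as strings.
wb2 = load_workbook("model.xlsx")
print(wb2["Model"]["A2"].value)  # prints: widget
```

The point is that the Python side can meet Excel users where they are: the business contact opens `model.xlsx` as usual, while your code generates or consumes it, and nobody has to open a notebook.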
Software engineers and IT departments worldwide will continue to look down on Jupyter notebooks, just as they have done with spreadsheets for decades. The fact that MBA-types don't use Jupyter notebooks makes it easier for IT to enforce draconian restrictions on their use. Ironically, many front-office users may only gain access to Python once Microsoft finishes integrating it into Excel.
[1] Some holdouts still use MATLAB, R, SPSS, or SAS, but hefty licensing fees will continue to push users towards free and open-source alternatives over time. Capturing the education market is a key part of the business strategy for firms like MathWorks, but it's unlikely they'll hold on forever.

[2] Tools like nbdime can help with version control for Jupyter notebooks, but using them adds another layer of complexity.

[3] Tools like papermill aim to streamline running notebooks in production environments. Cloud providers also support creating pipelines involving Jupyter notebooks in production.

[4] How many people have even heard of the FAST standard for building spreadsheets?