I Built a Tool That Turns Any Python Repo Into an Executable Jupyter Notebook

Last week I was doing the Tower Research Capital x MIT Limestone Data Challenge — a multi-part quant problem with matrix completion, convex optimization, and trading strategy work. The kind of thing where you naturally end up with a real repo: multiple modules, shared utilities, pipeline scripts, cross-file imports, intermediate outputs, argparse, the whole deal.

Everything was clean. The repo ran perfectly.

Then I saw the submission requirement:

Submit a single .ipynb file.

So I did what everyone does: started manually copying files into notebook cells, reordering code, trying to inline imports, fixing execution order, and debugging all the dumb notebook-specific breakage that had nothing to do with the actual problem.

It sucked.

So I built a tool for it.

rep2nb

rep2nb is a pip-installable package that converts an entire Python repo into a single executable Jupyter notebook.

pip install rep2nb
rep2nb myproject/ -o submission.ipynb

If your repo runs, the notebook should too.

Why this is harder than it sounds

At first glance this seems easy: just grab every .py file and dump them into cells.

That works for toy repos. It breaks immediately on anything real.

The moment you have cross-file imports, scripts that depend on outputs from other scripts, if __name__ == "__main__" blocks, package structure, or multiple entry points, naive copy-paste falls apart.

rep2nb handles the parts that actually make this annoying.

What it does

1. Orders files correctly

If pipeline.py depends on coefficients.py, and coefficients.py depends on matrix.py, those need to execute in the right order in the notebook.

rep2nb parses the repo with Python’s AST, builds a dependency graph, and topologically sorts the files so dependencies run first.
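The mechanism can be sketched with the stdlib alone. Here `local_imports` and the toy file contents are illustrative, not rep2nb's actual internals:

```python
import ast
from graphlib import TopologicalSorter

def local_imports(source, known_modules):
    """Collect imports that refer to other files in the same repo."""
    deps = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            deps.update(a.name for a in node.names if a.name in known_modules)
        elif isinstance(node, ast.ImportFrom) and node.module in known_modules:
            deps.add(node.module)
    return deps

# Toy repo: pipeline -> coefficients -> matrix
files = {
    "matrix": "import numpy as np",
    "coefficients": "import matrix",
    "pipeline": "from coefficients import solve",
}
graph = {name: local_imports(src, files) for name, src in files.items()}
order = list(TopologicalSorter(graph).static_order())
# dependencies sort before their dependents: matrix, coefficients, pipeline
```

`graphlib.TopologicalSorter` (stdlib since Python 3.9) does the sorting; the interesting part is filtering AST imports down to repo-local names so `numpy` doesn't end up in the graph.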

2. Preserves imports across files

In a normal repo, Python’s module system handles imports for you. In a notebook, there are no real files anymore — just a flat execution environment.

rep2nb registers executed file contents into sys.modules, so import x, from x import y, relative imports, and package imports still resolve the way they did in the repo.
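The core trick, stripped down, is to exec each file's source into a fresh module object and register it under the file's module name. This is a simplified sketch; real package and relative imports need more bookkeeping:

```python
import sys
import types

def register_module(name, source):
    """Exec source in a new module namespace and make it importable."""
    mod = types.ModuleType(name)
    exec(source, mod.__dict__)
    sys.modules[name] = mod
    return mod

register_module("matrix", "def fill(): return 42")

# Later cells can now import it as if it were a real file on disk,
# because the import machinery checks sys.modules first.
import matrix
print(matrix.fill())  # 42
```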

3. Handles entry points intelligently

Not every if __name__ == "__main__": block should be executed.

If a file is really a library module, unwrapping that block would run test code or side effects in the middle of the notebook. rep2nb distinguishes between true entry points and support modules. Entry points get unwrapped. Library modules get their guards stripped. You can also override this manually with --entry.
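Detecting and rewriting the guard is a small AST transform. A minimal sketch (`rewrite` and `is_main_guard` are hypothetical names, not rep2nb's API):

```python
import ast

def is_main_guard(node):
    """True for a top-level `if __name__ == "__main__":` block."""
    return (
        isinstance(node, ast.If)
        and isinstance(node.test, ast.Compare)
        and isinstance(node.test.left, ast.Name)
        and node.test.left.id == "__name__"
        and any(
            isinstance(c, ast.Constant) and c.value == "__main__"
            for c in node.test.comparators
        )
    )

def rewrite(source, entry_point):
    """Unwrap the guard for entry points, strip it for library modules."""
    tree = ast.parse(source)
    body = []
    for node in tree.body:
        if is_main_guard(node):
            if entry_point:
                body.extend(node.body)  # hoist guarded code to top level
        else:
            body.append(node)
    tree.body = body
    return ast.unparse(tree)

src = 'def main():\n    print("run")\n\nif __name__ == "__main__":\n    main()\n'
entry_version = rewrite(src, entry_point=True)    # `main()` runs at top level
library_version = rewrite(src, entry_point=False) # guard removed entirely
```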

4. Splits multi-project repos into sections

The repo that motivated this had multiple independent sub-projects, each with its own pipeline and helper files.

rep2nb detects subdirectories that should behave like isolated notebook sections and handles them separately. That includes:

  • markdown headers for section boundaries
  • changing directories so relative paths still work
  • clearing sys.modules state between sections
  • cleaning up temp directories after execution
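As a sketch, the isolation part of the list above is essentially a context manager around `os.chdir` plus a snapshot of `sys.modules` (a hypothetical simplification, not rep2nb's actual code):

```python
import os
import sys
from contextlib import contextmanager

@contextmanager
def isolated_section(workdir):
    """Run one notebook section in its own directory with a clean module state."""
    before = set(sys.modules)
    old_cwd = os.getcwd()
    os.makedirs(workdir, exist_ok=True)
    os.chdir(workdir)
    try:
        yield
    finally:
        os.chdir(old_cwd)
        # drop modules the section registered so names can't leak across sections
        for name in set(sys.modules) - before:
            del sys.modules[name]

with isolated_section(os.getcwd()):
    pass  # one sub-project's cells would execute here
```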

5. Fixes the annoying notebook edge cases

A lot of repo-to-notebook breakage comes from tiny environment assumptions. rep2nb patches the common ones automatically:

  • argparse scripts that choke on Jupyter kernel args
  • code that expects __file__ to exist
  • optional generation of a pip install cell from import analysis
  • extraction of module docstrings into markdown cells
  • automatic README inclusion as notebook header
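The first two are worth showing inline. Both shims below are hypothetical simplifications of what a converter has to inject near the top of the notebook:

```python
import os
import sys

# 1. Jupyter kernels put their own flags (e.g. `-f kernel.json`) in sys.argv,
#    which makes argparse scripts error out. Resetting argv to just a program
#    name lets parse_args() fall back to defaults.
sys.argv = ["notebook"]

# 2. Notebook cells have no source file, so __file__ may be undefined.
#    Shimming it keeps path-relative code working.
globals().setdefault("__file__", os.path.join(os.getcwd(), "notebook"))

import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--n", type=int, default=3)
args = parser.parse_args()  # no longer chokes on kernel arguments
```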

Stress test

I tested rep2nb on the contest repo that originally annoyed me into building it.

That repo had:

  • 12 Python files
  • 3 independent sub-projects
  • cross-file imports
  • argparse-driven scripts
  • code relying on __file__
  • intermediate file handoffs
  • external dependencies like numpy, pandas, scipy, and torch

The generated notebook ran top to bottom without errors.

That was the bar: not “looks nice,” not “kind of works,” but actually executable on a repo that was annoying enough to justify making the tool in the first place.

What it does not handle

There are still limits, and I’d rather be explicit about them:

  • dynamic imports where module names are computed at runtime
  • subprocess-based Python execution that expects separate files
  • circular imports, which are detected and surfaced as errors
  • non-Python files, aside from listing them so you know what else needs to come along
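For reference, surfacing a circular import is just finding a back edge in the dependency graph with depth-first search; a minimal sketch:

```python
def find_cycle(graph):
    """Return one import cycle as a list of module names, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / in progress / done
    color = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, ()):
            if color.get(dep, WHITE) == GRAY:  # back edge: cycle found
                return stack[stack.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE:
                found = visit(dep)
                if found:
                    return found
        color[node] = BLACK
        stack.pop()
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

print(find_cycle({"a": ["b"], "b": ["a"]}))  # ['a', 'b', 'a']
print(find_cycle({"a": ["b"], "b": []}))     # None
```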

Usage

CLI

rep2nb myproject/

rep2nb myproject/ \
  --entry pipeline.py \
  --exclude tests/ \
  --include-pip-install \
  -o submission.ipynb

rep2nb contest-repo/ \
  --entry problem-1/pipeline.py \
  --entry problem-2/main.py

Python API

from rep2nb import convert

convert(
    "myproject/",
    output="submission.ipynb",
    entry=["generate.py", "analyze.py"],
    exclude=["tests/"],
    include_pip_install=True,
)

Under the hood

The pipeline is roughly:

  1. discover Python files
  2. detect independent sections vs real packages
  3. parse ASTs for imports, docstrings, definitions, and main guards
  4. build dependency graphs
  5. topologically sort execution order
  6. rewrite imports / guards / runtime assumptions
  7. assemble the final notebook with code cells, markdown cells, and cleanup logic
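Step 7 is the only part that touches the notebook format itself. An .ipynb file is plain JSON, so assembly needs nothing beyond the stdlib. A sketch targeting the nbformat 4 schema, with the minimum cell fields the format requires:

```python
import json

def make_cell(cell_type, source):
    """Build one cell in the nbformat v4 JSON schema."""
    cell = {
        "cell_type": cell_type,
        "metadata": {},
        "source": source.splitlines(keepends=True),
    }
    if cell_type == "code":
        cell.update(outputs=[], execution_count=None)
    return cell

notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        make_cell("markdown", "## matrix.py"),
        make_cell("code", "def fill():\n    return 42"),
    ],
}

with open("submission.ipynb", "w") as f:
    json.dump(notebook, f, indent=1)
```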

Get it

pip install rep2nb

I built it because I got annoyed enough doing this by hand once.

Felt like there was a decent chance other people had the exact same problem.

Tye Phoenix