Setting up Python Packages
A companion to my blog post on Rowan: How to Make a Great Open-Source Scientific Project.
Most people write Python code without ever setting up a proper project. Python is widely used for scripting and solving problems, and thus many programmers bring a scripting mentality to their software development. Because of this, many projects are poorly organized, difficult to maintain, and impossible to reuse as libraries in other projects. The mess that is Python package management compounds these problems (see my previous blog post), and since Python tooling is incredibly fragmented, it can often be difficult to know how to best set up a project.
I've maintained cookiecutters for the past four years embodying my opinions on how best to set up projects (with versions for pipenv, poetry, pixi, and uv), and I wanted to share the reasoning behind how I organized them, and why you should do the same.
Project layout
There are two main ways to lay out a package: src/ and flat. One of the major reasons people gravitate toward src/package_name instead of just package_name is that it isolates project code. From the Python packaging user guide:
The src layout requires installation of the project to be able to run its code, and the flat layout does not. … [The flat layout] can lead to subtle misconfiguration of the project's packaging tooling, which could result in files not being included in a distribution.
Most people never need to worry about these considerations, and a lot of major packages use the flat layout (e.g. NumPy). I prefer the simplicity of the flat layout, but ultimately this often comes down to personal choice as opposed to there being an obvious right answer. My preferred layout for projects is as follows:
- {package_name}/ # Source code
- notebooks/ # Jupyter/marimo notebooks
- results/ # Output from running the package
- scripts/ # Useful scripts for managing the package
- static/ # Holds input data
- tests/ # Unit tests
- .github/ # GitHub workflows, templates, etc.
- LICENSE # License indicating how code can be used
- prek.toml # Git hooks for pre-commit/pre-push
- pyproject.toml # Package setup, tool management, etc.
- README.md # Information about your package and how to use it
- {lockfile}.lock # Precise set of packages to load
- .coveragerc # Information on how to assess code coverage
- .editorconfig # Information for the editor on how to format files
- .env # Private environment variables (add to .gitignore)
- .envrc # Handles automatic loading of the environment
- .gitignore # Tells Git which files to ignore (e.g. .env)
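Tying the layout together is the pyproject.toml, which holds the package metadata and tool configuration. As a minimal sketch (the package name, version, and Python floor here are placeholders; hatchling is the build backend I recommend later in this post):

```toml
[project]
name = "my_package"          # placeholder: your package name
version = "0.1.0"
description = "A short description of the package"
readme = "README.md"
license = "MIT"
requires-python = ">=3.11"   # placeholder: your minimum supported Python
dependencies = []            # runtime dependencies go here

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
```

With the flat layout, the build backend picks up the {package_name}/ directory at the repository root.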
Package manager
Package managers download packages from the internet so that they can be used by your application. Good package managers allow you to download your set of packages on a per project basis (with smart caching under the hood), isolating the dependencies of each project from each other. Installing packages system-wide requires that all packages respect the dependencies of every other package, impeding upgrades and sometimes completely preventing the installation of new packages. Good package managers also provide lockfiles to ensure repeatable deployments and consistent environments.
There are a lot of package managers out there; in my opinion, only uv and Pixi are up to the task of modern development, and all packages should switch to them. Previous attempts at package managers lacked lockfiles, took too long to resolve dependencies, had weak environment management, or were not PEP compliant. All of these features are necessary for productive usage. In my opinion, uv should be used if all the packages that you need are on PyPI; if you need conda packages, then use Pixi.
Project isolation
Along with the need for package managers to have a separate set of packages for each project, each project needs to be fully isolated from the world around it. The easiest way to do this is to set up a virtual environment, or to use your environment manager's built-in shell. Using direnv can simplify this process by automatically loading you into the environment when you enter a directory. This avoids the problem of forgetting to activate it before trying to run something, and ensures that local environment variables don't pollute other projects accidentally (e.g. per-project private API keys). When you exit the directory, direnv automatically exits the environment and unloads environment variables.
```sh
# Watch files for changes and reload the environment if modified
watch_file uv.lock # or pixi.lock
watch_file .env

# Load the variables in the .env file (if it exists)
dotenv_if_exists .env

# Load the uv environment
uv sync --frozen --dev
source .venv/bin/activate

# Alternatively, load the pixi shell instead of the two lines above:
# eval "$(pixi shell-hook)"
```
Project tools
The most important consideration for tools is that you shouldn't have to think about them. This means they need to be simple to use, fast, and easy to understand when they produce errors. Historically, a lot of Python tooling required complicated setup and allowed an excessive number of configurable options. However, an abundance of options is itself an antipattern; mirroring the Zen of Python:1
There should be one-- and preferably only one --obvious way to:
- Format code and documentation
- Lint code and documentation
- Type check
- Unit test and check code coverage
- Integration test if part of a larger application or package
- Acceptance, PR, merge checks
Developers of a project should never need to think about their tooling; it should just run quickly and automatically. With the advent of LLM coding agents, the need for fast tools has increased, as they often make trivial mistakes and format inconsistently.
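All of this tooling can be configured in one place, the pyproject.toml, so a new contributor never has to hunt for settings. A sketch of what that consolidation might look like (the specific rule selections and line length are illustrative choices, not recommendations from this post):

```toml
[tool.ruff]
line-length = 100           # illustrative; pick one value and stop arguing

[tool.ruff.lint]
# E/F: pycodestyle/pyflakes errors, I: import sorting, D: docstring linting
select = ["E", "F", "I", "D"]

[tool.pytest.ini_options]
# run doctests alongside unit tests; --cov requires the pytest-cov plugin
addopts = "--doctest-modules --cov"
```

Because every tool reads from the same file, `ruff format`, `ruff check`, and `pytest` all behave identically on every developer's machine and in CI.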
Formatting
Black was released on Pi Day 2018 by Łukasz Langa "to provide a consistent style and take away opportunities for arguing about style."2 It was opinionated about formatting, and I didn't always agree with its choices. (I still maintain that tabs are better than spaces and that the 88-character line length is slightly too small.3) However, it was fast and consistent, so I quickly switched to it to end all questions about how to format things. Ruff has since replaced Black as my formatting tool due to its incredible speed (it maintains the same formatting style).
Linting
Ruff has also adopted the functionality of many other tools, such as isort (for sorting imports in a standardized manner), and code linting (static analysis to catch logic errors). Despite having coded in Python since 2009, I regularly make simple mistakes (and so do AI agents), and Ruff catches the vast majority of them. With the addition of docstring linting, it can even find missing arguments in docstrings and ensure that you conform to documentation style standards.
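As a concrete example of the class of logic error a linter catches, consider the classic mutable-default-argument pitfall (flagged by Ruff's bugbear rule B006). The functions below are illustrative, not from any real codebase:

```python
def append_bad(item, items=[]):  # B006: the default list is created once
    # and shared across every call that omits `items`
    items.append(item)
    return items


def append_good(item, items=None):  # the standard fix: one list per call
    if items is None:
        items = []
    items.append(item)
    return items


print(append_bad(1), append_bad(2))    # same list accumulates: [1, 2] [1, 2]
print(append_good(1), append_good(2))  # independent lists: [1] [2]
```

The buggy version runs without any runtime error, which is exactly why static analysis is worth having: the mistake is invisible until some distant caller observes stale state.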
Typing
Type hinting was first released in Python 3.5 and is incredibly useful for improving the readability and maintainability of your code. Type checking ensures that you don't run into type errors at runtime or accidentally misuse duck typing. It also encourages cleaner code by making argument and return types explicit.
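A small sketch of what annotations buy you (`total_length` is a made-up function for illustration):

```python
def total_length(items: list[str]) -> int:
    """Sum the lengths of a list of strings."""
    return sum(len(item) for item in items)


print(total_length(["ab", "cde"]))  # 5

# A type checker flags the following line before it ever runs,
# whereas unannotated code would only fail at runtime (or worse,
# silently do the wrong thing via duck typing):
# total_length("abcde")  # error: str is not list[str]
```

Note that `total_length("abcde")` would actually *run* without error (iterating a string yields one-character strings), silently returning 5; the annotation is what lets a checker catch the misuse.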
Many different type checkers have been released for Python (e.g. mypy, pytype, Pyre, and Pyright). In my opinion, ty and Pyrefly represent the best combination of speed and breadth of type support, being significantly faster than their competition. Now that ty is in beta and has mostly stabilized, it is my default type checker, and I recommend everyone upgrade from mypy unless you require mypy-specific plugins (e.g. pydantic4).
Unit testing
It is important that code does what the user thinks it does. Writing unit tests helps ensure that code works as expected, and that any changes do not disrupt the expected behavior. Pytest is perhaps the most common unit testing framework, and makes it simple to write code to check your functions:
```python
from pytest import approx, mark, raises

from my_package import add, multiply  # the functions under test


def test_add() -> None:
    assert add(2, 2) == 4
    # check approximate equivalence of floats
    assert add(2.5, 3.9) == approx(6.4)
    with raises(TypeError, match="Cannot add a str and an int"):
        add("a", 4)


@mark.skip("Not yet implemented")
def test_multiply() -> None:
    multiply(3, 4)
```
Tests can also be placed in documentation (see documentation section [add link]). This provides a useful indication of how a function should be used and a simple smoke test to indicate when something has gone horribly wrong. However, comprehensive test suites should not be written as doctests: they are excessively verbose, and they are not formatted, linted, or type checked.5
Checking that your codebase is adequately covered by unit tests is also useful. A simple pytest --cov will report how many of your lines of code are touched when running unit tests. Various sites exist that can report coverage; my default is codecov.io, but I haven't examined the options thoroughly enough to hold a strong opinion.
Git flow
Tracking the changes in your code through Git is helpful for working with others and finding where mistakes entered. The use of automated tooling to run checks on commits, PRs, and merges to master helps prevent bugs from sneaking into your codebase.
Pre-commit hooks are tests that run on every commit (and possibly push). These should be fast checks that test if your code is correctly formatted, linted, and type checked, ensuring that merging it won't likely mess everything else up (I recommend prek, which is relatively new). It is important that these tests run quickly, otherwise developers will be tempted to skip them.
Pull requests and merges to the master code branch should include a more complete set of tests, typically the above plus unit tests and code coverage. Test matrices can be used to check against multiple versions of Python or multiple operating systems, the latter being particularly useful for multi-language projects. Large projects may split tests into a lighter set for PRs and a more complete set that runs when the PR is merged to master. If your project includes multiple semi-independent parts (e.g. a core library and a collection of extensions), you can have separate GitHub Actions workflows for different sections of the codebase, minimizing the number of tests that need to run on each PR.
Publishing packages
The simplest way to distribute your work and encourage others to use and build with it is to release it as a package. Packages are commonly released on PyPI or conda-forge, the latter often being favored for multi-language packages, those containing binary extensions, or OS-specific configurations. With the rise of Python wheels, most packages can be released on PyPI.
Various front-end and backend tools exist for building and deploying packages. If using uv, uv build will trigger a configurable build backend that constructs the package (I prefer hatchling as the build backend). uv publish uploads the package to PyPI for others to download and use. All of this can be placed in a GitHub Action to be triggered on a new version tag or a release.
Regularly shipping new versions makes the task much easier each time, and makes any bugs that users find much simpler to fix by just releasing a new version with the correction. For normal-sized projects I recommend shipping a new version of your code for every 10–50 PRs—more often and it may become excessive noise for the users of your package (though some developers recommend publishing every new merge), less often and bug fixes and new features won't get to users fast enough.
Documentation
Documentation is useful for both the programmers and users of a package. Docstrings can be built to produce public documentation, and doctests can provide useful, runnable examples for users. They are also a great opportunity to ensure that the code you have written is clear; if the description of the features in the docstring gets too long, it is often an indication that the function is too complicated, and should probably be split into multiple functions.
I prefer Google-style docstrings, but there are many good options. Argument descriptions should be terse, and leading articles should be avoided (e.g. drop the "the" from numerator: the number to be divided, since it is implicit). Docstrings should not contain type information: that information is already included in the function's type hints, and types in docstrings quickly become stale since static type checkers do not check them. Not everything needs a complete docstring; simple functions with good argument names often make full docstrings unnecessary, but when in doubt you should write one.
```python
def divide(numerator: float, denominator: float) -> float:
    """Divide numerator by denominator.

    Args:
        numerator: number to be divided (dividend)
        denominator: number to divide by (divisor)

    Returns:
        result of division (quotient)

    # `pytest --doctest-modules` will run this and check the result
    >>> divide(4, 2)
    2.0
    """
    return numerator / denominator
```
Documentation websites can be automatically generated from docstrings via tools like Sphinx, providing a quick way for users of a package to understand its functions without digging into the source code.
Licensing code
Even when source code is shared publicly, there is no inherent license for others to use and modify it. When publishing your code, you should pick a standard license (I recommend MIT) and place it in the same repository as your code so that others will know how they are allowed to interact with your code.
There are numerous classes of less-than-free licenses that make interacting with your code difficult for commercial entities (see Google's guide for more). In my experience, copyleft licenses like the GPL, which restrict the ability to make derivative works without sharing them publicly, discourage the use of such code due to potential liabilities. Many companies outright refuse to use GPL-licensed code, starving such open-source projects of exposure and contributions. Similarly, non-commercial licenses like ASL can have unexpected consequences: at Rowan we cannot benchmark ASL-licensed models in the papers that we publish, as research carve-outs are generally not allowed. In my opinion, licenses that restrict the usage of code to specific purposes are fundamentally against the spirit of open source and should be avoided.
Conclusion
Setting up projects well takes significant work, but will have a lasting impact on your ability to develop and distribute code quickly. Standardizing setup and maintenance of your code with modern tooling will help minimize friction and allow you to easily check the code you (or your LLM agent) write. This is particularly important for organizations, where different individual styles with regard to formatting, linting, and tooling can lead to miscommunication and conflict, slowing down the process6 or leading to significant noise in commits due to changing styles.
As a start, I recommend using my cookiecutters for uv and pixi to set up all of your new code. Feel free to fork them for your organization's needs (I maintain separate internal versions with Rowan-specific setup) or contribute your own improvements so that we can all benefit. If you can't fully adopt all of the tooling and setup, I highly recommend using parts of the cookiecutters to slowly move your existing projects toward standardized tooling and setup.
There should be one-- and preferably only one --obvious way to do it
Although that way may not be obvious at first unless you're Dutch.
Much like gofmt
Upgrading is still pretty easy