Leveraging GitHub as a PyPI Server: An In-Depth Guide

If you're developing Python applications, you likely rely on the rich ecosystem of packages available on the Python Package Index (PyPI). PyPI is an invaluable resource, with over 290,000 packages as of April 2021 fulfilling almost every imaginable need[^1].

But what if you want to privately host packages for internal use? Maybe you have proprietary code you don't want to share publicly. Or perhaps you need to ensure availability and security for critical dependencies.

In this guide, we'll explore how you can use GitHub as your own private PyPI server. You'll learn why it's a great option and get detailed, step-by-step instructions to get started. We'll also cover advanced topics and best practices to optimize your setup.

The Rise of Python and PyPI

To understand the value of hosting private packages, it's helpful to look at some statistics. As of March 2021, Python is the second most popular programming language worldwide[^2]. This widespread adoption is due in large part to the extensive package ecosystem.

PyPI package downloads have grown exponentially over the past decade[^3]. In 2020, there were over 238 billion package downloads from PyPI, up from just 761 million in 2010. That's an average of 651 million downloads per day!

[Figure: PyPI package downloads per year. Source: pypistats.org]

Clearly, PyPI is a critical resource for most Python developers. But relying solely on public packages has its drawbacks, especially for larger organizations:

  • Security: Can you trust that public packages are safe and well-maintained? Hosting your own allows you to vet what's included.
  • Availability: What if a critical package is suddenly removed from PyPI? You can ensure availability by hosting internally.
  • Customization: Need to modify a package to fit your specific use case? Forking and hosting it yourself gives you that flexibility.

Why Use GitHub as a PyPI Server?

So you've decided to host some private packages. There are a few different approaches you could take:

  1. Run your own PyPI server using open-source tools like warehouse or devpi
  2. Use a hosted solution like Gemfury or Artifactory
  3. Leverage source control management platforms like GitHub or GitLab

Each option has its strengths, but I'd argue that for most use cases, using GitHub is the optimal approach. Here's why:

  • Ease of setup: If you're already using GitHub for version control (and over 3.1 million organizations are), you can start hosting packages in a matter of minutes. No new infrastructure to set up and maintain.

  • Access control: GitHub provides fine-grained control over who can access your code. You can use the same permissions for packages, ensuring only authorized users can install them.

  • Discoverability: With GitHub Pages, you can create a simple index of your packages. This makes it easy for developers to browse and find what they need.

  • Integrations: GitHub has a robust API and pre-built integrations with popular CI/CD tools. You can automate your package publishing workflow and make packages available right where your code lives.

  • Community: Developers are already familiar with GitHub. Hosting packages there allows you to tap into existing collaboration workflows like pull requests, issues, and code review.

Of course, there are tradeoffs to using GitHub. It's not purpose-built for package hosting, so you may encounter some rough edges. And if you need advanced features like package search or access logging, you may want to use a dedicated repository service.

But for most organizations, GitHub provides a convenient, low-maintenance option for distributing private packages.

Step-by-Step: Configuring GitHub as a PyPI Server

Convinced that GitHub is worth a try? Great! Let's walk through the process of setting it up.

Step 1: Structure your Python package

Before you can host your package on GitHub, you need to make sure it's structured properly. At a minimum, it should contain:

  • A setup.py file defining metadata and dependencies
  • A package directory with your source code
  • A README file with usage instructions

Here's a sample directory structure:

mypackage/
├── mypackage/
│   ├── __init__.py
│   └── example.py
├── setup.py
└── README.md
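
A quick way to confirm that a repository meets this minimum is a small check script. The sketch below is illustrative only: the helper name missing_package_files is hypothetical, and it assumes the source directory is a regular package containing an __init__.py, as in the sample layout.

```python
from pathlib import Path
from typing import List


def missing_package_files(repo_root: Path, package_name: str) -> List[str]:
    """Return the required paths (relative to repo_root) that do not exist."""
    required = [
        "setup.py",                     # metadata and dependencies
        "README.md",                    # usage instructions
        f"{package_name}/__init__.py",  # marks the source directory as a package
    ]
    return [rel for rel in required if not (repo_root / rel).is_file()]
```

Calling missing_package_files(Path("."), "mypackage") from the repository root returns an empty list when all three files are in place.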

Your setup.py file should include the following key elements:

from setuptools import setup, find_packages

setup(
    name='mypackage',
    version='1.0.0',
    description='An example package',
    url='https://github.com/myorg/mypackage',  # your GitHub repository
    author='Your Name',
    author_email='[email protected]',
    packages=find_packages(),  # discovers the mypackage/ directory
    install_requires=[
        'requests>=2.2.0',  # runtime dependencies go here
    ],
)

Be sure to update the url field to point to your GitHub repository. And if your package has any dependencies, specify them under install_requires.

Once your package is structured, push it to a GitHub repository. You can choose to make the repo public or private based on your needs.

Step 2: Install your package using pip

With your package code on GitHub, you can now install it directly using pip:

pip install git+https://github.com/myorg/mypackage.git

This tells pip to fetch and install the code from your GitHub repo. It will clone the repo and run the setup.py script to install your package.

To install a specific tag, branch, or commit of your package, append @ followed by that ref to the URL:

pip install git+https://github.com/myorg/mypackage.git@v1.0.0

You can also add the git+ link directly in a requirements.txt file:

# requirements.txt
git+https://github.com/myorg/mypackage.git@v1.0.0

Developers can then install all required packages, including yours, with a single pip install -r requirements.txt command.
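
If you maintain several internal packages, generating these lines avoids typos in the URLs. The following is a minimal sketch, not part of pip itself: the function names github_requirement and render_requirements are made up for illustration, and the organization, repository, and tag values are the placeholder names used in this guide.

```python
from typing import Iterable, Optional, Tuple


def github_requirement(org: str, repo: str, ref: Optional[str] = None) -> str:
    """Build a pip VCS requirement for a GitHub repository.

    ref may be a tag, branch, or commit hash; None means the default branch.
    """
    url = f"git+https://github.com/{org}/{repo}.git"
    return f"{url}@{ref}" if ref else url


def render_requirements(specs: Iterable[Tuple[str, str, Optional[str]]]) -> str:
    """Render one requirement line per (org, repo, ref) tuple."""
    return "\n".join(github_requirement(*spec) for spec in specs) + "\n"


if __name__ == "__main__":
    # Placeholder names; substitute your own organization and repositories.
    print(render_requirements([
        ("myorg", "mypackage", "v1.0.0"),
        ("myorg", "anotherpackage", None),
    ]), end="")
```

Pinning to a tag or commit rather than a branch keeps installs reproducible, since a branch can move between two runs of pip.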

Step 3: Set up a package index on GitHub Pages

Having to remember the exact git+ URL for each package can be a hassle. That's where a package index comes in handy.

GitHub Pages provides an easy way to create a static site listing your packages. Simply create a new repo with an index.md file:

# My GitHub PyPI Index

- [mypackage](https://github.com/myorg/mypackage) - An example package
- [anotherpackage](https://github.com/myorg/anotherpackage) - Another useful package 

Then enable GitHub Pages in the repo settings, building from the main branch. Your package index will be available at a URL like https://myorg.github.io/pypi-index.

You can link to this index in your requirements.txt file using the --extra-index-url flag:

# requirements.txt
--extra-index-url https://myorg.github.io/pypi-index
mypackage
anotherpackage

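Developers can then install packages by name, but only if the site pip is pointed at follows the simple repository layout defined in PEP 503: one directory per project, each containing an HTML page whose links point to the downloadable distribution files. The Markdown list above is a catalog for human readers and is not enough on its own for pip to resolve a name. Keep in mind, too, that pip gives no priority to an extra index over the public one, so choose internal package names that cannot collide with public packages.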
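
For pip to resolve names through --extra-index-url, the published site needs a root page linking to one directory per project, and a page in each directory linking to that project's files (the PEP 503 "simple" layout). Below is a minimal sketch of a generator for such a static site. The function names and the projects mapping are assumptions for illustration, and it presumes the linked files are downloadable by whoever runs pip.

```python
import html
import re
from pathlib import Path
from typing import Dict

PAGE = "<!DOCTYPE html>\n<html>\n  <body>\n{body}\n  </body>\n</html>\n"


def normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 specifies."""
    return re.sub(r"[-_.]+", "-", name).lower()


def write_simple_index(out_dir: Path, projects: Dict[str, Dict[str, str]]) -> None:
    """Write a static index; projects maps name -> {filename: download URL}."""
    out_dir.mkdir(parents=True, exist_ok=True)
    root_links = []
    for name in sorted(projects):
        slug = normalize(name)
        # Root page: one link per project, pointing at its directory.
        root_links.append(f'    <a href="{slug}/">{html.escape(name)}</a><br>')
        # Project page: one link per distribution file.
        file_links = [
            f'    <a href="{html.escape(url)}">{html.escape(filename)}</a><br>'
            for filename, url in sorted(projects[name].items())
        ]
        project_dir = out_dir / slug
        project_dir.mkdir(exist_ok=True)
        (project_dir / "index.html").write_text(PAGE.format(body="\n".join(file_links)))
    (out_dir / "index.html").write_text(PAGE.format(body="\n".join(root_links)))
```

Writing the output into the folder that GitHub Pages publishes means the index is regenerated and redeployed whenever the script runs and the result is committed.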

Step 4: Automate package releases with GitHub Actions

Manually creating a new GitHub release every time you want to update your package can get tedious. Luckily, GitHub Actions make it easy to automate the process.

Start by creating a .github/workflows/release.yml file in your package repo:

name: PyPI Release

on:
  push:
    tags:
      - 'v*'  # run only for tags such as v1.0.0

jobs:
  release:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4

    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.x'

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install build twine

    - name: Build and publish
      env:
        # Credentials for the package index that twine uploads to
        TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
        TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
      run: |
        python -m build
        twine upload dist/*

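This workflow is triggered whenever you push a new tag matching the pattern v* (e.g. v1.0.0). It checks out your code, builds the package distribution files, and uploads them with twine. Note that twine uploads to a Python package index (pypi.org by default, or the URL given through --repository-url or the TWINE_REPOSITORY_URL environment variable), not to GitHub Releases. To keep a package private, point twine at your own index, or replace the upload step with one that attaches the files in dist/ to a GitHub release, for example with the gh release upload command.

You'll need to set the PYPI_USERNAME and PYPI_PASSWORD secrets in your repo settings. These should be credentials for the index that twine uploads to, not for a GitHub account.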

With this workflow in place, all you need to do is create and push a new version tag, and your package release will be automatically published.
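
One thing a tag-triggered release can get wrong is a tag that disagrees with the version declared in setup.py. The sketch below shows a guard you could run before the build step. It relies on GitHub Actions exposing the pushed ref in the GITHUB_REF environment variable (for example refs/tags/v1.0.0); the function names are hypothetical, and the regular expression is deliberately stricter than the workflow's v* pattern.

```python
import re


def version_from_ref(git_ref: str) -> str:
    """Extract '1.0.0' from a ref such as 'refs/tags/v1.0.0'."""
    match = re.fullmatch(r"refs/tags/v(\d+\.\d+\.\d+)", git_ref)
    if match is None:
        raise ValueError(f"not a version tag ref: {git_ref!r}")
    return match.group(1)


def check_release(git_ref: str, package_version: str) -> None:
    """Raise ValueError if the pushed tag and the packaged version disagree."""
    tagged = version_from_ref(git_ref)
    if tagged != package_version:
        raise ValueError(f"tag says {tagged}, package says {package_version}")
```

In a workflow step this could be called as check_release(os.environ["GITHUB_REF"], "1.0.0"), with the second argument read from wherever your package keeps its version.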

Advanced Tips and Best Practices

By following the steps above, you'll have a basic GitHub PyPI server up and running in no time. But there are a few extra considerations to keep in mind as your usage grows:

Semantic versioning

It's a good practice to follow semantic versioning for your package releases. This means using a version number of the format MAJOR.MINOR.PATCH, where:

  • MAJOR is incremented for breaking changes
  • MINOR is incremented for new features
  • PATCH is incremented for bug fixes

Semantic versioning makes it clear to users what kind of changes to expect in each release. Tools like bump2version can help automate the process.
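
To make the increment rules concrete, here is a small sketch of a version bump. It is an illustration of the MAJOR.MINOR.PATCH rules only, not how bump2version is implemented, and the function name bump_version is an assumption.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next version after bumping 'major', 'minor', or 'patch'."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"          # breaking change resets minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"    # new feature resets patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```

Note that the lower-order fields reset to zero whenever a higher-order field is incremented.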

Dependency management

As your package ecosystem grows, it's important to keep dependencies under control. Use pip-compile from pip-tools to generate locked, hashed requirements files. This ensures all developers use the exact same versions of each package.

You can also use a tool like Dependabot to automatically create pull requests when new versions of your dependencies are available. This helps you stay up to date and quickly identify any breaking changes.
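
A lightweight complement to these tools is a check that flags requirements that are not pinned to an exact version. The sketch below is a simplified illustration with a hypothetical function name. It only understands plain requirement lines, treats VCS (git+) lines as pinned by their ref, and is not a full parser for the requirements file format.

```python
from typing import Iterable, List


def unpinned_requirements(lines: Iterable[str]) -> List[str]:
    """Return requirement lines that are not pinned with '=='."""
    flagged = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # blank lines, comments, and options such as --extra-index-url
        if line.startswith("git+"):
            continue  # VCS requirements are pinned by ref, not by '=='
        requirement = line.split(" #", 1)[0].strip()  # drop a trailing comment
        if "==" not in requirement:
            flagged.append(requirement)
    return flagged
```

Running it over the lines of a requirements file and failing the build on a non-empty result catches loose specifiers before they reach other developers.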

Security scanning

Before publishing any packages, it's a good idea to scan them for potential security vulnerabilities. Tools like Bandit and Safety can help identify common issues.

You can automate security scanning by adding it as a step in your GitHub Actions workflow. Fail the build if any high-severity issues are found to ensure you don't publish vulnerable code.
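
As a rough sketch of such a gate, the script below reads a Bandit report in JSON form and returns a non-zero exit code when it contains high-severity findings. It assumes the report was produced with something like bandit -r mypackage -f json -o bandit-report.json, and that each entry under "results" carries an "issue_severity" field. Treat those field names, and the helper names, as assumptions to verify against the Bandit version you use.

```python
import json
from typing import Any, Dict, List


def high_severity_findings(report: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the findings whose severity is HIGH (assumed report schema)."""
    return [
        finding
        for finding in report.get("results", [])
        if str(finding.get("issue_severity", "")).upper() == "HIGH"
    ]


def main(report_path: str) -> int:
    """Print high-severity findings and return 1 if there are any, else 0."""
    with open(report_path, encoding="utf-8") as handle:
        findings = high_severity_findings(json.load(handle))
    for finding in findings:
        print(f"{finding.get('filename')}:{finding.get('line_number')}: "
              f"{finding.get('issue_text')}")
    return 1 if findings else 0
```

A workflow step could end with sys.exit(main("bandit-report.json")), so that the job fails when the return value is 1.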

Package management policies

As your PyPI server gains adoption, you may want to establish some policies around package management. For example:

  • Who can publish new packages?
  • What is the process for deprecating or removing old packages?
  • How do you handle security issues or bugs in published packages?

Documenting these policies and communicating them clearly to your development team can help avoid confusion and ensure your private package ecosystem remains healthy.

Conclusion

Using GitHub as a PyPI server is a simple yet powerful way to distribute private Python packages within your organization. By following the steps outlined in this guide, you can get up and running quickly and start reaping the benefits of a secure, self-hosted package repository.

Some key takeaways:

  • Structure your packages properly with setup.py and push them to GitHub
  • Use pip install git+ syntax to install packages from GitHub repos
  • Create a package index on GitHub Pages to make packages discoverable
  • Automate releases with GitHub Actions for a smooth publishing workflow
  • Implement best practices like semantic versioning, dependency management, and security scanning as your usage grows

With a little bit of setup, you'll have a flexible, scalable solution for sharing code across your teams. And by leveraging the power of GitHub, you can tap into all the collaboration and automation features developers already know and love.

I hope this guide has demystified the process of hosting private packages on GitHub and given you the confidence to try it out for yourself. Happy package publishing!

[^1]: "Download statistics." PyPI, Apr 2021, https://pypistats.org/packages.
[^2]: "TIOBE Index for March 2021." TIOBE, Mar 2021, https://www.tiobe.com/tiobe-index/.
[^3]: "PyPI Stats." PyPI, Apr 2021, https://pypistats.org/.
