How to make your own
CI/CD tool with Python 3

Agenda

  • Why build your own CI/CD tool in 2019?
  • Plan your CI/CD tool
  • Using OpenAPI 3 schema for all good things
  • Async Python frameworks in 2019
  • Integrating OpenAPI 3 schema with aiohttp
  • Working with GitHub API in Python
  • Running tasks & getting results with asyncio
  • Problems and solutions
  • Demo time

Am I serious?

The Spec

  • There are 3 GitHub repos:
    • Django REST Framework API
    • React Frontend
    • AI Code
  • Infrastructure contains:
    • AWS Elastic Beanstalk application with API & AI Code
    • AWS S3 bucket with frontend dist

The Task

  • Make it possible to:
    • Deploy the latest API commit to AWS EB
    • Deploy the latest AI commit to AWS EB
    • Deploy the latest frontend commit to AWS S3
  • As well as:
    • Deploy a custom AI commit to AWS EB
    • Deploy only selected commits, not every commit
  • Nice to have: the ability to deploy from Django Admin

CI/CD tools in 2019

Please welcome
Hobotnica!

Hobotnica

  • Hobotnica means octopus in Croatian
  • Python 3 web application
  • Built on top of asyncio stack
  • Test GitHub projects with Makefiles via GitHub webhooks
  • Deploy projects by requesting a specific API endpoint

Step 1. Specify project to test & deploy

Registering projects

  1. Provide GitHub credentials
  2. Provide the project owner/name
  3. git clone the repo

Step 2. Add GitHub webhook

Testing projects

  1. Generate a unique GitHub webhook URL
  2. Use that URL when registering the webhook on GitHub
  3. On receiving a payload:
    • Read credentials
    • Rewind the repo
    • Run make test
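A unique, hard-to-guess webhook URL can be sketched with the standard `secrets` module. The base URL `https://hobotnica.example.com`, the `/api/webhooks/{project_id}/{token}` path layout and the in-memory `WEBHOOK_TOKENS` store are assumptions for illustration, not Hobotnica's actual routes.

```python
import secrets
from typing import Dict

# Hypothetical base URL of the Hobotnica instance
BASE_URL = "https://hobotnica.example.com"

# Hypothetical in-memory store: webhook token -> project id
WEBHOOK_TOKENS: Dict[str, str] = {}


def make_webhook_url(project_id: str) -> str:
    """Generate a unique, hard-to-guess webhook URL for a project."""
    token = secrets.token_urlsafe(32)
    WEBHOOK_TOKENS[token] = project_id
    return f"{BASE_URL}/api/webhooks/{project_id}/{token}"


def is_known_webhook(project_id: str, token: str) -> bool:
    """Check that an incoming webhook URL was issued for this project."""
    return WEBHOOK_TOKENS.get(token) == project_id
```

On receiving a payload, the handler would first call `is_known_webhook()` and only then read credentials, rewind the repo and run `make test`.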

Step 3. Integrate with Django (or anything else)

Deploying projects

  1. Request the deploy endpoint, passing GitHub credentials in headers
  2. If the credentials are valid:
    • Rewind the repo
    • Run make deploy
  3. As deploying takes time:
    • A deploy status endpoint reports the deploy status & details

Step 4. Environment Vars

Configuring projects

  1. Project-specific env vars (like COVERALLS_TOKEN)
  2. Job-specific env vars (like LEVEL)
  3. One-time env vars (like DEBUG if something doesn’t work)
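The three kinds of env vars can be layered into the environment of a single job run. A minimal sketch, assuming the precedence base < project-specific < job-specific < one-time (the slides don't state the order); `build_env` is a hypothetical helper name.

```python
from typing import Dict, Mapping


def build_env(
    base: Mapping[str, str],
    project: Mapping[str, str],
    job: Mapping[str, str],
    one_time: Mapping[str, str],
) -> Dict[str, str]:
    """Merge env var layers; later layers override earlier ones."""
    env: Dict[str, str] = {}
    # Assumed precedence: base < project-specific < job-specific < one-time
    for layer in (base, project, job, one_time):
        env.update(layer)
    return env


env = build_env(
    {"PATH": "/usr/bin"},
    {"COVERALLS_TOKEN": "secret", "LEVEL": "dev"},
    {"LEVEL": "prod"},
    {"DEBUG": "1"},
)
```

With this ordering, a one-time `DEBUG=1` never has to be saved in the project settings, and a job can override a project default such as `LEVEL`.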

Ode to OpenAPI 3 Schema

Development starts with prototyping

  • There were many options for describing a REST API:
    • Swagger
    • CoreAPI
    • json:api
    • RAML
  • Many options lead to tough decisions:
    • Which format suits both backend & frontend best?
    • How to generate the format specification file?
    • What if the format gets abandoned?

OpenAPI Initiative & OpenAPI Specification

  • Swagger development began in 2010
  • SmartBear Software acquired Swagger in 2015
  • Late 2015: the Linux Foundation, together with SmartBear Software, announced the OpenAPI Initiative
  • The goals of the OpenAPI Initiative:
    • Provide a better standard that fixes some Swagger flaws: the OpenAPI Specification
    • Provide a set of tools to generate & edit the specification

How to generate OpenAPI specification?

  • Write the YAML / JSON specification by hand
  • Generate it from Python data structures
openapi: "3.0.2"

info:
  ...

paths:
  ...

components:
  ...

tags:
  ...
Swagger Editor
from apispec import APISpec
from marshmallow import fields, Schema

spec = APISpec(...)

class ConferenceSchema(Schema):
    name = fields.Str(required=True)
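apispec derives schemas from marshmallow classes. To show the underlying idea without third-party dependencies, here is a hand-rolled sketch that turns a dataclass into an OpenAPI 3 component schema. It is not apispec's API; the extra `Conference` fields, `TYPE_MAP` and the `info` values are made up for illustration.

```python
import dataclasses
from typing import Any, Dict, get_type_hints

# Mapping from Python types to OpenAPI 3 schema types (basic types only)
TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclasses.dataclass
class Conference:
    name: str
    year: int
    city: str = "Unknown"


def to_openapi_schema(cls: type) -> Dict[str, Any]:
    """Build an object schema from a dataclass' fields and type hints."""
    hints = get_type_hints(cls)
    properties: Dict[str, Any] = {}
    required = []
    for field in dataclasses.fields(cls):
        properties[field.name] = {"type": TYPE_MAP[hints[field.name]]}
        # Fields without any default are treated as required
        if (
            field.default is dataclasses.MISSING
            and field.default_factory is dataclasses.MISSING
        ):
            required.append(field.name)
    schema: Dict[str, Any] = {"type": "object", "properties": properties}
    if required:
        schema["required"] = required
    return schema


spec = {
    "openapi": "3.0.2",
    "info": {"title": "Example API", "version": "1.0.0"},
    "paths": {},
    "components": {"schemas": {"Conference": to_openapi_schema(Conference)}},
}
```

Dumping `spec` to YAML or JSON gives the same top-level layout as the hand-written skeleton (`openapi`, `info`, `paths`, `components`).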

Choosing Python async
web-framework in 2019

Before we start

Sync View

from rest_framework.decorators import api_view
from rest_framework.request import Request
from rest_framework.response import Response

from .models import Model

@api_view(['GET'])
def hello_world(request: Request) -> Response:
    instance = Model.objects.get(...)
    return Response(...)

Before we start

Async View

from aiohttp import web

routes = web.RouteTableDef()

@routes.get('/')
async def hello_world(request: web.Request) -> web.Response:
    async with request.app['db'].acquire() as conn:
        instance = await conn.fetch(...)
    return web.json_response(...)

aiohttp

  • aio-libs/aiohttp
  • Current version (Oct 2019): 3.6.1
  • One of the first asyncio web-frameworks
  • Not only a web server framework: it also includes a client for making HTTP requests
  • Lightweight, Flask-like
  • Has many plugins & extensions available
  • Currently developed & maintained by the aio-libs group

sanic

  • huge-success/sanic
  • Current version (Oct 2019): 19.6.3
  • An attempt to build a faster async web framework
  • Still very lightweight, with the main focus on speed
  • Semantically very close to aiohttp
  • Also has many plugins & extensions available
  • A product of the huge-success group

fastapi

  • tiangolo/fastapi
  • Current version (Oct 2019): 0.38.1
  • ASGI web-framework
  • Supports Pydantic validation out of the box
  • Decent amount of included batteries
  • OpenAPI 3 schema support out of the box
  • Created by Sebastián Ramírez

Others?

  • Django 3.0 will support ASGI
  • vibora is more about hype from my POV
  • If you know other options please let me know: @playpauseandstop

I still chose aiohttp

  • fastapi looks promising, but doesn't comply with my mypy config :)
  • I will definitely check out Django 3.0!
  • But as of Oct 2019, aiohttp is still my choice for async Python web framework

rororo==2.0.0

aiohttp and OpenAPI?

  • aiohttp-apispec is the best choice, but it produces a Swagger 2.0 schema
  • aiohttp-swagger is a bit outdated & again produces a Swagger 2.0 schema
  • Not sure what to say about aiohttp-swagger3
  • Every solution is centered on keeping the schema inside Python docstrings

pyramid_openapi3 got me thinking

What If?

  1. Support OpenAPI 3 schema by defining path to openapi.yaml
  2. Decorate a view and set which OpenAPI operationId it handles
  3. Access validated data from request instance
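The "What If?" idea can be sketched without any web framework: read the operationIds from the schema, then let a decorator bind each view to its (method, path). `SPEC`, `collect_operations`, `operation` and `ROUTES` are hypothetical names for illustration, not rororo's API.

```python
from typing import Any, Awaitable, Callable, Dict, Tuple

Handler = Callable[..., Awaitable[Any]]
Route = Tuple[str, str]  # (HTTP method, path)

# Hypothetical stand-in for a parsed openapi.yaml
SPEC: Dict[str, Any] = {
    "paths": {
        "/api/repositories": {"post": {"operationId": "add_repository"}},
        "/api/repositories/{owner}/{name}": {
            "get": {"operationId": "retrieve_repository"}
        },
    }
}


def collect_operations(spec: Dict[str, Any]) -> Dict[str, Route]:
    """Map every operationId to its (method, path).

    Assumes every key under a path is an HTTP method.
    """
    operations: Dict[str, Route] = {}
    for path, methods in spec["paths"].items():
        for method, details in methods.items():
            operations[details["operationId"]] = (method.upper(), path)
    return operations


OPERATIONS = collect_operations(SPEC)
ROUTES: Dict[Route, Handler] = {}


def operation(operation_id: str) -> Callable[[Handler], Handler]:
    """Bind a view to an operationId that must exist in the schema."""
    if operation_id not in OPERATIONS:
        raise ValueError(f"Unknown operationId: {operation_id}")

    def decorator(handler: Handler) -> Handler:
        ROUTES[OPERATIONS[operation_id]] = handler
        return handler

    return decorator


@operation("add_repository")
async def add_repository(request: Any) -> Any:
    ...
```

Failing at import time on an unknown operationId keeps the views and the schema from silently drifting apart.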

Then I met connexion

  • But it is a framework on top of Flask
  • It has an aiohttp server
  • But still…

So, why not create another solution?

Step 1. Initialization (app.py)

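from pathlib import Path
from typing import List, Optional

from aiohttp import web
from rororo import setup_api

def create_app(argv: Optional[List[str]] = None) -> web.Application:
    app = web.Application()

    # Point rororo at the OpenAPI 3 schema file that lives next to app.py
    setup_api(app, Path(__file__).parent / 'openapi.yaml')

    return app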

So, why not create another solution?

Step 2. Routing (views.py)

from aiohttp import web
from rororo import RouteTableDef

routes = RouteTableDef(prefix="/api/repositories")

@routes.post("")
async def add_repository(request: web.Request) -> web.Response:
    ...

@routes.get("/{owner}/{name}")
async def retrieve_repository(request: web.Request) -> web.Response:
    ...

So, why not create another solution?

Step 3. Access the data (views.py)

Request:

http POST /api/repositories \
Authorization:"Bearer {token}" X-GitHub-Username:{username} \
owner={owner} name={name}

Request Handler:

from aiohttp import web
from rororo import openapi_context

from . import github

@routes.post("")
async def add_repository(request: web.Request) -> web.Response:
    with openapi_context(request) as context:  # IMPORTANT: Sync context manager!
        github_user = await github.authenticate(
            context.parameters["X-GitHub-Username"],  # Parameter from header
            context.security["jwt"]  # Security scheme
        )
        if await github.has_access(
            github_user,
            context.data["owner"],  # Request body data
            context.data["name"]   # Request body data
        ):
            ...

So, why not create another solution?

In the past, rororo was:

  • An attempt to build a web framework on top of routr
  • An attempt to implement its own schemas & utilities for Python web frameworks

Now rororo is a library for:

OpenAPI 3 schema support for aiohttp.web applications

On to implementation!

Working with GitHub API v4

  • GitHub API v4 uses GraphQL
  • The simplest way: use aiohttp.client to interact with it

github.py

from aiohttp import ClientSession

def session_context(personal_token: str) -> ClientSession:
    return ClientSession(
        headers={
            "Authorization": f"Bearer {personal_token}",
            "User-Agent": "YourUserAgent/1.0",
        },
        raise_for_status=True
    )

"Authenticate" GitHub user

github.py

from .constants import GITHUB_API_URL, GQL_VIEWER

async def authenticate(username: str, token: str) -> GitHubUser:
    async with session_context(token) as session:
        response = await session.post(GITHUB_API_URL, json={"query": GQL_VIEWER})
        response_data = await response.json()

    viewer = GitHubUser(**response_data["data"]["viewer"])
    if viewer.login != username:
        raise InvalidCredentials()

    return viewer

GraphQL Queries

constants.py

GITHUB_API_URL = "https://api.github.com/graphql"
GQL_VIEWER = """
query GetViewer {
    viewer {
        id
        login
        name
        url
    }
}
"""

Has user access to the repository?

constants.py

GQL_REPOSITORY = """
query GetRepository($owner: String!, $name: String!) {
    repository(owner: $owner, name: $name) {
        id
        name
        description
    }
}
"""

Has user access to the repository?

github.py

from .constants import GITHUB_API_URL, GQL_REPOSITORY

async def has_access(user: GitHubUser, owner: str, name: str) -> bool:
    async with session_context(user.personal_token) as session:
        response = await session.post(
            GITHUB_API_URL,
            json={
                "query": GQL_REPOSITORY,
                "variables": {
                    "owner": owner,
                    "name": name
                }
            }
        )
        response_data = await response.json()
        # repository is null when it doesn't exist or the user has no access
        return response_data["data"]["repository"] is not None
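GitHub's GraphQL API usually reports problems such as a missing repository with an HTTP 200 response that carries an `errors` list, so `raise_for_status=True` alone won't catch them. A stdlib-only sketch of building the request body and reading the response defensively; `build_payload` and `repository_accessible` are hypothetical helpers, and the sample responses are illustrative.

```python
import json
from typing import Any, Dict

GQL_REPOSITORY = """
query GetRepository($owner: String!, $name: String!) {
    repository(owner: $owner, name: $name) {
        id
        name
        description
    }
}
"""


def build_payload(owner: str, name: str) -> bytes:
    """Encode the JSON body that gets POSTed to the GraphQL endpoint."""
    return json.dumps(
        {"query": GQL_REPOSITORY, "variables": {"owner": owner, "name": name}}
    ).encode("utf-8")


def repository_accessible(response_data: Dict[str, Any]) -> bool:
    """No repository data (or no data at all) means no access."""
    data = response_data.get("data") or {}
    return data.get("repository") is not None
```

Using `.get()` also covers responses where `data` is missing entirely, for example on an authentication error.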

Anything else?

Clone the repo

git clone https://${username}:${personal_token}@github.com/${owner}/${name}.git

Update the repo?

git checkout ${branch}
git pull origin ${branch}

Important: Remove the repo after user access has been revoked!
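The clone and update steps can be expressed as argument lists rather than shell strings. A sketch under assumptions: the helper names are hypothetical, credentials are percent-encoded so characters such as `@` or `:` don't break the URL, and the "clone only if `.git` is missing" rule is one possible policy.

```python
from pathlib import Path
from typing import List
from urllib.parse import quote


def clone_command(username: str, token: str, owner: str, name: str, path: Path) -> List[str]:
    # Percent-encode credentials so characters such as "@" or ":" don't break the URL
    url = (
        f"https://{quote(username, safe='')}:{quote(token, safe='')}"
        f"@github.com/{owner}/{name}.git"
    )
    return ["git", "clone", url, str(path)]


def update_commands(branch: str) -> List[List[str]]:
    # Meant to be run with the repository directory as the working directory
    return [["git", "checkout", branch], ["git", "pull", "origin", branch]]


def sync_commands(
    username: str, token: str, owner: str, name: str, path: Path, branch: str
) -> List[List[str]]:
    """Clone on first use, then check out & pull the requested branch."""
    commands: List[List[str]] = []
    if not (path / ".git").is_dir():
        commands.append(clone_command(username, token, owner, name, path))
    commands.extend(update_commands(branch))
    return commands
```

Each inner list can be passed to `asyncio.create_subprocess_exec(*command, cwd=...)`, which avoids shell quoting issues entirely.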

Ode to asyncio.subprocess

Running system commands then…

python at /Users/playpauseandstop/Projects

import subprocess

# call() and check_output() block until the command exits
assert subprocess.call(["pwd"]) == 0
assert subprocess.check_output(["pwd"]) == b"/Users/playpauseandstop/Projects\n"

result = subprocess.run(["pwd"])
assert result.returncode == 0
assert result.stdout is None  # Output is not captured unless requested

result = subprocess.run(["pwd"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
assert result.stdout == b"/Users/playpauseandstop/Projects\n"

Why doesn’t subprocess fit?

  • subprocess.run blocks the request handler
  • For sync frameworks this can be mitigated by running more gunicorn or uWSGI workers
  • For async frameworks there is no need for such workarounds
  • asyncio can run blocking code in an executor without blocking the event loop

Possible solution

import asyncio
import functools
import subprocess
from subprocess import CompletedProcess
from typing import Any

async def subprocess_run(*args: Any, **kwargs: Any) -> CompletedProcess:
    kwargs.setdefault("stdout", subprocess.PIPE)
    kwargs.setdefault("stderr", subprocess.PIPE)
    # run_in_executor() does not accept keyword arguments, so bind them with partial
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        None, functools.partial(subprocess.run, *args, **kwargs)
    )

Issues?

  • Doesn't feel very Pythonic
  • Thread overhead?
  • If passing a custom executor instead of None, how do you share it across the app?
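One answer to the last question is to create a single `ThreadPoolExecutor` and pass it explicitly. In aiohttp it could, for example, be created in a cleanup context and stored on the app; the stdlib-only sketch below just passes it around. `max_workers=4` is an arbitrary assumption, and a child Python process stands in for a real command.

```python
import asyncio
import functools
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Any, List


async def subprocess_run(
    executor: ThreadPoolExecutor, *args: Any, **kwargs: Any
) -> subprocess.CompletedProcess:
    """Run subprocess.run in the given (shared) executor."""
    kwargs.setdefault("stdout", subprocess.PIPE)
    kwargs.setdefault("stderr", subprocess.PIPE)
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        executor, functools.partial(subprocess.run, *args, **kwargs)
    )


async def main() -> List[subprocess.CompletedProcess]:
    # One executor for the whole app; the `with` block shuts it down on exit
    with ThreadPoolExecutor(max_workers=4) as executor:
        return list(
            await asyncio.gather(
                subprocess_run(executor, [sys.executable, "-c", "print('a')"]),
                subprocess_run(executor, [sys.executable, "-c", "print('b')"]),
            )
        )
```

This bounds the number of threads, but every running command still occupies a thread, which is why the next slides move to `asyncio.subprocess`.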

Better solution

asyncio.subprocess

shell.py

import asyncio
from asyncio.subprocess import PIPE, Process

async def run(cmd: str) -> Process:
    return await asyncio.create_subprocess_shell(cmd, stdout=PIPE, stderr=PIPE)

Better solution

views.py

from . import shell

@routes.post("")
async def add_repository(request: web.Request) -> web.Response:
    ...

    proc = await shell.run(
        f"git clone https://{username}:{personal_token}@github.com/{owner}/{name}.git "
        f"{path_to_repo}"
    )
    _, stderr = await proc.communicate()

    return web.json_response(
        {
            "cloned": proc.returncode == 0,
            # stderr is bytes, so decode it before serializing to JSON
            "errors": stderr.decode()
        }
    )

Problems & Solutions

OpenAPI Schema management

Still searching for the best workflow…

Makefile

SWAGGER_HOST ?= 0.0.0.0
SWAGGER_PORT ?= 8422

# Edit OpenAPI schema
swagger-editor:
    docker run --rm -h $(SWAGGER_HOST) -p $(SWAGGER_PORT):8080 \
    swaggerapi/swagger-editor:v3.6.36

# Test OpenAPI schema (requires project to be run as well)
swagger-ui:
    docker run --rm -e URL="http://$(API_HOST):$(API_PORT)/api/openapi.json" \
    -h $(SWAGGER_HOST) -p $(SWAGGER_PORT):8080 swaggerapi/swagger-ui:v3.23.11

Validate OpenAPI Schema

Step 1. Add openapi-spec-validator to dev requirements

poetry add -D openapi-spec-validator

Step 2. test_openapi.py

from openapi_spec_validator import validate_spec
from rororo import get_openapi_schema

from api.app import create_app

def test_openapi_schema():
    validate_spec(get_openapi_schema(create_app()))

Step 3 (Optional). Validate openapi.json / openapi.yaml response

Working with sensitive data

  1. Identify sensitive data
    • GitHub personal token
    • Sensitive environment vars
  2. Encrypt the data
  3. Do not include sensitive data in any output
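Step 3 can be enforced with a final masking pass over anything returned to the user; the `stderr` of a clone, for example, could contain the clone URL with the token in it. A minimal sketch; `mask_secrets` and the `********` mask are hypothetical choices.

```python
from typing import Iterable

MASK = "********"


def mask_secrets(text: str, sensitive_values: Iterable[str]) -> str:
    """Replace every occurrence of a sensitive value with a fixed mask."""
    # Replace longer values first, so a secret containing another one is fully masked
    for value in sorted(set(sensitive_values), key=len, reverse=True):
        if value:  # Skip empty strings, which would match everywhere
            text = text.replace(value, MASK)
    return text
```

The list of sensitive values would be the GitHub personal token plus the decrypted values of any env vars marked as sensitive.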

Proper environment for shell.run

Avoid passing the entire os.environ:

await asyncio.create_subprocess_shell(cmd, env=os.environ)

Instead, filter out virtual environment env vars:

IGNORE_ENV_KEYS = {"POETRY", "VIRTUAL_ENV"}

await asyncio.create_subprocess_shell(
    cmd,
    env={
        key: value
        for key, value in os.environ.items()
        if key not in IGNORE_ENV_KEYS
    }
)
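The same filter can be wrapped in small helpers and checked against a real shell. `clean_env` and `run_with_clean_env` are hypothetical names; the key set is the one from the slide.

```python
import asyncio
import os
from asyncio.subprocess import PIPE
from typing import Dict, Mapping

IGNORE_ENV_KEYS = {"POETRY", "VIRTUAL_ENV"}


def clean_env(environ: Mapping[str, str]) -> Dict[str, str]:
    """Copy the environment without the ignored keys."""
    return {key: value for key, value in environ.items() if key not in IGNORE_ENV_KEYS}


async def run_with_clean_env(cmd: str) -> bytes:
    """Run a shell command with the filtered environment and return its stdout."""
    proc = await asyncio.create_subprocess_shell(
        cmd, stdout=PIPE, stderr=PIPE, env=clean_env(os.environ)
    )
    stdout, _ = await proc.communicate()
    return stdout
```

Note that an activated virtualenv also prepends its `bin` directory to `PATH`, which a key-based filter like this leaves untouched.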

shell.run actually "blocks" the request handler

async def add_repository(request: web.Request) -> web.Response:
    ...
    proc = await shell.run("...")
    await proc.communicate()  # The handler waits here until the command exits
    ...

  • The handler still waits for git clone to finish
  • For multiple concurrent requests, the total execution time:
    • Will not equal the cumulative time of all git clones, as with subprocess.run
    • Will roughly equal the time of the longest git clone
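The timing claim can be checked with a small experiment: three slow commands awaited concurrently take about as long as one. A child Python process that sleeps 0.3 s stands in for `git clone`; the numbers are illustrative, not measurements of Hobotnica.

```python
import asyncio
import sys
import time
from asyncio.subprocess import PIPE
from typing import List, Tuple

# Stand-in for a slow `git clone`: a child Python process that sleeps 0.3s
SLOW_COMMAND = (sys.executable, "-c", "import time; time.sleep(0.3)")


async def run_and_wait(*args: str) -> int:
    proc = await asyncio.create_subprocess_exec(*args, stdout=PIPE, stderr=PIPE)
    await proc.communicate()  # Wait for the process to exit
    return proc.returncode


async def simulate_requests(count: int) -> Tuple[List[int], float]:
    """Start `count` slow commands concurrently and measure the wall time."""
    started = time.monotonic()
    codes = await asyncio.gather(*(run_and_wait(*SLOW_COMMAND) for _ in range(count)))
    return list(codes), time.monotonic() - started
```

Run back to back, the same three commands would need at least 0.9 s; concurrently they finish in roughly 0.3 s plus process start-up time.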

asyncio.create_task to the rescue!

import asyncio

async def add_repository(request: web.Request) -> web.Response:
    ...

    # Create the task, but do not await its result
    asyncio.create_task(shell.run("..."))

    # So the status is "in progress" instead of "done"
    return web.json_response({"status": "cloning"})
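Two details are easy to miss in the snippet above: the task created from `shell.run()` only starts the process, so nothing waits for it or reads its piped output (a chatty process may block once the pipe buffer fills), and the asyncio documentation advises keeping a reference to tasks because the event loop holds only weak references. A stdlib-only sketch; `start_job`, `run_job` and `STATUSES` are hypothetical, and a child Python process stands in for `git clone`.

```python
import asyncio
import sys
from asyncio.subprocess import PIPE
from typing import Dict, Set

# The event loop keeps only weak references to tasks, so hold strong ones here
BACKGROUND_TASKS: Set[asyncio.Task] = set()
# Hypothetical job id -> status store
STATUSES: Dict[str, str] = {}


async def run_job(job_id: str, *args: str) -> None:
    proc = await asyncio.create_subprocess_exec(*args, stdout=PIPE, stderr=PIPE)
    await proc.communicate()  # Drain the pipes and wait for the process to exit
    STATUSES[job_id] = "cloned" if proc.returncode == 0 else "failed"


def start_job(job_id: str, *args: str) -> str:
    """Schedule the job in the background and return its initial status."""
    STATUSES[job_id] = "cloning"
    task = asyncio.create_task(run_job(job_id, *args))
    BACKGROUND_TASKS.add(task)
    task.add_done_callback(BACKGROUND_TASKS.discard)
    return STATUSES[job_id]
```

The handler would return `start_job(...)` as the status right away, and a status endpoint could later read the final value from `STATUSES`.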

Shielding requests & jobs

Step 1. Ensure all non-idempotent methods are safe against CancelledErrors

from typing import List, Optional

from aiohttp import web
from aiohttp_middlewares import NON_IDEMPOTENT_METHODS, shield_middleware

def create_app(argv: Optional[List[str]] = None) -> web.Application:
    app = web.Application(middlewares=(
        shield_middleware(methods=NON_IDEMPOTENT_METHODS),
    ))
    ...

Shielding requests & jobs

Step 2. Ensure the application exits only after the last job has completed!

On start job:

request.app["jobs"].append(job.uid)

On finishing (canceling) job:

job_context.app["jobs"].remove(job.uid)

app.on_shutdown signal:

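import asyncio

from aiohttp import web
from async_timeout import timeout

async def wait_for_empty_jobs(app: web.Application) -> None:
    async with timeout(1800):  # 30 minutes should be enough to finish all the jobs
        while app["jobs"]:
            await asyncio.sleep(15.)

# Registered in create_app():
# app.on_shutdown.append(wait_for_empty_jobs)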
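A stdlib-only variant of the same idea, as a sketch: a set instead of a list (so finishing an unknown uid doesn't raise) and `asyncio.wait_for` instead of `async_timeout`. `JobRegistry` is a hypothetical helper, and the short intervals in the usage below are only for illustration.

```python
import asyncio
from typing import Set


class JobRegistry:
    """Track running job uids so shutdown can wait for them."""

    def __init__(self) -> None:
        self.jobs: Set[str] = set()

    def start(self, uid: str) -> None:
        self.jobs.add(uid)

    def finish(self, uid: str) -> None:
        self.jobs.discard(uid)

    async def wait_until_empty(self, poll_interval: float, timeout: float) -> None:
        async def poll() -> None:
            while self.jobs:
                await asyncio.sleep(poll_interval)

        # Raises asyncio.TimeoutError if jobs are still running after `timeout` seconds
        await asyncio.wait_for(poll(), timeout)
```

In an aiohttp app the registry would live on the app and `wait_until_empty(15.0, 1800.0)` would be awaited from an `on_shutdown` handler.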

How to test Hobotnica?

Unit Tests

async def test_shell_run():
    proc = await shell.run("echo 'Hello, world!'")
    stdout, stderr = await proc.communicate()

    assert proc.returncode == 0
    assert stdout == b"Hello, world!\n"
    assert stderr == b""

How to test Hobotnica?

Mock GitHub Requests with aioresponses

import asyncio

from aioresponses import aioresponses

def test_has_not_access(github_user_factory):
    loop = asyncio.get_event_loop()

    with aioresponses() as mocked:
        mocked.post(GITHUB_API_URL, payload={"data": {"repository": None}})
        assert loop.run_until_complete(
            github.has_access(github_user_factory(), "fake-owner", "fake-name")
        ) is False

How to test Hobotnica?

Integration Tests

conftest.py

def pytest_configure(config):
    config.addinivalue_line("markers", "integrational: mark test as an integrational.")

api/repositories/tests/test_integrational.py

@pytest.mark.integrational
async def test_add_repository():
    ...

To run:

poetry run python -m pytest -m integrational

Blue-Green deployment for Hobotnica

Before:

  • Restart the Hobotnica service on a constant PORT
  • nginx vhost config with proxy_pass to PORT

After:

  • The previous Hobotnica service runs on PORT_X
  • Stop the old service on PORT_X in the background (systemctl stop hobotnica-X &)
  • Start the new service on PORT_Y
  • Supply a new nginx vhost config that does proxy_pass to PORT_Y
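The port switch at the heart of this scheme can be sketched as two small functions: pick the other port, then render the vhost snippet for it. The ports `8001` / `8002` and the minimal `location` block are assumptions; a real vhost config would contain more directives.

```python
from typing import Tuple

PORTS: Tuple[int, int] = (8001, 8002)  # Assumption: two ports reserved for Hobotnica

NGINX_TEMPLATE = """\
location / {{
    proxy_pass http://127.0.0.1:{port};
}}
"""


def next_port(current: int, ports: Tuple[int, int] = PORTS) -> int:
    """Alternate between the two ports (PORT_X <-> PORT_Y)."""
    if current not in ports:
        raise ValueError(f"Unexpected port: {current}")
    return ports[1] if current == ports[0] else ports[0]


def render_vhost(port: int) -> str:
    """Render the proxy_pass snippet for the new service."""
    return NGINX_TEMPLATE.format(port=port)
```

After writing the rendered config, nginx would be reloaded (for example with `nginx -s reload`) so new requests go to PORT_Y.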

Demo Time

Sort Of

Repositories & Jobs

Environment & GitHub Webhook

Deployer Django Admin

Conclusion

asyncio is fun

  • I have been using asyncio for years now
  • And I am still having fun!
  • The asyncio stack is becoming more & more mature
  • I’d choose a REST API described by an OpenAPI 3 schema over GraphQL
  • Django 3.0 should increase interest in the asyncio stack even more

Hobotnica future?

  • I am using it for testing & deploying all my pet projects
  • And for deploying that Django & React app
  • I hope to open source it shortly after:
    • Storing data in PostgreSQL instead of text files
    • Better docker & docker-compose support

Questions?

Twitter: @playpausenstop
GitHub: @playpauseandstop