How to make your own
CI/CD tool with Python 3

Agenda

Why build your own CI/CD tool in 2019?
Plan your CI/CD tool
Using OpenAPI 3 schema for all good things
Async Python frameworks in 2019
Integrating OpenAPI 3 schema with aiohttp
Working with GitHub API in Python
Running tasks & getting results with asyncio
Problems and solutions
Demo time

Am I serious?

The Spec

There are 3 GitHub repos:
- Django REST Framework API
- React Frontend
- AI Code

Infrastructure contains:
- AWS Elastic Beanstalk application with API & AI Code
- AWS S3 bucket with frontend dist

The Task

Allow us to:
- Deploy latest API commit to AWS EB
- Deploy latest AI commit to AWS EB
- Deploy latest frontend commit to AWS S3
As well as:
- Deploy custom AI commit to AWS EB
- Do not deploy every commit, only selected ones
Nice to have: an ability to deploy from Django Admin

CI/CD tools in 2019

Travis CI — trust issues
Circle CI — expensive for non-OSS
Drone.io — deploy feature was in progress (in May 2019)
GitLab CI — we’re using GitHub
GitHub Actions — not available for us (in May 2019)

Please, welcome!
Hobotnica!

Hobotnica

Hobotnica means octopus in Croatian
Python 3 web application
Built on top of asyncio stack
Test GitHub projects with Makefiles via GitHub webhooks
Deploy projects by requesting specific API endpoint

Step 1. Specify project to test & deploy

Registering projects

Provide GitHub credentials
Provide the project owner/name
git clone the repo

Step 2. Add GitHub webhook

Testing projects

Make a unique GitHub webhook URL
Use the given URL when registering the webhook
On receiving payload:
- Read credentials
- Rewind the repo
- Run make test
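
One possible way to make the webhook URL unique is to put a random token into its path. This is a minimal sketch, not Hobotnica's actual code: the `make_webhook_url` helper, the `/api/webhooks/...` path layout and the base URL are all hypothetical.

```python
import secrets


def make_webhook_url(base_url: str, owner: str, name: str) -> str:
    """Build a hard-to-guess webhook URL for one repository (hypothetical layout)."""
    # token_urlsafe returns a random URL-safe string built from 32 random bytes
    token = secrets.token_urlsafe(32)
    return f"{base_url.rstrip('/')}/api/webhooks/{owner}/{name}/{token}"
```

Every call returns a different URL, so the token would need to be stored next to the project to recognize incoming payloads later.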

Step 3. Integrate with Django (or anything else)

Deploying projects

Request the deploy endpoint, passing GitHub credentials in headers
On proper credentials:
- Rewind the repo
- Run make deploy
As deploy is a time-consuming process:
- There is a deploy status endpoint to check deploy status & details
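
The deploy / deploy status pair could be sketched as below. Everything here is an assumption made for illustration: the in-memory `DEPLOYS` and `TASKS` stores, the status names and the helper functions are hypothetical, and a real service would persist the data.

```python
import asyncio
import uuid
from asyncio.subprocess import PIPE
from typing import Any, Dict

# Hypothetical in-memory stores: deploy uid -> details / running task
DEPLOYS: Dict[str, Dict[str, Any]] = {}
TASKS: Dict[str, asyncio.Task] = {}


async def _run_deploy(uid: str, cmd: str) -> None:
    """Run the deploy command and record how it ended."""
    proc = await asyncio.create_subprocess_shell(cmd, stdout=PIPE, stderr=PIPE)
    _, stderr = await proc.communicate()
    DEPLOYS[uid] = {
        "status": "succeeded" if proc.returncode == 0 else "failed",
        "returncode": proc.returncode,
        "errors": stderr.decode(),
    }


def start_deploy(cmd: str = "make deploy") -> str:
    """Schedule a deploy and return its uid right away (call from a coroutine)."""
    uid = uuid.uuid4().hex
    DEPLOYS[uid] = {"status": "in_progress"}
    # Keep a reference so the task is not garbage collected mid-run
    TASKS[uid] = asyncio.create_task(_run_deploy(uid, cmd))
    return uid


def deploy_status(uid: str) -> Dict[str, Any]:
    """What a deploy status endpoint could return for the given uid."""
    return DEPLOYS.get(uid, {"status": "unknown"})
```

The deploy endpoint would return the uid immediately, and the client would poll the status endpoint with it.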

Step 4. Environment Vars

Configuring projects

Project-specific env vars (like COVERALLS_TOKEN)
Job-specific env vars (like LEVEL)
One-time env vars (like DEBUG if something doesn’t work)
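
The three kinds of env vars could be merged into one environment for a job as sketched here. The `build_job_env` helper and the precedence order (one-time over job-specific over project-specific) are assumptions, not something stated on the slide.

```python
from typing import Dict, Mapping, Optional


def build_job_env(
    project_env: Mapping[str, str],
    job_env: Optional[Mapping[str, str]] = None,
    one_time_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Merge the three layers; later layers override earlier ones (assumed order)."""
    env: Dict[str, str] = dict(project_env)
    env.update(job_env or {})
    env.update(one_time_env or {})
    return env
```

For example, a project-level `LEVEL` would be replaced by the job-level one, while a one-time `DEBUG` is simply added on top.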

Ode to OpenAPI 3 Schema

Development starts with prototyping

There were many options for describing a REST API:
- Swagger
- CoreAPI
- json:api
- RAML
Many options lead to tough decisions:
- Which format better suits both backend & frontend?
- How to generate the format specification file?
- What if the format gets abandoned?

OpenAPI Initiative & OpenAPI Specification

Swagger development began in 2010
SmartBear Software acquired Swagger API in 2015
In early 2016, the Linux Foundation and SmartBear Software announced the OpenAPI Initiative
The goals of the OpenAPI Initiative:
- Provide a better standard which fixes some Swagger flaws: OpenAPI Specification
- Provide a set of tools to generate & edit the specification

How to generate OpenAPI specification?

Write the YAML / JSON specification by hand
Generate it from some Python data structures

openapi.yaml

openapi: "3.0.2"

info:
  ...

paths:
  ...

components:
  ...

tags:
  ...

Swagger Editor

apispec

from apispec import APISpec
from marshmallow import fields, Schema

spec = APISpec(...)

class ConferenceSchema(Schema):
    name = fields.Str(required=True)

Choosing Python async
web-framework in 2019

Before we start

Sync View

from rest_framework.decorators import api_view
from rest_framework.request import Request
from rest_framework.response import Response

from .models import Model

@api_view(['GET'])
def hello_world(request: Request) -> Response:
    instance = Model.objects.get(...)
    return Response(...)

Before we start

Async View

from aiohttp import web

routes = web.RouteTableDef()

@routes.get('/')
async def hello_world(request: web.Request) -> web.Response:
    async with request.app['db'].acquire() as conn:
        instance = await conn.fetch(...)
    return web.json_response(...)

aiohttp

aio-libs/aiohttp
Current version (Oct 2019): 3.6.1
One of the first asyncio web-frameworks
Not only a web server framework, it has a client for making web requests as well
Lightweight, Flask-like
Has many plugins & extensions available
Currently developed & maintained by aio-libs group

sanic

huge-success/sanic
Current version (Oct 2019): 19.6.3
Attempt to make faster async web-framework
Still very lightweight, main focus on speed
Semantically very close to aiohttp
Many plugins & extensions available as well
Product of huge-success group

fastapi

tiangolo/fastapi
Current version (Oct 2019): 0.38.1
ASGI web-framework
Supports Pydantic validation out of box
Decent amount of included batteries
OpenAPI 3 Schema support out of box
Product of Sebastián Ramírez

Others?

Django 3.0 will support ASGI
vibora is more about hype from my POV
If you know other options please let me know: @playpauseandstop

I still chose aiohttp

fastapi looks promising, but doesn't comply with my mypy config :)
I for sure will check on Django 3.0!
But as of Oct 2019, aiohttp is still my choice for async Python web framework

`rororo==2.0.0`

aiohttp and OpenAPI?

aiohttp-apispec is the best choice, but it produces a Swagger 2.0 schema
aiohttp-swagger is a bit outdated & again produces a Swagger 2.0 schema
Not sure what to say about aiohttp-swagger3
Every solution is centered around keeping the schema inside Python docstrings

pyramid_openapi3 gave me something to think about

What If?

Support OpenAPI 3 schema by defining path to openapi.yaml
Decorate view and set which OpenAPI operationId to use
Access validated data from request instance

Then I met connexion

But it is a framework on top of Flask
It has an aiohttp server
But still…

So, why not create another solution?

Step 1. Initialization (app.py)

from pathlib import Path
from typing import List, Optional

from aiohttp import web
from rororo import setup_api

def create_app(argv: Optional[List[str]] = None) -> web.Application:
    app = web.Application()

    # Register the OpenAPI 3 schema that lives next to this module
    setup_api(app, Path(__file__).parent / 'openapi.yaml')

    return app

So, why not create another solution?

Step 2. Routing (views.py)

from aiohttp import web
from rororo import RouteTableDef

routes = RouteTableDef(prefix="/api/repositories")

@routes.post("")
async def add_repository(request: web.Request) -> web.Response:
    ...

@routes.get("/{owner}/{name}")
async def retrieve_repository(request: web.Request) -> web.Response:
    ...

So, why not create another solution?

Step 3. Access the data (views.py)

Request:

http POST /api/repositories \
Authorization:"Bearer {token}" X-GitHub-Username:{username} \
owner={owner} name={name}

So, why not create another solution?

Step 3. Access the data (views.py)

Request Handler:

from rororo import openapi_context

@routes.post("")
async def add_repository(request: web.Request) -> web.Response:
    with openapi_context(request) as context:  # IMPORTANT: Sync context manager!
        github_user = await github.authenticate(
            context.parameters["X-GitHub-Username"],  # Parameter from header
            context.security["jwt"]  # Security scheme
        )
        if await github.has_access(
            github_user,
            context.data["owner"],  # Request body data
            context.data["name"]   # Request body data
        ):
            ...

So, why not create another solution?

In the past rororo was:

An attempt to build a web-framework on top of routr
An attempt to implement own schemas & utilities for Python web frameworks

Now rororo is a library for:

OpenAPI 3 schema support for aiohttp.web applications

On to implementation!

Working with GitHub API v4

GitHub API v4 uses GraphQL
Simplest way: use aiohttp.client for interacting with it

github.py

from aiohttp import ClientSession

def session_context(personal_token: str) -> ClientSession:
    return ClientSession(
        headers={
            "Authorization": f"Bearer {personal_token}",
            "User-Agent": "YourUserAgent/1.0",
        },
        raise_for_status=True
    )

"Authenticate" GitHub user

github.py

from .constants import GITHUB_API_URL, GQL_VIEWER

async def authenticate(username: str, token: str) -> GitHubUser:
    async with session_context(token) as session:
        response = await session.post(GITHUB_API_URL, json={"query": GQL_VIEWER})
        response_data = await response.json()

    viewer = GitHubUser(**response_data["data"]["viewer"])
    if viewer.login != username:
        raise InvalidCredentials()

    return viewer

GraphQL Queries

constants.py

GITHUB_API_URL = "https://api.github.com/graphql"
GQL_VIEWER = """
query GetViewer {
    viewer {
        id
        login
        name
        url
    }
}
"""

Has user access to the repository?

constants.py

GQL_REPOSITORY = """
query GetRepository($owner: String!, $name: String!) {
    repository(owner: $owner, name: $name) {
        id
        name
        description
    }
}
"""

Has user access to the repository?

from .constants import GQL_REPOSITORY

async def has_access(user: GitHubUser, owner: str, name: str) -> bool:
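    # ClientSession is an async context manager, so use "async with"
    async with session_context(user.personal_token) as session:
        response = await session.post(
            GITHUB_API_URL,
            json={
                "query": GQL_REPOSITORY,
                "variables": {
                    "owner": owner,
                    "name": name
                }
            }
        )
        # Await the JSON body first, then look up the repository in it
        return (await response.json())["data"]["repository"] is not None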

Anything else?

Clone the repo

git clone https://${username}:${personal_token}@github.com/${owner}/${name}.git

Update the repo?

git checkout ${branch}
git pull origin ${branch}

Important: Remove the repo after user access has been revoked!
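
Since the clone command embeds user-supplied values, it may be worth building it with quoting in place. This is a minimal sketch under assumptions: `build_clone_command` is a hypothetical helper, and percent-encoding the credentials is a precaution not shown on the slide.

```python
import shlex
from urllib.parse import quote


def build_clone_command(
    username: str, personal_token: str, owner: str, name: str, path_to_repo: str
) -> str:
    """Return a shell-safe `git clone` command for a private GitHub repo."""
    # Percent-encode credentials so characters like "@" or ":" cannot break the URL
    credentials = f"{quote(username, safe='')}:{quote(personal_token, safe='')}"
    url = f"https://{credentials}@github.com/{owner}/{name}.git"
    # shlex.quote protects against shell metacharacters in any of the parts
    return f"git clone {shlex.quote(url)} {shlex.quote(path_to_repo)}"
```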

Ode to `asyncio.subprocess`

Running system commands then…

python at /Users/playpauseandstop/Projects

import subprocess

assert subprocess.call(["pwd"]) == 0
# Output ends with a newline
assert subprocess.check_output(["pwd"]) == b"/Users/playpauseandstop/Projects\n"

result = subprocess.run(["pwd"])
assert result.returncode == 0
assert result.stdout is None  # Output is not captured by default

result = subprocess.run(["pwd"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
assert result.stdout == b"/Users/playpauseandstop/Projects\n"

Why `subprocess` doesn’t fit?

subprocess.run blocks the request handler
For sync frameworks it can be fixed by gunicorn or uwsgi
For async frameworks there is no need to use such solutions
asyncio library has a way to make any code work asynchronously

Possible solution

import asyncio
import subprocess
from functools import partial
from subprocess import CompletedProcess
from typing import Any

async def subprocess_run(*args: Any, **kwargs: Any) -> CompletedProcess:
    kwargs.setdefault("stdout", subprocess.PIPE)
    kwargs.setdefault("stderr", subprocess.PIPE)
    loop = asyncio.get_running_loop()
    # run_in_executor accepts only positional args, so bind kwargs with partial
    return await loop.run_in_executor(None, partial(subprocess.run, *args, **kwargs))

Issues?

Doesn't feel very Pythonic
Threads overhead?
If using executor instead of None, how to share it within the app?

Better solution

asyncio.subprocess

shell.py

import asyncio
from asyncio.subprocess import PIPE, Process

async def run(cmd: str) -> Process:
    return await asyncio.create_subprocess_shell(cmd, stdout=PIPE, stderr=PIPE)

Better solution

views.py

from . import shell

@routes.post("")
async def add_repository(request: web.Request) -> web.Response:
    ...

    proc = await shell.run(
        f"git clone https://{username}:{personal_token}@github.com/{owner}/{name}.git "
        f"{path_to_repo}"
    )
    _, stderr = await proc.communicate()

    return web.json_response(
        {
            "cloned": proc.returncode == 0,
            # stderr is bytes, decode it to make the response JSON serializable
            "errors": stderr.decode()
        }
    )

Problems & Solutions

OpenAPI Schema management

Still searching for best workflow…

Makefile

SWAGGER_HOST ?= 0.0.0.0
SWAGGER_PORT ?= 8422

# Edit OpenAPI schema
swagger-editor:
    docker run --rm -h $(SWAGGER_HOST) -p $(SWAGGER_PORT):8080 \
    swaggerapi/swagger-editor:v3.6.36

# Test OpenAPI schema (requires project to be run as well)
swagger-ui:
    docker run --rm -e URL="http://$(API_HOST):$(API_PORT)/api/openapi.json" \
    -h $(SWAGGER_HOST) -p $(SWAGGER_PORT):8080 swaggerapi/swagger-ui:v3.23.11

Validate OpenAPI Schema

Step 1. Add openapi-spec-validator to dev requirements

poetry add -D openapi-spec-validator

Step 2. test_openapi.py

from openapi_spec_validator import validate_spec
from rororo import get_openapi_schema

from api.app import create_app

def test_openapi_schema():
    validate_spec(get_openapi_schema(create_app()))

Step 3 (Optional). Validate openapi.json / openapi.yaml response

Working with sensitive data

Identify sensitive data
- GitHub personal token
- Sensitive environment vars
Encrypt the data
Do not include sensitive data in any output
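
Keeping sensitive data out of the output could be as simple as masking known secret values before returning or logging any text. A minimal sketch, assuming the secrets are known up front; `mask_secrets` is a hypothetical helper.

```python
from typing import Iterable


def mask_secrets(text: str, secrets: Iterable[str], placeholder: str = "***") -> str:
    """Replace every occurrence of each known secret value with a placeholder."""
    # Replace longer secrets first so a secret containing another is fully masked
    for secret in sorted(set(secrets), key=len, reverse=True):
        if secret:  # Skip empty strings, replacing "" would mangle the text
            text = text.replace(secret, placeholder)
    return text
```

This matters for things like git errors, which may echo the clone URL together with the personal token.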

Proper environment for `shell.run`

Avoid passing entire os.environ as:

await asyncio.create_subprocess_shell(cmd, env=os.environ)

Instead filter out virtual environment env vars,

IGNORE_ENV_KEYS = {"POETRY", "VIRTUAL_ENV"}

await asyncio.create_subprocess_shell(
    cmd,
    env={
        key: value
        for key, value in os.environ.items()
        if key not in IGNORE_ENV_KEYS
    }
)
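
Note that the set lookup above only drops keys that match exactly. If the intent is to also drop related keys that merely start with those names (an assumption about the intent), a prefix match is one option. `clean_env` and `IGNORE_ENV_PREFIXES` are hypothetical names.

```python
import os
from typing import Dict, Mapping, Tuple

# Assumed prefixes, mirroring IGNORE_ENV_KEYS from the snippet above
IGNORE_ENV_PREFIXES: Tuple[str, ...] = ("POETRY", "VIRTUAL_ENV")


def clean_env(
    environ: Mapping[str, str] = os.environ,
    ignore_prefixes: Tuple[str, ...] = IGNORE_ENV_PREFIXES,
) -> Dict[str, str]:
    """Copy environ without keys that start with any ignored prefix."""
    # str.startswith accepts a tuple and returns True if any prefix matches
    return {
        key: value
        for key, value in environ.items()
        if not key.startswith(ignore_prefixes)
    }
```

The result can be passed as `env=clean_env()` to `asyncio.create_subprocess_shell`.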

`shell.run` actually "blocks" the request handler

async def add_repository(request: web.Request) -> web.Response:
    ...
    proc = await shell.run("...")
    ...

The handler still waits for git clone to finish
For multiple concurrent requests, the total execution time:
- Will not equal the cumulative time of all git clones, as with subprocess.run
- Will equal the time of the longest git clone
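
The timing claim can be checked with a small experiment. In this sketch `fake_clone` is a stand-in that only sleeps instead of running a real git clone, so the numbers illustrate the scheduling behavior, not real clone times.

```python
import asyncio
import time
from typing import List


async def fake_clone(seconds: float) -> float:
    """Stand-in for awaiting a git clone; only sleeps for the given time."""
    await asyncio.sleep(seconds)
    return seconds


async def clone_all(durations: List[float]) -> float:
    """Await all fake clones concurrently and return the elapsed wall time."""
    started = time.perf_counter()
    await asyncio.gather(*(fake_clone(d) for d in durations))
    return time.perf_counter() - started
```

Running `asyncio.run(clone_all([0.1, 0.2, 0.3]))` should take roughly as long as the longest sleep rather than the sum of all three.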

`asyncio.create_task` to the rescue!

import asyncio

async def add_repository(request: web.Request) -> web.Response:
    ...

    # Create the task, but do not await its result
    asyncio.create_task(shell.run("..."))

    # Which means status is in progress instead of done
    return web.json_response({"status": "cloning"})
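
One caveat with fire-and-forget tasks: the event loop keeps only weak references to tasks, so it is safer to hold a reference until the task is done. A minimal sketch of that pattern; `spawn` and `BACKGROUND_TASKS` are hypothetical names.

```python
import asyncio
from typing import Any, Coroutine, Set

# Hypothetical registry of running background tasks
BACKGROUND_TASKS: Set[asyncio.Task] = set()


def spawn(coro: Coroutine[Any, Any, Any]) -> asyncio.Task:
    """Schedule coro in the background and keep it alive until it finishes."""
    task = asyncio.create_task(coro)
    BACKGROUND_TASKS.add(task)
    # Drop the reference once the task is done (finished, failed or cancelled)
    task.add_done_callback(BACKGROUND_TASKS.discard)
    return task
```

In the handler, `spawn(shell.run("..."))` would then replace the bare `asyncio.create_task(...)` call.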

Shielding requests & jobs

Step 1. Ensure all non-idempotent methods are safe against CancelledErrors

from typing import List, Optional

from aiohttp import web
from aiohttp_middlewares import NON_IDEMPOTENT_METHODS, shield_middleware

def create_app(argv: Optional[List[str]] = None) -> web.Application:
    app = web.Application(middlewares=(
        shield_middleware(methods=NON_IDEMPOTENT_METHODS),
    ))
    ...

Shielding requests & jobs

Step 2. Ensure the application exits only after the last job is completed!

On start job:

request.app["jobs"].append(job.uid)

On finishing (canceling) job:

job_context.app["jobs"].remove(job.uid)

Shielding requests & jobs

Step 2. Ensure the application exits only after the last job is completed!

app.on_shutdown signal:

import asyncio

from aiohttp import web
from async_timeout import timeout

async def wait_for_empty_jobs(app: web.Application) -> None:
    async with timeout(1800):  # 30 minutes should be enough to finish all the jobs
        # Keep polling until every job has removed itself from the list
        while app["jobs"]:
            await asyncio.sleep(15.)

How to test Hobotnica?

Unit Tests

async def test_shell_run():
    proc = await shell.run("echo 'Hello, world!'")
    stdout, stderr = await proc.communicate()

    assert proc.returncode == 0
    assert stdout == b"Hello, world!\n"
    assert stderr == b""

How to test Hobotnica?

Mock GitHub Requests with aioresponses

import asyncio

from aioresponses import aioresponses

def test_has_not_access(github_user_factory):
    loop = asyncio.get_event_loop()

    with aioresponses() as mocked:
        mocked.post(GITHUB_API_URL, payload={"data": {"repository": None}})
        assert loop.run_until_complete(
            github.has_access(github_user_factory(), "fake-owner", "fake-name")
        ) is False

How to test Hobotnica?

Integration Tests

conftest.py

def pytest_configure(config):
    config.addinivalue_line("markers", "integrational: mark test as an integrational.")

api/repositories/tests/test_integrational.py

@pytest.mark.integrational
async def test_add_repository():
    ...

To run:

poetry run python -m pytest -m integrational

Blue-Green deployment for Hobotnica

Before:

Restart Hobotnica service with constant PORT
vhost nginx config proxy_pass PORT

After:

Previous Hobotnica service runs on PORT_X
Detached stop (systemctl stop hobotnica-X &) of the old service on PORT_X
Start new service on PORT_Y
Supply new vhost nginx config, which uses proxy_pass to PORT_Y
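
The port switch could be sketched like this. The port numbers, the `127.0.0.1` upstream and both helpers are hypothetical; the slide only says that the new service starts on another port and nginx is pointed at it.

```python
from typing import Tuple

# Hypothetical pair of ports the two service instances alternate between
PORTS: Tuple[int, int] = (8001, 8002)


def next_port(current_port: int, ports: Tuple[int, int] = PORTS) -> int:
    """Return the port for the new service: the one not currently in use."""
    port_x, port_y = ports
    return port_y if current_port == port_x else port_x


def render_proxy_pass(port: int) -> str:
    """Render the proxy_pass line for the new vhost nginx config."""
    return f"proxy_pass http://127.0.0.1:{port};"
```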

Demo Time

Sort Of

Repositories & Jobs

Environment & GitHub Webhook

Deployer Django Admin

Conclusion

asyncio is fun

I have been using asyncio for years now
And I am still having fun!
The asyncio stack has become more & more mature
I’d choose REST API covered by OpenAPI 3 Schema instead of GraphQL
Django 3.0 should increase interest in the asyncio stack even more

Hobotnica future?

I am using it for testing & deploying all my pet projects
And for deploying that Django & React app
Hope to Open Source it shortly after:
- Storing data in PostgreSQL instead of text files
- Better docker & docker-compose support