Automating Workflows


As part of my team’s development environment, I set up a CI pipeline for our repo. CI, or Continuous Integration, is a DevOps practice in which developers merge their code into the trunk of the repository frequently. Because commits land so often, it’s important to verify automatically that new code still passes the application’s integration and unit tests. There are plenty of options for setting up a CI pipeline, but I chose GitHub Actions since our repository already lives on GitHub. Let’s walk through my workflow step by step.

GitHub Actions uses YAML to define a series of jobs and steps to run against your codebase. First, though, we have to tell GitHub Actions when to run this workflow by using the on key.

name: CI Workflow
run-name: Lint and Test
# Whenever a developer opens up a pull request against the main branch of our repository, this workflow will run.
on: 
  pull_request:
    branches:
      - "main"

Next are the parts where we define what to actually do. In our case, the monorepo contains a number of microservices plus our frontend. Since each microservice is meant to be a standalone application, I wrote the workflow to first check whether any files in a microservice changed before running its unit tests. Before that, though, let’s install the tools that lint the project and make sure it conforms to our code style guidelines and PEP 8.

jobs:
  #----------------------------------------------
  #          install and run linters
  #----------------------------------------------
  python_linting:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"

      - name: Load pip cache if it exists
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip
          restore-keys: ${{ runner.os }}-pip

      - name: Install python code quality tools
        run: python -m pip install isort black flake8 yamllint
      - name: Run isort, black, & flake8
        working-directory: ./services
        run: |
          isort . --profile=black --filter-files --line-length=100
          black . --line-length=100 --check
          flake8 . --max-line-length=100 --extend-ignore=E203
      - name: Run yamllint
        run: yamllint . -d relaxed --no-warnings
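
To keep developers from discovering lint failures only after pushing, the same four tools can also run locally before commit. As one option (purely illustrative, this file isn’t in our repo), a pre-commit configuration mirroring the CI commands above might look like the following; the rev versions are placeholders to pin to whatever your team actually uses:

# .pre-commit-config.yaml (sketch) -- mirrors the CI lint step above
repos:
  - repo: https://github.com/pycqa/isort
    rev: 5.12.0  # placeholder version
    hooks:
      - id: isort
        args: ["--profile=black", "--filter-files", "--line-length=100"]
  - repo: https://github.com/psf/black
    rev: 23.3.0  # placeholder version
    hooks:
      - id: black
        args: ["--line-length=100"]
  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0  # placeholder version
    hooks:
      - id: flake8
        args: ["--max-line-length=100", "--extend-ignore=E203"]
  - repo: https://github.com/adrienverge/yamllint
    rev: v1.32.0  # placeholder version
    hooks:
      - id: yamllint
        args: ["-d", "relaxed", "--no-warnings"]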

Next up, we detect which of our microservices (or the frontend) had file changes. That way we can spin up test workers that install the dependencies for each of those applications and test them independently. Say the etl-service, prediction-api, and frontend all changed in this pull request. This part of the workflow will then output a list of changed services, something like [etl-service, prediction-api], plus a separate true/false flag for the frontend.

  #----------------------------------------------
  #   Detect Changes to Microservices and Frontend
  #----------------------------------------------
  detect_changes_job:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: read
    outputs:
      services_changed_arr: ${{ steps.services_filter.outputs.changes }} # array of changed services
      frontend_changed_bool: ${{ steps.frontend_filter.outputs.frontend }}

    steps:
      - name: Detect which (if any) services changed
        uses: dorny/paths-filter@v2
        id: services_filter
        with:
          filters: |
            etl-service:
              - 'services/etl-service/**'
            neural-network:
              - 'services/neural-network/**'
            prediction-api:
              - 'services/prediction-api/**'
            utilities:
              - 'services/utilities/**'

      - name: Detect if any frontend changes
        uses: dorny/paths-filter@v2
        id: frontend_filter
        with:
          filters: |
            frontend:
              - 'frontend/**'
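
Each job’s outputs are what downstream jobs read through needs. While developing the workflow, a throwaway debug job like this (purely illustrative, not part of the real pipeline) is handy for printing exactly what the two filters produced:

  # Hypothetical debug job -- useful while iterating on the filters, then delete it
  show_detected_changes:
    needs: detect_changes_job
    runs-on: ubuntu-latest
    steps:
      - name: Print filter outputs
        run: |
          echo "Services changed: ${{ needs.detect_changes_job.outputs.services_changed_arr }}"
          echo "Frontend changed: ${{ needs.detect_changes_job.outputs.frontend_changed_bool }}"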

Finally, we spin up a matrix of jobs, install the dependencies for each changed part of the project, and then run our tests. In GitHub Actions, a matrix is basically a for loop over job configurations: we loop through the list of changed services and perform the same steps on each. The frontend is a little different since it’s written in JavaScript, so it gets its own job separate from the Python microservices. What’s nice about the matrix approach is that the workers run concurrently and share cached files, which gets our production code out faster!

  #----------------------------------------------
  #   Run matrix of jobs based on services changed
  #----------------------------------------------
  run_ci_on_service:
    needs: detect_changes_job # detect_changes_job result
    if: needs.detect_changes_job.outputs.services_changed_arr != '[]' # [etl-service, neural-network, ...]
    strategy:
      matrix:
        services_to_test: ${{ fromJSON(needs.detect_changes_job.outputs.services_changed_arr) }} # parse
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v3

      - name: Set up python 3.10
        id: setup-python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"

      - name: Load cached venv
        id: cached-poetry-dependencies
        uses: actions/cache@v3
        env:
          LOCKFILE_LOCATION: "**/${{matrix.services_to_test}}/poetry.lock" # workaround for using variable in the hashFiles function below
        with:
          path: ./services/${{matrix.services_to_test}}/.venv
          key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles(env.LOCKFILE_LOCATION) }}

      - name: Load cached .local
        uses: actions/cache@v3
        with:
          path: ~/.local
          key: localdir-${{ runner.os }}-${{ hashFiles('.github/workflows/main.yaml') }}

      - name: Install/configure Poetry
        uses: snok/install-poetry@v1.3.3
        with:
          virtualenvs-create: true
          virtualenvs-in-project: true

      - name: Install project dependencies # if cache doesn't exist
        working-directory: ./services/${{matrix.services_to_test}}
        if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
        run: poetry install

      - name: Set modified dirname envvar # pytest-cov needs the package/module name, not the folder name
        id: modify-dirname
        run: |
          modified_dirname=$(echo ${{ matrix.services_to_test }} | sed 's/-/_/g')
          echo "dirname=$modified_dirname" >> $GITHUB_ENV

      - name: Install sndfile dep for utilities Librosa
        if: matrix.services_to_test == 'utilities'
        run: |
          sudo apt-get install -y libsndfile1-dev

        # todo: add in coverage report and thresholds
      - name: Run tests and generate report
        working-directory: ./services/${{matrix.services_to_test}}
        run: |
          poetry run pytest --cov=${{env.dirname}} ./tests

  #----------------------------------------------
  #   Run CI on frontend
  #----------------------------------------------

  run_ci_on_frontend:
    needs: detect_changes_job # detect_changes_job result
    if: needs.detect_changes_job.outputs.frontend_changed_bool == 'true'
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Setup Node and cache npm dependencies
        uses: actions/setup-node@v3
        with:
          node-version: 16
          cache: "npm"
          cache-dependency-path: frontend/package-lock.json

      - name: Load cached node_modules
        id: cached-node-modules
        uses: actions/cache@v3
        with:
          path: frontend/node_modules
          key: node_modules-${{ hashFiles('frontend/package-lock.json') }}

      - name: Install node_modules # if cache doesn't exist
        working-directory: frontend
        if: steps.cached-node-modules.outputs.cache-hit != 'true'
        run: npm install

      - name: Run npm lint script
        working-directory: frontend
        run: npm run lint
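
Right now the frontend job only lints. If the frontend’s package.json also defines a test script (an assumption, since ours currently lints only), one more step along these lines would run its tests in the same job:

      - name: Run npm test script # assumes a "test" script exists in frontend/package.json
        working-directory: frontend
        run: npm test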

It was a fun process to set up and has paid dividends (not literally, I’m not getting paid for this) for my team’s development practices. DevOps is awesome!
