End-to-End Analytics Platform – GitHub Actions

This is part of my series on learning to build an End-to-End Analytics Platform project.

TLDR; In this post we get started with GitHub Actions. We test drive Excalidraw for diagramming, learn YAML in Y minutes while building the workflow, start with a simple starter workflow, and work through deploying Bicep files by using GitHub Actions.

Automate construction 🚧

Now that we have the code set up and we can deploy it manually it’s time to autobot automate. We are going to use GitHub Actions for Azure to work on building our Continuous Integration/Continuous Deployment pipeline.

What we are looking to create is a workflow to automate tasks. Actions are event-driven; an example is “Run my testing workflow when someone creates a pull request.” What makes up a workflow? Cue Intro to GitHub Actions. To start off the thinking I get to test out a pretty fantastic open-source whiteboard tool, with tons of potential, called Excalidraw. *Inner voice screaming… “Excalidraw, I choose you!”* ⚔️

Working the flow.
  • A workflow is an automated procedure that you add to your repository.
  • A job is a set of steps, triggered by an event/webhook or a schedule, that execute on the same runner.
  • A step is an individual task that can run commands in a job.
  • Actions are standalone commands that are combined into steps to create a job.
  • A runner is a server that has the GitHub Actions runner application installed.

Now that we know some basics, let’s dive in and give it a go. Here is the plan for what we want to achieve with an accompanying artistic depiction:

  1. Develop some code, commit to our branch, push the changes
  2. Then complete a pull request and merge the changes
  3. The GitHub action triggers the workflow
  4. The workflow runs our jobs and steps to deploy the resources
  5. Validate the resources got deployed to Azure
Hard at work.

Let’s create our first GitHub Actions workflow. Navigate to the ‘Actions’ tab in our repo. We get a few workflow templates we can use. Let’s use the ‘Simple workflow’ by using the ‘Set up this workflow’ button.

And Action 🎬

GitHub Actions uses YAML syntax for defining events, jobs, and steps. When we created the workflow, GitHub added a new .github/workflows directory to our repo. It created a new .yml file, which I renamed to build.yml, in that directory. For those of us that don’t speak YAML fluently, we can learn X (YAML) in Y minutes 🧪.

Yay YAML!

A brief summary of what is going on here:

  1. Our new YAML file and path were added to our repo
  2. We can see our workflow is triggered on push or pull request actions to our main branch
  3. We have one job that runs on an Ubuntu runner server
  4. We have three demo steps (sketched below)
    1. Uses a packaged action from https://github.com/actions/ called checkout to check out our repo
    2. Runs a single-line script in the runner’s shell
    3. Runs a multi-line script in the runner’s shell
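For reference, here is roughly what that starter template looked like at the time — a sketch from memory, not the exact file:

name: CI

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build:
    # The job runs on the latest Ubuntu runner
    runs-on: ubuntu-latest
    steps:
      # Check out the repository so the job can access it
      - uses: actions/checkout@v2

      # Run a single command in the runner's shell
      - name: Run a one-line script
        run: echo Hello, world!

      # Run a set of commands in the runner's shell
      - name: Run a multi-line script
        run: |
          echo Add other actions to build,
          echo test, and deploy your project.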

That’s a good enough starter template. From this point on, for brevity, I followed the documentation to deploy Bicep files by using GitHub Actions. That covers setting up a deployment service principal (my choice, Windows Terminal Prettified), configuring GitHub secrets, and the sample workflow to deploy Bicep files to Azure. Once that is all set up, we have a workflow that looks like this:

Note: though it’s not the recommended practice, I adjusted the scope of my service principal to the subscription level. For my testing I would like GitHub Actions to create resource groups using my Bicep file definitions, not the CLI. So my command was a little different:

az ad sp create-for-rbac --name {myApp} --role contributor --scopes /subscriptions/{subscription-id} --sdk-auth

After reading the Azure/arm-deploy documentation and the example guide, I adjusted the workflow:

Adjusted

To commit our changes, just use the ‘Start commit’ button in the top right corner. I created a new branch from here for the change, then finished the pull request, merging our changes to the main branch.

Committing to it

Remember our triggers? On pull_request to main? That kicks off our pipeline 😎

Behold! It lives!

Full disclosure: It failed. lol.

  • I got an error “missing ‘region’ parameter” for the Azure/arm-deploy action
  • I adjusted the file path for the ‘deployAnalyticsPlatform.bicep’ file from root to the src/bicep directory.
  • I also modified the trigger to only fire when a push is done to the main branch. That prevents us from running the pipeline twice: once for the pull request, then again when the merge is run. The reworked workflow is sketched below.
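Putting those fixes together, the workflow ended up looking something like this — a sketch, with the secret names and region from my setup, so adjust for yours:

name: Deploy Bicep

on:
  push:
    branches: [ main ]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      # Log in with the service principal credentials stored in GitHub secrets
      - uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      # Deploy the Bicep file at subscription scope; 'region' is required for this scope
      - uses: azure/arm-deploy@v1
        with:
          scope: subscription
          subscriptionId: ${{ secrets.AZURE_SUBSCRIPTION }}
          region: westeurope
          template: ./src/bicep/deployAnalyticsPlatform.bicep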

So we learn 😉

Re-calibrating

Quick update to the code. Run through the GitHub flow again and we are back in business. When we navigate into the Actions run we can see a bunch of information: why the workflow was triggered, what the status is, and which jobs are running.

In flight

When we click on the job, we can drill into the runner logs as well. This helps a bunch in debugging workflows. An example is that we have a property for the storage account which is read-only:

Only reading errors

The deployment succeeded though, which I think is great progress!

Deployed with action!

That’s it! All done. Explored GitHub Actions. Created service principals and GitHub secrets. Learnt some YAML. Broke and fixed our workflows. Then successfully deployed resources from Bicep code.

๐Ÿœ

P.S. A really simple video that also helped me rapidly establish some key points was: GitHub Actions Tutorial – Basic Concepts and CI/CD Pipeline with Docker

End-to-End Analytics Platform – Bicep What-If deployment

This is part of my series on learning to build an End-to-End Analytics Platform project.

TLDR; After I refactored my code to use modules I found that Bicep supports ‘What-If’ operations, which explain what the code is going to do before deploying it. In this post I do a short test on that. Found an issue where Azure Synapse resource creation wasn’t shown. Then browsed the Bicep GitHub repo to search issues related to What-If operations. Didn’t find what I was hoping for, so logged my first public GitHub issue 😏.

Update: The issue we encountered seems to be related to another preflight improvement which is being worked on, but is a “…bit of a gnarly, low level issue so please be patient 🙂”. I was amazed to see how quickly the Bicep team responded on this.

What happens when I push this button? 🤔

So after my previous post on factoring in some Bicep best practices for code reuse I noticed that Bicep supports ‘What-If’ operations.

az deployment sub create --name '<name of deployment>' --location '<location name>' --template-file '<path to bicep file>' --confirm-with-what-if
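Related tip: there is also a dedicated what-if command if you want the preview on its own, without being prompted to deploy:

az deployment sub what-if --location '<location name>' --template-file '<path to bicep file>'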

Side note: I had to change the VS Code theme to save us all from the agony of lime green on light grey background reading.

What’s nice is we get a breakdown of changes that we are about to apply to our environment. I think that is awesome.

Terminal output of a Bicep deployment what-if operation.
I have one question. Explosions? 🧨

Yes, for the eagle-eyed reader, I realised my storage account name is an Azure Region name hahaha 😂

Looking at the terminal output, reading top to bottom, I can see:

  • We are about to deploy at the subscription scope.
  • We are deploying an Azure Data Lake Gen2 Storage Account with blob container and all their configuration goodness.
  • We are deploying an Azure Synapse… wait a minute…

What was weird was that I didn’t see the Synapse Workspace. I checked the deployment details/output and it was there.

Azure portal deployments screen.
Deployed

I wondered if the reason it didn’t output the Azure Synapse resource during the What-If was because I didn’t define any output variables for it, like I did for the storage account.

Bicep output code.
Putting more out.
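The addition was along these lines — a sketch, with the resource API version and the parameter and symbolic names being assumptions:

// synapse.bicep – module with output variables added
param synapseName string
param location string
param datalakeAccountUrl string
param datalakeFilesystem string

resource synapseWorkspace 'Microsoft.Synapse/workspaces@2021-06-01' = {
  name: synapseName
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    defaultDataLakeStorage: {
      accountUrl: datalakeAccountUrl
      filesystem: datalakeFilesystem
    }
  }
}

// New output variables, mirroring what the storage module already exposes
output synapseWorkspaceName string = synapseWorkspace.name
output synapseWorkspaceId string = synapseWorkspace.id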

I updated my variables, added output variables for my synapse.bicep module, then ran the What-If again. Aaaaand…. nothing changed. Considering Bicep is an open-source project on GitHub, we get to search for issues with ‘What-If’ operations. I didn’t find what I was hoping for, so we get to create an issue 😏, taking the things learnt over the past few posts with us.

GitHub issue summary.
de bug 🐛

That’s it. Done. Created our first public issue: what-if operation doesn’t seem to include all bicep defined or created resources · Issue #3682 · Azure/bicep (github.com).

The what-if behaviour doesn’t block us at this stage. The deployment works, so at this point I think we are set for the next section, where we work on getting this into a GitHub Actions pipeline.

๐Ÿœ

Infrastructure as Code (IaC) reuse

Photo by RODNAE Productions from Pexels

This is part of my series on learning to build an End-to-End Analytics Platform project.

TLDR; I made improvements to the Infrastructure as Code from the previous post by following best practices and promoting code reuse. Continued with parameters, but extended the code with scopes, modules, variables, functions, operators, and outputs. There is a list of Bicep best practices that is worth looking into.

Divide and conquer

We can use modules to group a set of one or more resources to be deployed together. Modules can be reused, improving readability and maintainability. From what I understand, they basically get converted to nested ARM templates.

The first part that I want to move into a module is the data lake storage account, resolving its dependencies along the way. When that’s done, repeat the process for the other resources that we want to deploy.

Moving day.
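In code, the module call from the main file looks something like this — a minimal sketch, with the file and parameter names being assumptions:

// main.bicep – calling the data lake module
param location string = resourceGroup().location
param storageAccountName string = toLower('dl${uniqueString(resourceGroup().id)}')

module dataLake 'modules/datalake.bicep' = {
  name: 'dataLakeDeploy'
  params: {
    storageAccountName: storageAccountName
    location: location
  }
}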

Next up, update modules to use parameters and variables where possible to avoid hard-coded values. We should end up in a position where the module is a bit of code that can be called with a set of parameters. Note that when the resources are in the same file, you can reference them directly. An example from my previous post was where I referenced the datalake resource.

Same file resource references.
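As a sketch of what that direct reference looks like when everything lives in one file (names are assumptions):

resource datalake 'Microsoft.Storage/storageAccounts@2021-04-01' = {
  name: 'mydatalakeaccount'
  location: resourceGroup().location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
  properties: {
    isHnsEnabled: true // hierarchical namespace for Data Lake Gen2
  }
}

// Same file, so the resource's properties can be referenced directly via its symbolic name
output dfsEndpoint string = datalake.properties.primaryEndpoints.dfs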

A module only exposes parameters and outputs to other Bicep files. When we move the data lake resource creation to a module, we need to leverage outputs which can then be passed between modules. The idea is to call a module -> deploy the resource -> output important things -> pass those things to another module as input parameters. So, the same property I referenced before now becomes an output in the module of the storage account:

Output for output.

Output variables can now be used in the main script as inputs to another module, etc. We just reference them using the moduleName.outputs.outputName syntax.

Outputs as inputs.
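Put together, the pattern looks roughly like this — a sketch, with the module and parameter names assumed:

// modules/datalake.bicep – the property from before, now exposed as an output
output datalakeDfsEndpoint string = datalake.properties.primaryEndpoints.dfs

// main.bicep – pass one module's output into another module's parameters
module synapse 'modules/synapse.bicep' = {
  name: 'synapseDeploy'
  params: {
    datalakeAccountUrl: dataLake.outputs.datalakeDfsEndpoint
  }
}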

We use operators in our deployments for things like conditional deployments.

On one condition.
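For example, the if() condition syntax lets us deploy a resource only when a flag is set — a sketch, with the parameter being hypothetical:

param deployDiagnostics bool = false

// Deployed only when the flag is true
resource diagStorage 'Microsoft.Storage/storageAccounts@2021-04-01' = if (deployDiagnostics) {
  name: toLower('diag${uniqueString(resourceGroup().id)}')
  location: resourceGroup().location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
}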

Expanding on the use of parameters and variables, functions are a great way to drive flexibility and reuse into your deployments: getting runtime details, resource references, resource information, arrays, dates, and more. Just remember that most work at all scopes, but some don’t; when they don’t, you will probably figure that out from the errors. One way to use them is to inherit the resource group location during resource deployment. In our case: setting variables with the resource group location, appending a deterministic hash string suffix derived from the resource group to the storage account name, or even enforcing lower-case names, then using the variables for deployment.

Variables and functioning captain 👩‍✈️
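Here is a sketch of those exact tricks (names are assumptions):

// Inherit the resource group's location instead of hard-coding one
param location string = resourceGroup().location

// Deterministic hash suffix from the resource group, forced to lower case
var storageAccountName = toLower('datalake${uniqueString(resourceGroup().id)}')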

FYI, the weird-looking string notation ‘${var}’… that’s called ‘string interpolation’. Pretty simple compared to other ways I’ve had to write parameterised strings before, with all kinds of placeholders, parameters, and functions. I like!

As a good practice we use parameter decorators to control parameter constraints or metadata. Things like allowed values, lengths, secure strings, etc.

Prettier.
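A sketch of the common decorators (the values here are examples):

@minLength(3)
@maxLength(24)
param storageAccountName string

@allowed([
  'dev'
  'test'
  'prod'
])
param environment string = 'dev'

@secure()
param adminPassword string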

What we do next in our main deployment file is to change the scope. That way we can deploy at the subscription level, which lets us create resource groups in Bicep instead of with the Azure CLI, which is what we did in the previous post.

Scoping things out 🔭
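A sketch of the change, with the group name and region being assumptions:

// main.bicep now deploys at subscription scope
targetScope = 'subscription'

param rgName string = 'rg-analytics-platform'
param location string = 'westeurope'

// Resource groups can now be created from Bicep instead of the CLI
resource rg 'Microsoft.Resources/resourceGroups@2021-04-01' = {
  name: rgName
  location: location
}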

Note: It’s preferred in most cases to put all parameters/variables at the top of the file.

One other point of interest: when we change the scope, our modules that deploy resources error out, because those resources can’t be deployed at the subscription level, only at the resource group level. Makes sense. So we need to change their scope in the deployment.

Scope inception.
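Continuing the subscription-scope sketch from above, the modules get pointed back at the resource group:

module dataLake 'modules/datalake.bicep' = {
  name: 'dataLakeDeploy'
  scope: rg // deploy this module's resources into the resource group created above
  params: {
    storageAccountName: toLower('dl${uniqueString(subscription().subscriptionId)}')
    location: location
  }
}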

Polishing up the current solution with these practices was good learning. I continued with the approach across all modules and files. Then ran a few tests to make sure the resources deploy as expected.

That covers it for this post. What I think we will do next is work on setting up a CI/CD pipeline in GitHub to build and deploy these resources into Azure.

๐Ÿœ