End-to-End Analytics Platform

As part of my plan for Real Projects, I recently came across a few very interesting links for inspiration.

Now I don’t have much experience in any of those areas. It got me thinking though. What if I used a mash-up of those architectures, technologies, and processes to build and learn about an end-to-end analytics platform, from source to AI/API?

So that’s the plan. I am going to build a project that tries to bring those services together in one place. Along the way I want to learn not only about the technologies but also about the processes and methodologies that surround them, such as DevOps/DataOps/MLOps.

As I get my hands dirty learning these things, the code can be found here: data-gauntlet 🐱‍👤

Azure Environment Setup

So to kick things off we need to put together an environment in Azure. We need to set up some scaffolding to support our efforts to build interesting things. What we are looking for is something quick that gives us flexibility in our deployment, integration, and management of resources: the Azure hub-spoke topology. This works great. It’s easy to deploy, extend, manage, and maintain. It’s used everywhere from large enterprise deployments right the way down to our lab. It’s a Swiss Army knife architecture and perfect for learning. Trying new services or features just means spinning up a new spoke, deploying resources, and configuring them, mostly without breaking the other spokes. Things get tricky with services that need specific integration configurations; that’s possible to do, just tricky.

For the time being we keep it really simple: hub and spoke. We manually create a hub resource group and a single spoke. This will change as we explore other services and solutions. To do this, here is the thinking:

  • Create a hub
    1. Create a hub resource group
    2. Deploy a virtual network to the resource group with a default address space and subnet.
    3. Create a virtual network gateway with its own subnet.
    4. Set up point-to-site VPN connectivity, which works for a single or few clients.
  • Create a spoke
    1. Create a spoke resource group
  • Get a refreshing beverage and peruse the interweb for our first project and architecture fit.

The hub

This is the entry point for my hybrid network traffic. What I don’t cover here is the subnetting; that’s an exercise for the reader 😁 Getting started is the game right now, so I have this:

  • Virtual network
    • Subnet: default
    • Subnet: gateway (hosts the gateway for our Point-to-Site VPN setup)
  • Virtual network gateway
  • Public IP address

I am looking at extending this at a later stage to include subnets and resources like an IaaS jump server or Azure Firewall.
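
The subnetting is left as an exercise, but as a rough starting point, here is a minimal Python sketch of how the hub address space could be carved up. The 10.0.0.0/16 address space, the /24 subnet size, the 172.16.201.0/24 VPN client pool, and the helper names plan_hub_subnets and check_vpn_pool are all assumptions for illustration; none of them come from the actual environment. The one name that is not a free choice is GatewaySubnet, which Azure requires for the subnet that hosts the virtual network gateway.

```python
import ipaddress

# Assumed values for illustration only; pick ranges that suit your own network.
HUB_ADDRESS_SPACE = ipaddress.ip_network("10.0.0.0/16")
VPN_CLIENT_POOL = ipaddress.ip_network("172.16.201.0/24")


def plan_hub_subnets(address_space, prefix=24):
    """Carve equally sized subnets out of the hub address space.

    The first block becomes the default subnet and the second the gateway
    subnet. The remaining blocks stay free for later additions such as a
    jump server or firewall subnet.
    """
    blocks = address_space.subnets(new_prefix=prefix)
    return {
        "default": next(blocks),
        "GatewaySubnet": next(blocks),
    }


def check_vpn_pool(address_space, client_pool):
    """Fail early if the point-to-site client pool overlaps the hub network."""
    if client_pool.overlaps(address_space):
        raise ValueError(f"{client_pool} overlaps {address_space}")


hub_plan = plan_hub_subnets(HUB_ADDRESS_SPACE)
check_vpn_pool(HUB_ADDRESS_SPACE, VPN_CLIENT_POOL)

for name, network in hub_plan.items():
    print(f"{name}: {network} ({network.num_addresses} addresses)")
```

Keeping the blocks the same size wastes some addresses, but it makes it easy to see what is still free when a new subnet is needed. Azure reserves a handful of addresses in every subnet, so the usable count is a little lower than what is printed.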

The spoke

A spoke holds resources that I think simply work together. Resource groups are used to group resources that are related to each other or that you want to manage together. An example of this would be deploying an Azure Synapse Analytics workspace to a spoke with its supporting services (e.g. Key Vault, Azure Data Lake Storage Gen2, etc.). The spoke allows us to do just that.

Looking ahead

Considering our usage, data classification, and budget, the current architecture should be good for now. Better to start and learn than to be paralyzed by analysis in this case. Things might change as we learn. Looking ahead, Azure DevOps or GitHub integration will be on the map as well. That said, it makes for easier learning to do things manually first to understand them and get things moving, then automate all the things 🤖. For now, keep it really simple.

Development environment setup

No, I am not talking about a luxurious battle station setup with StreamDecks, DSLR cameras, lighting, RGB keyboards, etc. Someday, maybe, when I am smart enough to figure out all the audio-visual stuff. What I am talking about, though, is the development setup on my machine.

A few things I am working with:

  • Visual Studio Code Insiders Edition
  • Visual Studio 2019 Enterprise Edition
  • Azure Data Studio
  • SQL Server Management Studio
  • Docker Desktop (running under WSL2; not that I needed WSL2, I just thought it might be worth trying out)
  • Windows Terminal
  • Git for Windows

Visual Studio Code

The not so new kid on the block. I find the extensions really good. There seems to be a bunch of investment in this tool. It covers a really wide range of uses for what I do. Download it here: Visual Studio Code

Visual Studio 2019

Look, I haven’t used this in a while. If I actively start working with it again, I will loop back on this. I generally don’t do much customisation with Visual Studio. I do make sure that I have it geared for Azure, Database Projects, and either C# or Python development. I find it does most of what I need out of the box.

Azure Data Studio

The data sister of Visual Studio Code. Makes sense to have. Get started here: Azure Data Studio

SQL Server Management Studio

Ye olde SSMS. This has been my world for the past few years as a SQL Server DBA and Consultant. Download it over here: SQL Server Management Studio

Docker Desktop

Containers are a hot topic, and an area that I am looking to explore a little more. Thankfully, the team at Docker make it really easy to get up and running on my Windows machine. Get started over here: Docker Desktop

Windows Terminal

I love this thing. I was not a “command line guy”. The more I work in The Cloud, the more I find myself enjoying it. Considering I spend more time there, why not make it pretty 🦋? At this point I welcomed Scott Hanselman’s Pretty Prompt post; go check it out. Get started here: Windows Terminal

Git for Windows

Git for Windows. VSCode has Git support built in, but it relies on a Git installation on the machine, and I also have this for some other reasons.

That’s pretty much it for now. I used to use Notepad++, but I found that I can do most of what I wanted with VSCode.

Real projects

I have been working with relational databases, specifically SQL Server, for a while now. It’s not fun having a bicycle shop that loses money as the primary example database for me to play with. It doesn’t really let me work on something with a reasonable amount of data, or even tackle an end-to-end solution, because the thing is already in its end state. For a long time I wondered how I could build something realistic that takes “real” data and factors it into a solution. Not that the solution will be the greatest, but it is meant as a learning exercise. Who knows what could happen next? I see, learn from, and work with very talented people every day across the interwebz.

The plan

  1. Get a development environment set up.
  2. Pick a topic or architecture.
  3. Find supporting services or data. Think public data sets.
  4. Get a source code repo going.
  5. Set up a way to track my work and manage deployments.
  6. Start building a minimum viable product and keep it going.
  7. Monitor it, optimise it, extend it, enhance it, energise it 🛸.
  8. Learn all the things... OK, maybe not all, but something.

Why?

Well, things move so quickly in the tech world today. I have a tough time keeping up. I didn’t go to university, I didn’t start out in IT (or even have an inclination to be in it), and there are a bunch of other reasons why impostor syndrome has followed me into every job I have had so far. I think there are very many people in that same boat; we might be sharing a seat. By God’s grace many people showed me mercy as I started in this industry, including many whom I look up to. With this approach I get to continually build something end-to-end, learning as I go. More importantly, I try to give back what I learn, whether it’s setting up an IDE, learning more about source control, dealing with CI/CD issues, or expanding my skills in optimisation. This way I set up a tangible and long-lived learning experience that I pray others can use and learn from. It’s going to be raw, it’s new, it is a bit scary for me, but I hope you enjoy it.

That being said, let’s get to it.

Want to keep track of my progress? Take a look at my project tracking.