Telemetry logging to InfluxDB and Grafana on Raspberry Pi

This is part of my series on learning to build an End-to-End Analytics Platform project.

TLDR; After a suggestion from a friend (Waylon Payne LinkedIn) we start learning about the time series database InfluxDB. We set up InfluxDB on the Raspberry Pi and create a database, get Grafana installed and running, then write, troubleshoot, and learn a bunch while logging data to InfluxDB. Finally we create a dashboard in Grafana to display our Sense HAT telemetry.

Begin the Influx 🌌

Last time we tackled writing out SenseHAT readings to a csv on the Pi. Now though, we level up by working on writing that data to a database more suited for streaming log data, in this case InfluxDB.

The InfluxDB documentation has a section for installing on a Raspberry Pi. While I was discussing this with my friend, he suggested that before strolling down the path of flashing a new OS I should read this article: Installing InfluxDB & Grafana on Raspberry Pi. Skipping step 0 is the plan in our case. Another post for reference was Datalogger example using Sense Hat, InfluxDB and Grafana. Big thanks to Simon Hearne and Circuits.dk; their posts really helped guide my thinking even though I chose to do things a little differently. 😎

First up, some updates.

sudo apt update
sudo apt upgrade -y
bash terminal apt update
Ah yes.. updates..

Updates ran pretty quickly. The next part is getting the InfluxDB packages.

wget -qO- https://repos.influxdata.com/influxdb.key | sudo apt-key add -
source /etc/os-release
echo "deb https://repos.influxdata.com/debian $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/influxdb.list

A few things here to learn from the previous code snippets:

  • apt – A command line package/software management tool on Debian (Debian Wiki) for tasks like searching, installing, and removing packages.
  • etc directory – Holds core configuration files. Found a nice Linux directory structure for beginners post.
  • wget – A command line tool for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols (Debian Wiki).
  • tee – Reads standard input and writes it to both standard output and one or more files (GeeksForGeeks).
  • source – Reads and executes the content of a file (GeeksForGeeks).

Best I can understand at the moment: we download and store a public key that lets apt authenticate/validate the InfluxDB packages, then echo the latest InfluxData stable release repository line into a new file under the sources.list.d directory, which lets apt pick up the package and future updates. From there, installing is a matter of sudo apt update && sudo apt install -y influxdb, which kicked off the install of InfluxDB. How do we know? The console says so..

bash terminal InfluxDB installation
The influx begins!

It installs InfluxDB version 1.8.9 (not 2.0 which is the latest at the moment). Keep that in mind when working with documentation. Upgrading to 2.0 we can leave for the future. Onward!

sudo systemctl unmask influxdb.service
sudo systemctl start influxdb
sudo systemctl enable influxdb.service

More things to learn:

Found the command to check a service's status via the --help switch on the systemctl command. These all feel reasonably familiar coming from working a little with PowerShell and the Windows terminal.

sudo systemctl --help
sudo systemctl status influxdb.service
bash terminal InfluxDB service status check
Running, running, running.

The service is up, active, and running. That means we should be able to connect to it. We can do that by logging into the Influx CLI from the terminal. Then creating a database. Creating a user. Finally, granting the user permissions.

influx

CREATE DATABASE <yourdatabase>

USE <yourdatabase>

CREATE USER <yourusername> WITH PASSWORD '<yourpassword>' WITH ALL PRIVILEGES

GRANT ALL PRIVILEGES ON <yourdatabase> TO <yourusername>
bash terminal Influx CLI

Ah familiar territory! A database! Now we have:

  • A database service running.
  • A database created.
  • A user that has more than enough permissions to interact with the database.

Grafana

We want a way to visualise the telemetry that’s going to be written into the database. Grafana gives us the ability to create, explore, and share our data through beautiful, flexible dashboards, and we can run the service on the Pi. We’re taking the same approach as we did for InfluxDB to get Grafana up and running: get the packages, install them, run updates, and validate the service.

wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list

sudo apt update && sudo apt install -y grafana

sudo systemctl unmask grafana-server.service
sudo systemctl start grafana-server
sudo systemctl enable grafana-server.service

sudo systemctl status grafana-server.service
bash terminal Grafana service status check
So much fitness 🏃

Once the service is up we should be able to connect to it on port 3000 (Grafana's default) by browsing to http://<your-pi-address>:3000.

Grafana login page

Yay! We’re connected. Let’s log in with the user name and password ‘admin‘ then reset the password. After the login process is done, we’ll land on the homepage for our Grafana instance running on the Pi. I’m actually excited about this 😄.

We need a way to connect Grafana to our InfluxDB database. On the Home page is a ‘Data Sources‘ tile which we can follow to add a data source.

Grafana home page
Source of the action 💥

We can use the search box to look up a connector for InfluxDB. Once we have that we just select it.

Grafana data source search page
Search the unknown 🔮

From there we configure the settings for the connector.

Grafana data source settings
The way through the mountains 🏔️

Credentials to make our way into the database.

Grafana data source credential settings
You shall pass 🧙‍♂️

Finally, we save and test the connector to make sure it's all working.

Grafana source page connection test
Green.. Green is good 🟢

Good news. The scaffolding is in place. Now we need to get data into the database then configure some dashboards.

Bilbo Loggings 🪵

“If ever you are passing my way,” said Bilbo, “don’t wait to knock! Tea is at four; but any of you are welcome at any time!”

– Bilbo Baggins

Did someone say tea? Time for a spot of IoT! Now to get our IoT device capturing data. A quick swish of our telemetry logging code and we have a starting point. All we need to do is figure out how to log the data to InfluxDB instead of the CSV.

There is a library for working with InfluxDB hosted on PyPI. We need to download and install the packages locally, remembering to use our Python virtual environment. This time, though, we are going to give the VS Code Python environments integration the spotlight. Invoke the Command Palette with Ctrl + Shift + P, start typing, and select the option for “Python: Select Interpreter“.

VSCode command palette
Lost in translation

Look what VS Code recommends…

VSCode command palette Python interpreter selection
Recommended interpretation

Our virtual environment. It’s so smart. I think it reads my blog drafts 🤣. After we choose the interpreter, VS Code switches to our virtual environment. It even reminds us in the status bar at the bottom of the window. So thoughtful!

VSCode status bar
Our friendship will continue.

🚧 Slight detour 🚧

Slight digression from our regular blog flow...

Now, if you are briskly following along and haven’t switched to the venv in the terminal, then you will probably run into an error like this, which might lead you on a wild goose chase across fields of learning and wonder.

bash terminal error installing InfluxDB Client
Goose chase begins 🦢

Naturally, we go looking to see if we find any packages for influxdb-client:

pip search influxdb-client

To which the PyPI XMLRPC API politely lets us know, in a crimson message, that things are not as peachy as we hoped:

bash terminal error downloading InfluxDB client
⚠️!Unmanageable load!⚠️

Fault: <Fault -32500: “RuntimeError: PyPI’s XMLRPC API is currently disabled due to unmanageable load and will be deprecated in the near future. See https://status.python.org/ for more information.”>

After updating the pip version in the venv, we check the status page the error message suggested. After a very interesting read and a smidge of despair, hope emerges… I said to myself, “Self, why does that terminal not have the venv prefix?“. That’s when I realised the true source of the problem. Me. I forgot to activate the venv 😅.

For the terminal we still need to activate the virtual environment. To do this on the Pi we can run:

source <yourvenv>/bin/activate
bash terminal activate Python venv
Activates PEBKAC fix to end goose chase 🦢

Behold!!! It lives!!

When we do that, the terminal actually changes a little, giving us a visual cue that we are in a Python virtual environment. Now to get the supporting packages installed so that we can write Python code for InfluxDB. Take a look at the InfluxDB Client Python GitHub repo or the influxdb-client PyPI project.

pip install influxdb-client
bash terminal installing InfluxDB client
Goose = chased! 🦢

Sometimes Python can’t catch the programmer being the error.

~ me

🚧 Slight detour ends 🚧

The packages are installed. The logging can begin. To get started we need to import the influxdb_client into our project.

from sense_hat import SenseHat
from datetime import datetime
from influxdb_client import InfluxDBClient, Point

Yet again, we find a pebble in our shoe… While trying to import the sense_hat library modules in the REPL, an error presented itself that seemed related to the way the numpy library was installed.

Python terminal numpy error
numpy dumpty had a great fall 🤣

The error message helps a ton! Jumping over to the common reasons and troubleshooting tips gives us options to solve the issue on a Raspberry Pi, matching our original error “libf77blas.so.3: cannot open shared object file: No such file or directory“. I opted for the first option, installing the package with apt-get:

sudo apt-get install libatlas-base-dev

Then I tried entering the Python REPL again (just type “python” in the terminal) and importing the SenseHat module.

Python terminal module error
No one by that name here..

I am beginning to feel like a module hunter 🏹. I tracked down a RaspberryPi thread which led me to a comment on a GitHub issue for the RTIMU module error. To be clear, this doesn’t seem to be an issue when I am running in the global Python scope. Only an issue in the virtual environment. The folks were kind enough to provide a way to install this with a pip command. Here we go:

pip install rtimulib
Python terminal module successful installation
Sparks joy ✨

Yes! It works!

Initially I tried to write a Python function that would both write to and query the database. It wasn’t long before I ran into an error trying to connect to the database from that function.
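For context, the query side of that function looked something like this rough sketch (not my exact code; it assumes the influxdb-client package and the same connection details used in the logging script further down):

from influxdb_client import InfluxDBClient

# Hypothetical minimal query attempt against InfluxDB 1.8 with the 2.x client.
# The Flux endpoint is what raises the error below while flux-enabled is still
# false in the InfluxDB configuration.
client = InfluxDBClient(url="http://localhost:8086",
                        token="grafanabaggins:<NotMyPrecious>",  # 1.8 user:password
                        org="-")
query_api = client.query_api()
tables = query_api.query('from(bucket: "shire/autogen") |> range(start: -1h)')
for table in tables:
    for record in table.records:
        print(record.get_time(), record.get_value())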

Python error message Flux query service disabled.
Flux capacitor disabled ⚡

For v1.8 the Flux endpoint is disabled by default and we need to enable it in the InfluxDB configuration file. To edit the file we can use nano, a Linux command line text editor.

sudo nano /etc/influxdb/influxdb.conf

Which opens the file in nano for us to edit. Under the [http] section, set:

[http]

  # ...

  flux-enabled = true

  # ...

Editing influxdb config in nano
A little text editing

The settings are changed. To bring them into effect we need to restart the service.

sudo systemctl restart influxdb.service
bash terminal influx service start
Starting the engine

Not much to go on. I found what seems to be a potential workaround: there is a comment further down in this thread on InfluxDB not starting that talks about adjusting a sleep setting in a start-up file. Worth a try. Using nano again, we open the file and make the change.

Editing influxdb systemd start sh file in nano
bash terminal influxdb service status check
Running, running, running 🏃

Time to write the code that will log records to our database. The idea is simple. Run a loop. Every few seconds get a Sense HAT reading. Log the reading to our InfluxDB. Stop the loop when we interrupt the program.


from sense_hat import SenseHat
from datetime import datetime
from influxdb_client import InfluxDBClient, Point

timestamp = datetime.now()
delay = 15
sense = SenseHat()
host = "localhost"
port = 8086
username = "grafanabaggins"
password = "<NotMyPrecious>"
database = 'shire'
retention_policy = 'autogen'
bucket = f'{database}/{retention_policy}'

def get_sense_reading():
    sense_reading = []
    sense_reading.append(datetime.now())
    sense_reading.append(sense.get_temperature())
    sense_reading.append(sense.get_pressure())
    sense_reading.append(sense.get_humidity())
    return sense_reading

# This method will log a sense hat reading into influxdb
def log_reading_to_influxdb(data, timestamp):
    point = [
        Point("reading").tag("temperature", data[1]).field("device", "raspberrypi").time(timestamp),
        Point("reading").tag("pressure", data[2]).field("device", "raspberrypi").time(timestamp),
        Point("reading").tag("humidity", data[3]).field("device", "raspberrypi").time(timestamp),
    ]
    client = InfluxDBClient(url=f"http://{host}:{port}", token=f"{username}:{password}", org="-")
    write_client = client.write_api()
    write_client.write(bucket=bucket, record=point)

# Run and get a reading Forrest
def run_forrest(timestamp):
    try:
        data = get_sense_reading()
        log_reading_to_influxdb(data, timestamp)
        while True:
            data = get_sense_reading()
            difference = data[0] - timestamp
            if difference.seconds > delay:
                log_reading_to_influxdb(data, timestamp)
                sense.show_message("OK")
                timestamp = datetime.now()
    except KeyboardInterrupt:
        print("Stopped by keyboard interrupt [CTRL+C].")

# Kick off the logging loop
run_forrest(timestamp)

I struggled for a while trying to map the bucket/token variables to what I was able to do easily in the 1.8.9 CLI. I revisited the Python client library and noticed a specific callout for v1.8 API compatibility, which has an example that helped me define the token. It wasn’t long before we had the script running and data being logged to the database.
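For reference, that v1.8 compatibility mapping boils down to a few lines (a sketch reusing the same values as the script above):

from influxdb_client import InfluxDBClient

# InfluxDB 1.8 compatibility with the 2.x Python client:
#   token  -> "<username>:<password>" of the 1.8 database user
#   org    -> "-" (required by the client, ignored by 1.8)
#   bucket -> "<database>/<retention_policy>"
client = InfluxDBClient(url="http://localhost:8086",
                        token="grafanabaggins:<NotMyPrecious>",
                        org="-")
bucket = "shire/autogen"  # database 'shire' with the default 'autogen' retention policy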

We’re getting there!

To the Shire

Before we get too far into querying the data we need to understand some key concepts in InfluxDB. It won’t be the last time I visit that page; these concepts are foreign to me. I learnt to use InfluxQL, which is a SQL-like language for working with the data. There are some differences between Flux and InfluxQL that you might want to keep in mind. I had a tricky time figuring out how to execute Flux queries initially (I wasn’t getting any data back from my Flux commands in a Python function, but saw that you could invoke a REPL to test queries with). To keep things simple, though, I opted for InfluxQL. We can launch the Influx CLI from the terminal and query our data.

influx

SHOW DATABASES

USE <database>

SELECT * FROM <measurement>
Influx queries returning results
Successfully captured! 🪤

Let’s see if we can build a dashboard to visualise the data we are logging. We can connect to our Grafana server again and head to the home page. There is an “Explore” menu item that is a quick way for us to query our data and experiment. Once the window opens up we select our data source connection from the drop down box and begin building a query with a wonderfully simple interface.

Visual query building 🤩

It’s at this point we realise that our logging design might not be correct. What I was expecting was that I could treat the readings like columns in the SELECT and WHERE clauses. Apparently not. I initially thought that design would work better because tags are indexed and fields aren’t, so querying by the tags would be faster. Good in theory, but tags are string metadata in InfluxDB: you can filter and group by them, while the numeric values you want to select, aggregate, and chart need to be fields. My initial mental model needed tweaking. A change to the logging function, then, to log a single point with multiple fields instead of three points with the readings stored as tags.

So this:

point = [
    Point("reading").tag("temperature", data[1]).field("device", "raspberrypi").time(timestamp),
    Point("reading").tag("pressure", data[2]).field("device", "raspberrypi").time(timestamp),
    Point("reading").tag("humidity", data[3]).field("device", "raspberrypi").time(timestamp),
]

Changed to this:

point = [
    Point("reading")
    .tag("device", "raspberrypi")
    .field("temperature", data[1])
    .field("pressure", data[2])
    .field("humidity", data[3])
    .time(timestamp)
]

Some minor InfluxDB management will be needed in future to clean up the old database. For now though, we have our new, empty ‘frodologgins‘ database. I ran the updated logging function against the new database and…

* Chef’s kiss *

It works as expected! A quick update to the Grafana connection settings to switch to the new database, and with that in place we get the expected results in the drop down. We can see the fields we want to display and chart.

We can try to reconcile the point, tags, and fields in the Python code with how we are querying them in InfluxQL, slowly sharpening our mental model and skills. The query reads as follows (see the sketch after the list):

  • From our database
  • Query our readings for the default/autogen retention policy
  • Where the device tag value is raspberrypi
  • Return the last temperature field reading
  • Group by ten second intervals
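As a sketch (my reconstruction, not copied from Grafana), the InfluxQL behind that panel looks roughly like this, held in a Python string with each clause mapped back to the list above:

# "frodologgins" is the new database created above; "autogen" is its default
# retention policy and "reading" is our measurement.
influxql = (
    'SELECT last("temperature") '                # return the last temperature field reading
    'FROM "frodologgins"."autogen"."reading" '   # database . retention policy . measurement
    "WHERE \"device\" = 'raspberrypi' "          # where the device tag value is raspberrypi
    "AND time > now() - 1h "                     # the panel's time range goes here
    "GROUP BY time(10s)"                         # group by ten second intervals
)
print(influxql)

As far as I can tell the 2.x influxdb-client library only speaks Flux for queries, so the easiest way to test a string like this is to paste it into the Influx CLI as we did earlier.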

One thing I wasn’t quite sure of is the way the time range in Grafana worked with the data logged in the database. The query looked correct but no data was returned. I was looking at a window from now-1d to now initially, which seemed logical to me: “find me all the data points from yesterday to now“. The Inspector in Grafana lets us grab the generated query, which we can then run in the Influx CLI to test.

Inspector Clouseau 🕵️

I eventually adjusted that to a window from now to now+1d, which in my mind is “back to the future” 🔮🚗, but it worked. I think this comes down to how the timestamps are stored (i.e. timezone offsets) and how the time functions are evaluated. I’ll dig into that later; for now this works and we have data showing on a graph.
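My working theory, and it is only a theory at this point, is that datetime.now() produces a naive local timestamp while InfluxDB stores and compares times in UTC, so points written from a timezone ahead of UTC land “in the future” relative to Grafana’s now. Logging timezone-aware UTC timestamps would look something like this:

from datetime import datetime, timezone

# Instead of the naive local datetime.now() used in the logging script,
# an explicit UTC timestamp keeps the stored time aligned with how InfluxDB
# and Grafana evaluate now().
timestamp = datetime.now(timezone.utc)
print(timestamp.isoformat())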

Grafana explore graph sample
Graph

Let’s take the learnings and apply them to building the dashboard. Head to the home page. There is a “Dashboards” tile we can use to build our first dashboard.

Grafana home page
Dash lightning! 🌩️

It opens up a new editing window. I chose an empty panel. From there we can edit the panel in a similar way to what we did with the Explore window. In the upper right corner we can choose the type of chart.

Grafana explore chart type selection
Serious time ⏳

There are a bunch of options, from changing the chart type and adjusting threshold values for the gauges to applying units of measure and so much more. For our case the chart type is “Time series“.

That’s it! Use the same approach to build out the other charts. I added “Gauge” visuals as well with the corresponding query.

Grafana final dashboard displaying readings
Ice, Ice, Baby 🧊

Learnings 🏫

We made it! It took a while but we did it. Failure is a pretty good teacher. I failed a bunch and learnt even more. That’s not wasted time. It’s worth just getting hands on and trying different things out to build the mental model and skills. I have a long way to go to really understand Python, InfluxDB, Grafana, and Linux, but I’ve made progress and learnt new things, which is a blessing.

Until next time.

🐜

Logging SenseHAT telemetry on the Raspberry Pi

This is part of my series on learning to build an End-to-End Analytics Platform project.

TLDR; I’ve been working on upping my Python game 🐍. In this post we get started by creating a Python virtual environment, then build a Sense HAT data logger on the Raspberry Pi that writes the readings to a local file on the Pi.

Going virtual 💾

If you are wondering how I am writing code remotely on the Pi, go check out setting up remote development on the Raspberry Pi using VS Code. We are using the same approach here to get connected and working on our Pi.

Part of this journey is growing my skills. I chose Python as a programming language. Not diving into too many details. It just gives a range of capabilities (web through to machine learning) with a single language. No need to switch too much while learning all the techs in this series. Works for me.

While upping my Python game I came across something called virtual environments. A little primer on virtual environments. I think I have a reasonable grasp on how to start using them for better package management.

Not going full virtualenvwrapper yet though. “Hey, I just met you and this is crazy, but here’s my bookmark, browse it later maybe.” #justsayin’

To that note, let’s set up a virtual environment. First, check our Python versions on the Pi:

python3 --version
Snake in eagle shadow 🐍🦅

We have Python3 installed on the Pi. That means we should have the venv capability built-in. Let’s give it a whirl!

python3 -m venv noobenv
Environment cultivated

When we do that a new folder gets created in our repo. It has a bunch of folders and files related to the “inner workings” of how virtual environments work <- science 👩‍🔬.

Logging the things 🪵

The goal here is that we have an IoT device that is capturing data from the sensors. It has a bunch of sensors we are going to use, which is exciting. Honestly, the more I work with it, the more amazing it is to me.

from sense_hat import SenseHat
from datetime import datetime

sense = SenseHat()

def get_sense_reading():
    sense_reading = []
    sense_reading.append(datetime.now())
    sense_reading.append(sense.get_temperature())
    sense_reading.append(sense.get_pressure())
    sense_reading.append(sense.get_humidity())

    # The orientation, compass, accelerometer, and gyroscope readings come back
    # as dictionaries, so flatten them to match the CSV header further down.
    orientation = sense.get_orientation()
    sense_reading.extend([orientation['yaw'], orientation['pitch'], orientation['roll']])

    compass = sense.get_compass_raw()
    sense_reading.extend([compass['x'], compass['y'], compass['z']])

    accel = sense.get_accelerometer_raw()
    sense_reading.extend([accel['x'], accel['y'], accel['z']])

    gyro = sense.get_gyroscope_raw()
    sense_reading.extend([gyro['x'], gyro['y'], gyro['z']])

    return sense_reading

We create a function (get_sense_reading) that we can call repeatedly. Then use the SenseHat functions (e.g. get_temperature) to get readings from the different sensors. To get them all in a single object/row, we can use a list (sense_reading). Then append each reading to the list. Once we have them, we return the list object.

Witness the quickness ⚡

We add a for loop to our code to call the get_sense_reading function a few times and print the results to the terminal window. We can run the program (main.py) by calling the Python 3 interpreter and passing the file name to it. That loads the code, executes the loop, and prints the results.
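The test loop itself is only a couple of lines; here is a minimal sketch (the count of five is arbitrary) reusing the function we just wrote:

for i in range(5):
    print(get_sense_reading())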

python3 main.py

Now to add data to a file on the device. We’ll use a CSV for now, then adapt it later based on our needs. We can use the sense_reading object returned by the get_sense_reading function and write that to the file using the csv library.

from csv import writer

timestamp = datetime.now()
delay = 1

with open("logz.csv", "w", newline="") as f:
    data_writer = writer(f)
    data_writer.writerow(['datetime', 'temp', 'pres', 'hum',
                          'yaw', 'pitch', 'roll',
                          'mag_x', 'mag_y', 'mag_z',
                          'acc_x', 'acc_y', 'acc_z',
                          'gyro_x', 'gyro_y', 'gyro_z'])

    while True:
        data = get_sense_reading()
        difference = data[0] - timestamp
        if difference.seconds > delay:
            data_writer.writerow(data)
            timestamp = datetime.now()

We start with a timestamp because we want to enforce a delay interval, say 1 second, between writes to the file. Then we open the file and write a header row (writerow) to it. We use a while loop to collect readings and, once we exceed the delay interval, write the row to the file. We need to update the timestamp afterwards, otherwise we would write a row on every pass once the interval has elapsed the first time.
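To make that interval check concrete, here is a tiny worked example with made-up times:

from datetime import datetime, timedelta

# Made-up values purely to illustrate the check in the loop above.
timestamp = datetime(2021, 9, 18, 12, 0, 0)
reading_time = timestamp + timedelta(seconds=2)  # pretend the next reading arrives 2 seconds later

difference = reading_time - timestamp
print(difference.seconds)       # 2, which is greater than delay (1), so we write the row
print(difference.seconds > 1)   # True; after writing we reset timestamp so the next write waits again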

Testing seems to be working and we can log data to a file on the device. The VS Code integrated terminal really is fantastic for running multiple, side-by-side shell windows.

A tail of two terminals 💻

Awesome! It works. We have a program logging data to a csv file at a defined interval. Tail simply prints the end of a file’s content; a few lines is all we need to double check things are working. Last thing left.. shut down the Pi remotely. Usually I would use a shutdown command, but this time I gave a new command a try: halt.

sudo halt
Halt! Who goes there? 🛑

Looks like that worked 🙂 The connection got terminated and VS Code detects that, and tries to reconnect. Pretty slick. We managed to start putting new Python skills to use. Learnt how to create a virtual environment for better package management. Then collecting and writing telemetry from the SenseHat to local storage on the Pi.

That’s it for now.

🐜

VS Code Setting Remote Development on Raspberry Pi

This is part of my series on learning to build an End-to-End Analytics Platform project.

TLDR; The goal is to work remotely on the Raspberry Pi. We added the VS Code Remote Development extension pack and used the Remote – SSH extension, which is part of the pack, to connect to the Pi remotely over the network. We set up key based/passwordless SSH authentication, and to remember the host we added an entry to our SSH config file through VS Code. Finally, we got started with the Sense HAT and wrote some code to do stuff on the Pi!

Working remote 🧑‍🌾

We want to work remotely on the Pi (the artist formerly known as Raspberry Pi 🕺). Sticking with VS Code, there is an extension to help us do remote development:

VS Code remote extension text.
Remote extensions

This Remote Development extension pack includes three extensions:

  • Remote – SSH – Work with source code in any location by opening folders on a remote machine/VM using SSH. Supports x86_64, ARMv7l (AArch32), and ARMv8l (AArch64) glibc-based Linux, Windows 10/Server (1803+), and macOS 10.14+ (Mojave) SSH hosts.
  • Remote – Containers – Work with a separate toolchain or container based application by opening any folder mounted into or inside a container.
  • Remote – WSL – Get a Linux-powered development experience from the comfort of Windows by opening any folder in the Windows Subsystem for Linux.

Now the one we are after for the time being is the Remote – SSH extension which allows me to connect to the Pi over SSH. It’s not as simple as that though. Look at the architectures supported. We need to make sure our Pi has one of those architectures. Looking at the extension documentation we can see it supports: ARMv7l (AArch32) Raspbian Stretch/9+ (32-bit).

When we previously SSH’d into the Pi we got a glimpse of the architecture version:

Command line text with Raspberry Pi version.
Version dePi

To get the OS version we can use the cat command to find the release information:

cat /etc/os-release
Command line text with Raspberry Pi version.
Buster Pi

We have a supported configuration. Let’s try to connect remotely via SSH to our Pi with VS Code using our new extension pack. Open the Command Palette and type “remote-ssh”. Look for the “Connect to Host…” option:

VS Code command prompt with text.
Contact 📡

Then select the platform for the host. In our case Linux. Raspbian, the Pi-specific OS, is a Linux distribution based on Debian. Fill in the login credentials to finish establishing a connection.

One thing I spotted was the prompt in the bottom right corner of VS Code: connecting via SSH, nice, and then “Installing VS Code Server”. Wait, what? Looking at the architecture, that seems to be part of how VS Code Remote Development works. A bit more than I am going to dig into here:

VS Code status bar with installation prompt.
Deploying

Eureka! We are connected remotely to the Pi! Take a look at the VS Code status bar in the bottom left. Notice that it says “SSH:<ip of Pi>” and the Terminal window is a bash shell running connected to the Pi.

VS Code terminal with remote SSH connection.
Ssh.. Terminal likes Pi too

Now that we are connected remotely to the Pi, let’s get started with the Sense HAT. First things first, software updates. I am learning about Linux as I go here. Standard users by default aren’t allowed to install applications on a Linux machine, so to update the software we need to escalate privileges. The “Run as Administrator” in Linux terms seems to be “sudo“. I’m team “super user do”, it just sounds epic. Then apt-get update/upgrade invokes the package handling utility to update or install the newest versions of packages on the system.

sudo apt-get update
sudo apt-get upgrade
VS Code terminal with installation status messages.
Yep.. updates

I ran both commands (only one shown for brevity). They ran like a charm 🍀. The upgrade pulled all it needed, created a few diversions, seemed to unpack its bags, set itself up, and processed what just happened. I don’t know about you, but our Pi seems to have gone through a big phase in its life 🤣. Our software is updated. Now to install the Sense HAT software:

sudo apt-get install sense-hat
VS Code terminal with package status messages.
Already the new and shiny

Nice! Our sense HAT software is up to date. Let’s start writing some code. What is awesome though is that there is an online emulator (trinket.io) that you can use to code the sense HAT in your browser if you don’t have one! Next up, we figure out how to set up new directories for the code with mkdir:

mkdir repos
cd repos
mkdir sense-hat-noob
cd sense-hat-noob

Now once that’s done, VS Code can pick up the new folder and we can open it using the regular “Open Folder” option in the “File” menu:

VS Code command prompt with filepath text.
The Pi files

Short detour here. You might keep getting prompted for the user password. That got annoying fast so I set up key based/passwordless SSH authentication. Then to remember the host we can add an entry to our SSH config file through VS Code.

VS Code SSH target extension.
Remember me

Now to add a file and write some code. To do that we are using the touch command. The moment we do that, VS Code notices the Python file extension and pops up to ask if we want to install the recommended Python extensions.

VS Code remote Python extension text.
Why did it have to be snakes? 🐍🤠

What is interesting is that it suggests the extension should be installed on the Pi 🤔 Again, this is related to the architecture for remote development. I tried not smiling about this, but apparently this extension has preferences…

VS Code remote Python host extension text.
There not here

Okay. We have reasonably secure remote connectivity, remote extensions, and a code file; now we need some code. Thanks to the new remote extension we have IntelliSense (Pylance) running:

VS Code Python Pylance intellisense.
I sense intellisense

I wrote some code to display a message on the LED face. There are a bunch of parameters for the function, let’s keep it simple here:

from sense_hat import SenseHat
sense = SenseHat()
sense.show_message("Hello world")

Voila! This sparks joy! The Pi displays its elation! I got a snap of the exclamation marks scrolling across the screen (pretty cool catching the one LED off as the characters move across the face).

Raspberry Pi Sense HAT led display with exclamation marks.
Spark Joy

That’s it for this one. Tons of new learning experiences. We are making progress! If you have any ideas or suggestions on things that can be done better, let me know.

Until next time, take care.

🐜


Infrastructure as Code (IaC) reuse

Photo by RODNAE Productions from Pexels

This is part of my series on learning to build an End-to-End Analytics Platform project.

TLDR; I made improvements to the Infrastructure as Code from the previous post by following best practices and promoting code reuse. Continued with parameters, but extended the code with scopes, modules, variables, functions, operators, and outputs. There is a list of Bicep best practices that is worth looking into.

Divide and conquer

We can use modules to group a set of one or more resources to be deployed together. Modules help with readability and reuse, and from what I understand they basically get converted to nested ARM templates.

The first part that I want to move into a module is the data lake storage account, then resolve the dependencies. When that’s done, repeat the process for the other resources that we want to deploy.

Moving day.

Next up, update modules to use parameters and variables where possible to avoid hard coded values. We should be in a position where the module is a bit of code that can be called with a set of parameters. Note that when the resources are in the same file, you can reference them directly. An example from my previous post was where I referenced the datalake resource.

Same file resource references.

A module only exposes parameters and outputs to other Bicep files. When we move the data lake resource creation to a module, we need to leverage outputs, which can then be passed between modules. The idea is to call a module -> deploy the resource -> output important things -> pass those things to another module as input parameters. So, the same property I referenced before now becomes an output of the storage account module:

Output for output.

Output variables can now be used in the main script as inputs to another module, etc. We just reference them using the module.output syntax.

Outputs as inputs.

We use operators in our deployments for things like conditional deployments.

On one condition.

Expanding on the use of parameters and variables, functions are a great way to drive flexibility and reuse into your deployments. Getting runtime details, resource references, resource information, arrays, dates, and more. Just remember most work at all scopes, some don’t. When they don’t you will probably figure that out with errors. One way to use them is to inherit the resource group location during resource deployment. In our case, setting variables with the resource group location, appending a deterministic hash string suffix for the storage account name from the resource group, or even enforcing lower case of names then using the variables for deployment.

Variables and functioning captain 👩‍✈️

FYI, the weird looking string notation ‘${var}‘… that’s called ‘string interpolation‘. Pretty simple compared to other ways I’ve had to write parameterised strings before with all kinds of place holders, parameters, and functions. I like!

As a good practice we use parameter decorators to control parameter constraints or metadata. Things like allowed values, lengths, secure strings, etc.

Prettier.

What we do next in our main deployment file is change the scope. That way we can deploy at the subscription level, which lets us create resource groups in Bicep instead of with the Azure CLI as we did in the previous post.

Scoping things out 🔭

Note: It’s preferred in most cases to put all parameters/variables at the top of the file.

One other point of interest is that when we change the scope, our modules that deploy resources error, because those resources can’t be deployed at the subscription level, only at the resource group level. Makes sense. So we need to set their scope in the deployment.

Scope inception.

Polishing up the current solution with these practices was good learning. I continued with the approach across all modules and files. Then ran a few tests to make sure the resources deploy as expected.

That covers it off for this post. What I think we will do next is work on setting up a CI/CD pipeline in GitHub to build and deploy these resources into Azure.

🐜

Azure Environment Setup

So to kick things off we need to put together an environment in Azure. We need to set up some scaffolding to support our efforts to build interesting things. What we are looking for is something quick that gives us flexibility in our deployment, integration, and management of resources. The Azure hub-spoke topology works great here. It’s easy to deploy, extend, manage, and maintain. It’s used in large enterprise deployments right the way down to our lab. It’s a Swiss Army Knife architecture and perfect for learning. Trying new services or features just means spinning up a new spoke, deploying resources, and configuring them, mostly without breaking the other spokes. Things get tricky with services that need specific integration configurations; possible to do, just tricky.

For the time being we keep it really simple. Hub and Spoke. Manually create a hub resource group and a single spoke. This will change as we explore other services and solutions. To do this, here is the thinking:

  • Create a hub
    1. Create a hub resource group
    2. Deploy a virtual network to the resource group with a default address space and subnet.
    3. Create a virtual network gateway with it’s own subnet.
    4. Set up point-to-site VPN connectivity, which works for a single or few clients.
  • Create a spoke
    1. Create a spoke resource group
  • Get a refreshing beverage and peruse the interweb for our first project and architecture fit.

The hub

This is the entry point for my hybrid network traffic. What I don’t cover here is the subnetting, that’s an exercise for the reader 😁 Getting started is the game right now, so I have this:

  • Virtual network
    • Subnet: default
    • Subnet: gateway (hosts the gateway for our Point-to-Site VPN setup)
  • Virtual network gateway
  • Public IP address

Looking at extending this at a later stage to include subnets and resources like an IaaS jump server or Azure Firewall.

The spoke

A spoke holds resources that I think work simply together. Resource groups are used to group resources that are related to each other or that you want to manage together. An example of this would be deploying an Azure Synapse Analytics workspace to a spoke with its supporting services (e.g. Key Vault, Azure Data Lake Gen2, etc.). The spoke allows us to do just that.

Looking ahead

Considering our usage, data classification, and budget, the current architecture should be good for now. Better to start and learn than be paralysed by analysis in this case. Things might change as we learn. Looking ahead, Azure DevOps or GitHub integration will be on the map as well. Though it does make for easier learning to do things manually to understand them, get things moving, then automate all the things 🤖. For now, keep it really simple.

Development environment setup

No, I am not talking about a luxurious battle station setup with Stream Decks, DSLR cameras, lighting, RGB keyboards, etc. Someday, maybe, when I am smart enough to figure out all the audio visual stuff. What I am talking about is the setup for development on my machine.

Couple of things I am working with:

  • Visual Studio Code Insiders Edition
  • Visual Studio 2019 Enterprise Edition
  • Azure Data Studio
  • SQL Server Management Studio
  • Docker Desktop (running under WSL2, not that I needed WSL2, I just thought it might be worth trying out)
  • Windows Terminal
  • Git for Windows

Visual Studio Code

The not so new kid on the block. I find the extensions really good. There seems to be a bunch of investment in this tool. It covers a really wide range of uses for what I do. Download it here: Visual Studio Code

Visual Studio 2019

Look I haven’t used this in a while. If I actively start working with it again, I will loop back on this. I generally don’t do much customisation with Visual Studio. I do generally make sure that I have it geared for Azure, Database Projects, and either C# or Python development. I find it does most of what I need to do out the box.

Azure Data Studio

The data sister of Visual Studio Code. Makes sense to have. Get started here: Azure Data Studio

SQL Server Management Studio

Ye old SSMS. This has been my world for the past few years as a SQL Server DBA and Consultant. Download it over here: SQL Server Management Studio

Docker Desktop

Containers are a hot topic and an area that I am looking to explore a little more. Thankfully, the team at Docker make it really easy to get up and running on my Windows machine. Get started over here: Docker Desktop

Windows Terminal

I love this thing. I was not a “command line guy”. The more I work in The Cloud, the more I find myself enjoying it. Considering I spend more time there, why not make it pretty 🦋. At this point I welcomed Scott Hanselman’s Pretty Prompt post, go check it out. Get started here: Windows Terminal

Git for Windows

Git for Windows. VSCode does have Git integrated, I have this for some other reasons.

That’s pretty much it for now. I used to use Notepad++ but I found that I can do most of what I wanted with VSCode.

Real projects

I have been working with relational databases, specifically SQL Server, for a while now. It’s not fun having a bicycle shop that loses money as the primary example database to play with. It doesn’t really let me work on something with a reasonable amount of data, or even tackle an end-to-end solution, because the thing is already in its end state. For a long time I wondered how I could build something realistic that takes “real” data and factors it into a solution. Not that the solution will be the greatest, but it is meant as a learning exercise. Who knows what could happen next; I see, learn from, and work with very talented people every day across the interwebz.

The plan

  1. Get a development environment set up.
  2. Pick a topic or architecture
  3. Find supporting services or data. Think public data sets.
  4. Get a source code repo going.
  5. Set up a way to track my work and manage deployments.
  6. Start building a minimal viable product and keep it going.
  7. Monitor it, optimise it, extend it, enhance it, energise it 🛸.
  8. Learns all the things.. ok maybe not all, but something..

Why?

Well, things move so quickly in the tech world today and I have a tough time keeping up. I didn’t go to university, I didn’t start out in IT (or even have an inclination to be in it), and there are a bunch of other reasons why I landed the role of impostor syndrome in every job I’ve had so far. I think there are very many people in that same boat; we might be sharing a seat. By God’s grace many people showed me mercy as I started in this industry, and many of them are people I look up to. With this approach I get to continually build something end-to-end, learning as I go. More importantly, I try to give back what I learn, whether it’s setting up an IDE, learning more about source control, dealing with CI/CD issues, or expanding my skills in optimisation. This way I set up a tangible and long-lived learning experience that I pray others can use and learn from. It’s going to be raw, it’s new, it is a bit scary for me, but I hope you enjoy it.

That being said. Let’s get to it.

Want to keep track of my progress? Take a look at my project tracking.