Loading Private Python Libraries to Alteryx at Runtime

The python tool in Alteryx is a great resource that I often use but it can easily get very cluttered with lots of reusable code that could easily be moved out and imported as a library.

Unfortunately by default you can’t just use the standard import statement since the location of your workflow at runtime is not in pythons path so it doesn’t search for libraries there.

Take this example where I have a python file called carrierpy.py located in the same directory as my Alteryx workflow. The file contains a simple function to return the dates for the start and end of the previous two week period.

import datetime
def last_week():
    today = datetime.datetime.today()
    weekday = today.weekday()
    start_delta = datetime.timedelta(days=weekday, weeks=2)
    start_of_week = today - start_delta
    end_of_week = start_of_week + datetime.timedelta(days=13)
    return (start_of_week, end_of_week)

If you try to load this library from the python tool in Alteryx (import carrierpy as cr) you’ll see this error. The python tool cannot find the library as it’s not searching the current workflow path.

Module not found

Standard ways to access the current working directory also cannot be used as they return the path to the temp location used by the Jupyter notebook, not the directory the Alteryx workflow is saved in.

Workflow Constant Solution

To allow this to work we need to use the Alteryx module that we import at the start of every python tool.

In the python tool add the following code at top of the notebook to read the current path from the Alteryx library and add this to to sys.path. Then we can load modules from the local directory without issues.

from ayx import Alteryx
import pandas as pd
import sys

# get the path of the Workflow Directory
path = Alteryx.getWorkflowConstant('Engine.WorkflowDirectory')

# append this path to the python sys.path

# import from current directory
import carrierpy as cr

dates = cr.last_week()
# initialise data of lists. 
data = {'Date':[dates[0]]} 
# Create DataFrame 
df = pd.DataFrame(data) 

Sure enough this works and if we run the Jupyter notebook we see the correct data printed (make sure you run the entire workflow first otherwise the input path will not be populated and cached).

In the Alteryx workflow messages we now see the data is correctly passed out of the python tool.

Successful output

The browse tool also shows the date being output from the python tool 🐍

Correct date output into workflow

Now you can create your own libraries of often used python functions and import them into your workflows as required ♻️

Converting xls to xlsx files in Alteryx

There are numerous threads on the Alteryx Community where people are asking how to covert xls files to xlsx. The solutions generally suggest one of two approaches, either

  • Read the file in with an Input Data tool then output again in the new format
  • Use the Run Command tool to use an an Excel component to do the conversion

In my (admittedly edge!) case the server doesn’t have Excel installed and I don’t want to pay for a license just for this. Plus the file cannot be read natively by the input tool as it requires pre-processing in python first (it’s a really weird Excel file that looks like an PDF invoice but in Excel 😬)

LibreOffice Batch Converting

My solution is to use LibreOffice. It’s open source, free to use and includes batch conversions that can be run from the command line. By default the UI will not be shown when run in this way.

The basic command to do the conversion is shown below.

"C:\Program Files\LibreOffice\program\soffice.exe" --convert-to xlsx "C:\input_path\file.xls" -outdir "C:\output_path"

–convert-to xlsx should be followed by the name of the xls file to convert and the -outdir will be used to write the xlsx files to. The files will have the same name as the original just with the new extension.

Integrating this into an Alteryx workflow is just like any other using the Run Command to run an external program.

  • Use a Directory tool to read all the xls files
  • A Formula tool to create the command line for each file
  • A Run Command to first write the flow to a batch file and then run the file

The most difficult part is configuring the Run Command. The configuration should look like below so the flow is first written to a file ending with .bat followed by the tool running this newly created batch file.

The xlsconvert.bat file that is created should look something like this with a line per conversion.

"C:\Program Files\LibreOffice\program\soffice.exe" --convert-to xlsx "C:\Customer projects\Project\Alteryx\Data\Invoice - SBRY0191928.XLS" -outdir "C:\Customer projects\Project\Alteryx\Data\"
"C:\Program Files\LibreOffice\program\soffice.exe" --convert-to xlsx "C:\Customer projects\Project\Alteryx\Data\Invoice - SBRY0192237.XLS" -outdir "C:\Customer projects\Project\Alteryx\Data\"
"C:\Program Files\LibreOffice\program\soffice.exe" --convert-to xlsx "C:\Customer projects\Project\Alteryx\Data\Invoice - SBRY0192914.XLS" -outdir "C:\Customer projects\Project\Alteryx\Data\"

Each file will be processed turn and written to the -outdir you specified in the formula tool. Voilà.

Converted files together with original

Conversion Using the Python Tool

If you prefer to use python instead of the run command tool it’s very easy to run the same command. This code will run LibreOffice in headless mode again and silently convert the files from xls to xlsx format.

The only import required is subprocess which is already available to Alteryx in the default install.

from ayx import Alteryx
import subprocess
import os
import pandas as pd

# prepare outbound data
data = {"Files":[]}

# path with files to convert (assume in Data subdirectory to workflow)
PATH = Alteryx.getWorkflowConstant('Engine.WorkflowDirectory') + "Data"

# path to LibreOffice your executable
EXE = 'C:\Program Files\LibreOffice\program\soffice.exe'

# loop files, convert and get converted filename for outputting
for file in os.listdir(PATH):
    if file.lower().endswith("xls"):
        subprocess.run([EXE, '--convert-to','xlsx',os.path.join(PATH, file),'-outdir',PATH])
        filename, file_extension = os.path.splitext(file)
        data['Files'].append(os.path.join(PATH, filename, file_extension.lower().replace("xls", "xlsx")))


I personally prefer the python method over the run command purely because I find it more flexible and easier to set up but it works the same either way.

You could of course also use this method for any other Analytics platform such as KNIME, it could also be integrated into an RPA solution using UiPath or similar tool.

Jupyter Notebook Shortcut on MacOS using Anaconda

It’s not immediately clear on Mac OS how to start a Jupyter Notebook if you’re using Anaconda.

The actual executable is located at /Users/*YOURUSER*/opt/anaconda3/bin/jupyter-notebook but it can be a pain to either type the full link or even worse start the Anaconda application first just to open a notebook.

The easiest way I found is to make a symlink in your /usr/local/bin directory (make one if it doesn’t already exist) using the following command.

$ cd /usr/local/bin
$ ln -s /Users/*YOURUSER*/opt/anaconda3/bin/jupyter-notebook 

Then you can start jupyter-notebook from anywhere in the terminal by just typing jupyter-notebook.

Alternative Method

I had to use the above solution since conda was not in my path but it seems there’s another solution that’s even simpler. Just open a terminal, navigate to your anaconda installation and run the conda with the following arguements.

$ cd opt/anaconda3/bin
$ ./conda init zsh 

Brew Permissions Error

I recently installed Homebrew on my MacBook and immediately hit permissions issues (I’m running Catalina v 10.15.2 (19C57) and it seems permissions are an issue for many apps).

Trying to install wget gave this error.

Last login: Wed Nov 20 08:55:40 on ttys000
bob@Bobs-MacBook-Pro ~ % wget https://creativedata.stream
zsh: command not found: wget
bob@Bobs-MacBook-Pro ~ % brew install wget
warning: unable to access '/Users/bob/.config/git/attributes': Permission denied
warning: unable to access '/Users/bob/.config/git/attributes': Permission denied
warning: unable to access '/Users/bob/.config/git/attributes': Permission denied

If you look at the permissions on the .config folder you immediately see the problem.

bob@Bobs-MacBook-Pro ~ % ls -la|grep .config
total 72
drwx------ 3 root staff 96 Nov 3 19:44 .config

By default it’s owned by root and my user has no permissions set. The fix is simple, change the owner to your user and give 744 permissions (must be run as sudo). Problem solved 😎

bob@Bobs-MacBook-Pro ~ % sudo chown -R bob:staff .config
bob@Bobs-MacBook-Pro ~ % sudo chmod -R 744 .config

The Knowledge Curve

Having recently changed jobs to become a Consultant after working 15 years in the Retail Industry I think about this chart a lot.

I’m personally at a stage where I think I know quite a bit but I’m mostly overwhelmed by how much more there is to know 🤯

I also need to become more comfortable with the fact keeping up with new technologies is almost hopeless, but is also not required.

With age and experience you realise the actual technology matters less than you think, what really matters are general techniques to solve problems, ways of isolating issues, seeing the big picture in your head and so on.

In other words you build mental models of how things work and apply that to problem solving. Things that only come with age and experience.

IoT Sensors – Saving Collected Data

Now that my sensors are successfully collecting data and the data is being processed by a Node-RED flow on my Raspberry Pi the final step is to add some persistence to the data.

As it stands the data is constantly streaming and being displayed on a dashboard but after that it’s discarded so there’s no way to see the min or max temperature for say the last 24 hours. We need a database to store and query the data.

If you intend to use a Raspberry Pi as I have done then I wouldn’t recommend setting the database up on the SD card as the read-write traffic can easily wear out the card prematurely. For this reason I mounted an external USB drive to the Pi and create the SQLite database on the external drive.
You can read this post to learn how to mount an external drive in Linux.

Choosing a Database

Three choices spring to mind that could work equally well on a small device with limited computing power, SQLite, MySQL and PostgreSQL. All are readily available to install and reasonably easy to set up but for our data SQLite stands out as being by far the easiest to set up and administer.

The one major downside to consider with SQLite is that it’s a file based database meaning that you won’t be able to access the data from your LAN or the internet (at least it’s not recommended). If you need to do this consider PostgreSQL or MySQL.

To install the SQLite node in Node-RED follow these instructions.

Database Setup

I use the command line to create a database for our sensor data as it’s very easy and requires no extra install. First open a terminal, create the directory and then a database, mine is called sensors in /mnt/hdd/sqlite/

$ mkdir /mnt/hdd/sqlite/
$ cd /mnt/hdd/sqlite
$ sqlite3 sensors
SQLite version 3.27.2 2019-02-25 16:06:06
Enter ".help" for usage hints.

Now your in the sqlite3 command prompt so we can go ahead and create a table for our data.

create table sensordata (location text, temp float, humidity float, timestamp integer);

Our table is very simple and just has four fields, location, temperature, humidity and timestamp. You can close sqlite3 using .quit. That’s all that’s required outside of Node-RED, our database is ready for data!

Using Node-RED with SQLite

The sqlite node requires little configuration, just add the node to the canvas and open the configuration. In the database entry click on the pencil icon to open the second screen shown below.

In the right hand screen type the location of the database you just created and make the mode Read-Write. Do Not make the mode Read-Write-Create otherwise the node will create a new database for you and will overwrite the one we already created.

The node is now configured. To write or query data in the database send SQL insert statements to the node via the msg.topic so leave the SQL Query entry as-is.

Inserting Sensor Data

To inset data we create a msg.topic containing the inset statement we require, for example.

msg.topic = "insert into sensordata values('" + location + "'," + dectmp + "," + dechum + "," + unixtime+")"

Another item to consider is that our beacons send new data every 3 seconds which is great for our live dashboard but is probably excessive for saving data. To reduce the volume of data we can add a delay node set to only allow one message per minute.

Just make sure if you have multiple sensors this applies per sensor otherwise we risk losing data form some locations.

Node-RED delay node
Allow one message per minute

We can check the data is being correctly inserted into the database using the sqlite3 command line again. Open the database in SQLite3 and query the database.

$ sqlite3 sensors
SQLite version 3.27.2 2019-02-25 16:06:06
Enter ".help" for usage hints.
sqlite> select count(*) from sensordata;

We see that the database has 5915 rows of data. We can now query the database to get min and max statistics per location for example and also show this data on the dashboard.

sqlite> select location,min(temp),max(temp),avg(temp) from sensordata group by location;

The flow to save the data is shown below showing the nodes to throttle the data for each sensor.

Writing data to the database
Writing data to the database

Adding Statistics

Node-RED historical sensor data
Statistics flow for historical data

This flow creates three buttons on the dashboard, one to show statistics for the last hour, one for the last 24 hours and the other for the last 7 days. The buttons are set up to send numerical data in hours, 1, 24 and 168 and the function node then creates an SQL statement for the required period by modifying the timestamp.

// converts hours injected to seconds
timespan = msg.payload*60*60

// gets current time in milliseconds, converts to seconds and subtracts the 'lookback'
msg.topic = "select max(temp) tmphigh,min(temp) tmplow,max(humidity) humhigh,min(humidity) humlow from sensordata where location='inside' and timestamp >= " + (Math.round((new Date()).getTime() / 1000)-(timespan))
return msg;

This function converts hours sent from the button node to seconds and then creates an SQL statement that filters the data using a WHERE statement that calculates the current UNIX time and minuses the required period.

Once the query is sent to SQLite the final nodes are to format the output and display the data as text on the dashboard along with a text label that changes depending on which button was pressed last.

Sensor statistics taken from SQLite
Sensor statistics taken from SQLite

We could also add statistics for when the min and max temperatures were recorded with this SQL.

sqlite> SELECT location,datetime(timestamp, 'unixepoch', 'localtime'),temp FROM sensordata WHERE temp=(SELECT min(temp) FROM sensordata);
inside|2019-10-30 06:18:23|20.91796875
inside|2019-10-30 06:18:23|20.91796875
inside|2019-10-30 06:19:23|20.91796875
inside|2019-10-30 06:30:22|20.91796875
inside|2019-10-30 06:31:22|20.91796875
inside|2019-10-30 06:31:22|20.91796875

We see the min temp has many records starting at 06:18:23 and ending 06:31:33.


We have now setup sensors, collected the data using Node-RED and MQTT, created a dashboard to show the live streaming data and also used a SQLite database to store the data for later querying.

This solution could easily be extended with extra sensors of the same type or we could add other sensors such as Hall sensors (to monitor doors or windows opening), CO2 sensors and so on.

We could also add analytics to our solution by querying the data from the database.

IoT Sensors – Data Collection

IoT Network Overview

In the first part I talked mostly about the sensors and the gateway so in this article I’ll show you how to collect, filter and store that data from the sensors.

My setup is running locally on my LAN and uses a Raspberry Pi 3B+ to process the data. I’ll be using Node-RED to process the data as it’s ideally designed to work with IoT devices.

Since everything is on my LAN I’ll not be covering securing the setup which you should do if you plan to send the data to a server on the internet.

As a prerequisite Node-RED must already be installed on you computer (pi, computer, laptop etc. that you will use). To install Node-RED on a Raspberry Pi follow these instructions, Running on Raspberry Pi.

Collecting Data with MQTT

Collecting and displaying the raw data from the gateway is very simple and only required two nodes, an MQTT In node and a debug node to display the data.

Simple MQTT data collection

The MQTT In node should be set to the same topic that you created in the MQTT setting in the gateway so messages sent from the gateway will be received in your flow. In my case I used a topic called sensor.

MQTT In node configuration

If you deploy the node you will immediately see messages appearing in the debug panel. In fact I was surprised just how many messages are collected as they scroll by at an amazing rate. It’s like a fire hose of data being streamed into the node!

This is of course because the BLE gateway is not just getting signals from your sensors but from every device it can see, which turns out to be quite a lot! This includes, phones, tablets, speakers, TVs and so on.

One second of data collection looks something like this.


The basic format of each message is the same.

$report type,tag id,gateway id,rssi,raw packet content,*unix epoch timestamp

Where the message parts are defined below.

report type GPRP: general purpose report . SRRP: active scan response report
tag id MAC address or ID of tag/beacon
gateway id MAC address of gateway’s BLE
rssi RSSI of tag/beacon
raw packet content Raw packet received by the gateway
unix epoch timestamp Optional timestamp when NTP is enabled

Filtering the Data

To filter the data so we only see the messages sent from our sensors is made simple by the fact that the message contains the MAC address of the originating message.

Therefore we can add a switch node to only allow process our MAC addresses. In this case MAC address AC233FA054C6 will go to output 1, AC233FA054C7 to output 2 and everything else to output 3.

If we don’t connect any downstream nodes to output 3 they will be effectively discarded. By splitting our sensors by output we are also able to label them so in the following nodes I use a function to identify which device is inside and which is outside.

Flow split by sensor MAC address/location

Here the sensor being sent to output 1 is the one inside so the function creates a flow variable called ‘location‘ I’ll use later in the flow.

Output 1 function

Message Slots

Now we’ve removed all the data not related to our sensors you notice that we still get two different messages from each sensor.

If you remember when we set up the S1 sensor beacon we only activated two slots. This means that the sensor will actually send two messages, one with some general info and another with our temperature and humidity data. If we had left all six slots activated we would receive six different messages.

The actual messages look like these.


If we take the first message payload only (0201060303E1FF0E16E1FFA10864C654A03F23AC5331) we can see that it’s split into header and data sections as shown below.

Info Slot data

This turns out to be the info slot. Note that the message is in hex format so battery level 64 actually equals 100 in decimal and 5331 equals S1, or the name of the sensor.

The next slot we assume must be our temperature and humidity data.

Sensor Slot Data

Since the headers for the temperature/humidity data are slightly different (they end in 01 instead of 08) from the info data we can now modify our switch node so we only get our sensors and we only see the temperature/humidity slot.

Converting Temperature and Humidity

The next challenge is actually getting the temperature and humidity from the data. You might think this is also hex encoded but you’ll see that hex 1763 is 5987 in decimal and 3C70 is 15472 so it seems we have some conversion to do.

Actually this data is encoded in signed 8.8 fixed point notation! Fortunately I’m not the first person to ask the question regarding decoding sensor data so the answer is conveniently on Stack Overflow, as always 🤟

If we use the function shown below with our values for temp and humidity we get the correct data out.

function FixedPoint(fraction){
  this.fraction = fraction;

FixedPoint.prototype.calculate = function(value){
  let intValue = parseInt(value, 16);
  let signed = (intValue & 0x8000) > 0 ? -1 : 1;
  return signed * intValue / Math.pow(2, this.fraction);

let decodData= new FixedPoint(8);

Our data point of 1763 converts to 23.3203125 which looks like a reasonable temperature and the humidity of 3C70 converts to 60.4375.

To be sure the data agrees you can open the BeaconSET+ app and check the readings there.

Check temp and humidity from the BeaconSET+ app

Final Function

Putting these steps together I have a single function to process the data, it

  • Get the sensor locations from the saved variables (switch node)
  • Uses split() to get the timestamp from the message
  • Converts UNIX timestamp to JavaScript Date object
  • Use split() to get the advertising payload from the message
  • Extract tempo and humidity strings
  • Convert to hex and decode from fixed point to decimal
  • Create a msg.payload for writing to the debug window.
// Convert IUnix time to Date
function UnixToDate(unix_timestamp){
    return new Date(unix_timestamp*1000);

// signed 8.8 fixed point functions
function FixedPoint(fraction){
  this.fraction = fraction;
FixedPoint.prototype.calculate = function(value){
  let intValue = parseInt(value, 16);
  let signed = (intValue & 0x8000) > 0 ? -1 : 1;
  return signed * intValue / Math.pow(2, this.fraction);

//get location from variable
var location = flow.get('location') || '';

let decoder= new FixedPoint(8);

//get timestamp from message
unixtime = parseInt(msg.payload.split(",")[5])

//get advertising message
data = msg.payload.split(",")[4]

// get temperature
hextmp = "0x" + data.substr(28,4)
dectmp = decoder.calculate(hextmp);

//get humidity
hexhum = "0x" + data.substr(32,4)
dechum = decoder.calculate(hexhum);

// create msg.payload
msg.payload = "Location:" + location + "\n" +
                "Temp: " + dectmp + "c" + "\n" +
                "Humidity: " + dechum +"%" + "\n" +
                "Time: " + UnixToDate(unixtime)
return msg;

When deployed we see the following data in the debug window. Success!

Collected sensor data output in Node-RED

Our full flow in Node-RED is shown below, very small for what we have achieved.

Full flow in Node-RED

How to use Amazon S3 from KNIME

This post follows on from my S3 post usng Node-RED so you can read that for a brief into to S3. Using S3 with KNIME is extremely easy as there is already an S3 node available to install.

Amazon S3 Connection

The only node that needs installing is the Amazon S3 Connection.

KNIME Amazon Cloud Connections

All the other nodes required to upload, download or list remote files are already installed in the IO section of the Node Repository in KNIME as shown here.

Amazon S3 connection in the IO section

A simple workflow to upload a file and list remote files in the S3 bucket looks like this.

S3 Credentials

The S3 connection node is easy to set up and I recommend using the standard AWS credentials file so your do not save your id and key in the KNIME workflow. It’s clearly documented on the AWS site.

The only trick is to create folder starting with a dot (.) is not possible on Windows using Windows Explorer. To create the folder you must open a command line and use the mkdir command.

C:\Users\username>mkdir .aws

S3 Connection Node

Configure the node to use your credentials file and select your AWS region. Test the connection before continuing to make sure everything works as expected.

Upload to S3

Simply choose the target directory on S3 you wish to upload your files to, in this case I’m uploading to a sub directory called nodered. As well as the S3 connection this node expects a list of URIs as shown in the initial flow overview. This is standard KNIME and uses a List Files node followed by a String to URI node.

List S3 Files

As with uploading this node asks for the S3 directory to list and by default will list all the files in you bucket.

Here the remote files are displayed.

Format and Mount External Hard Drives in Linux

Assuming you’ve just bought a new external hard drive or you have one that’s already in use these are the steps you need to take to make it usable with Linux.

⚠️ Be aware that following these steps will reformat the drive causing all data to be permanently erased. You’ve been warned.

First connect the USB drive to the Linux computer and open a terminal session. Run the following command.


This will out put something like this.

UUID                                 NAME        FSTYPE   SIZE LABEL  MODEL
                                     sda                931,5G        External_USB_3.0
19b2561e-cd18-46dd-bf87-b176c7cdd7a0 └─sda1      ext4   931,5G        
                                     mmcblk0              7,4G        
5203-DB74                            ├─mmcblk0p1 vfat     256M boot   
2ab3f8e1-7dc6-43f5-b0db-dd5759d51d4e └─mmcblk0p2 ext4     7,2G rootfs 

Note the name of the attached drive, in my case sda1 and run the command below using the name of your drive. This command will format the drive as ext4.

$ sudo mkfs.ext4 /dev/sda1

The output of this command will include a UUID, copy this string for the next step. Open the /etc/fstab file using a text editor and (nano in my example) add this line using the UUID you copied from the previous command.

$ sudo nano /etc/fstab

# add this line
 UUID=this_is_your_uuid /mnt/hdd ext4 rw,nosuid,dev,noexec,noatime,nodiratime,auto,nouser,async,nofail 0 2

Make a directory to mount the drive to, /mnt/hdd in this example, and mount the drive. The mount point should match the fstab file you just created.

$ sudo mkdir /mnt/hdd
$ sudo mount -a

Check the file is mounted by running df (disk free).

$ df /mnt/hdd/
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/sda1      960379496 78000 911447092   1% /mnt/hdd

Finally change the owner of the mounted drive to your normal user, in my case pi (it’s currently owned by root).

$ sudo chown -R pi:pi /mnt/hdd/