IoT Sensors – Getting Started

IoT sensors and Bluetooth gateway
IoT starter pack

Introduction

I’ve been interested in setting up some home monitoring/automation since getting my first Raspberry Pi a couple of years ago. After using the device for various development projects I finally settled on my first project: a basic home temperature/humidity monitor that could measure both indoors and outdoors.

The idea is to end up with a simple dashboard, accessible from a mobile device, showing the current temperature and maybe a chart with some history. This will require me both to access the live data and to store it for later analysis.

Apart from being a learning experience and ending with a useful product, I also had a few basic requirements for the project.

I didn’t want to run wires around my flat, and I wanted an open, non-proprietary platform that could be extended with other sensors at a later date. I also wanted access to the ‘raw’ data coming from the sensors so I could process and store it as I desire.

This means no Apple HomeKit, Samsung SmartThings or Google Home based system; although these are probably easy to set up, they are inherently ‘closed’.

With this goal in mind I selected a Bluetooth Low Energy (BLE) system using beacons and a gateway to collect and forward the data (or advertising data in beacon speak).

The Hardware

Sensors

Minew S1

For the sensors I bought two Minew S1 temperature/humidity sensors that can be used both indoors and outdoors. They gather readings and broadcast them via Bluetooth at regular intervals that you can set yourself. In my case I collect the data using the Bluetooth gateway described next, but the data could equally be collected by an app or another device.

The sensor can be configured easily via an app on both Android and iOS.

Bluetooth Gateway

The gateway is an Ingics IGS01S. It collects data from Bluetooth devices and forwards the payloads (data) via your WiFi, so it’s basically a bridge between Bluetooth and WiFi. It can be configured to send the data via multiple methods such as MQTT or HTTP POST (think API).

The device is very small (54mm x 41mm x 18mm), runs off micro USB and uses very little power.

Ingics IGS01S Gateway

Configuration of Hardware

Configuring the S1 Sensor

This is well described in many places so I’ll be brief. I should mention that to turn the sensor on you press the button on the base for 3 seconds; the same press turns the beacon off. To set the beacon up:

  1. Install the BeaconSET+ app from the App Store or Google Play
  2. Follow this guide to disable unused slots (saves power and makes later steps easier)

I set my beacon to send data every 3 seconds as this is adequate for my needs.

Configuring the Gateway

This is also well documented, as INGICS provide an iGS01S User Guide in PDF format. All configuration is done via the built-in web configuration tool and can be done from any device.

To summarize:

  1. Add antenna and power on (plug in)
  2. Gateway starts in AP mode allowing WiFi connections
  3. Connect to AP via WiFi from a device with web browser
  4. Access the web configuration portal at 192.168.10.1
  5. Configure with your own WiFi settings
  6. Configure the gateway application settings (MQTT or HTTP POST)
  7. Optionally add BLE filters to limit which beacons are forwarded

After configuring and rebooting you can then access the web portal from your own WiFi, as the gateway will join your local network and get an IP address from your router. I recommend giving the gateway a fixed IP address by logging into your router and reserving the IP, as this will make it easier to locate in future (although it has no effect on functionality).

Application Settings (MQTT or HTTP)

My settings for the Application tab are shown below. I’m using an MQTT client (Node-RED) running on my Raspberry Pi at 192.168.1.118. Port 1883 is the standard MQTT port and I’ve chosen a topic called sensor that I’ll be using later when we set up Node-RED.

Gateway Application Configuration for MQTT

BLE Filters

By default the gateway will collect and forward all Bluetooth data it detects, so if you wish to only see your sensors you’ll have to filter out the ‘noise’ in your processing application. An alternative is to set BLE filters in the gateway. This is documented here; an extract is shown below.

Payload filter is used as a filter to keep specified beacons by using payload matching.
Assume your beacon produces the report below:
$GPRP,0007802DDB1E,C946A6500A33,-43,0201061AFF660200215BC6010015000000F00000000
If your beacon has a fixed field "6602" in the above report, you can set
Payload Pattern: 0201061AFF6602
Payload Mask: 0000000000FFFF
Then the gateway will only forward the report when "pattern & mask" matches 6602.
Ex. To match iBeacon:
Payload Pattern: 0201061AFF4C00
Payload Mask: FFFFFFFFFFFFFF
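
To make the pattern and mask logic concrete, here’s a minimal Python sketch of the matching the gateway performs. The payload strings are shortened, illustrative values rather than complete advertisements; the real filtering of course happens inside the gateway itself.

def payload_matches(payload_hex, pattern_hex, mask_hex):
    """Return True if the masked payload bytes equal the masked pattern bytes."""
    payload = bytes.fromhex(payload_hex)
    pattern = bytes.fromhex(pattern_hex)
    mask = bytes.fromhex(mask_hex)
    # Only bytes where the mask is non-zero take part in the comparison
    return all((p & m) == (q & m) for p, q, m in zip(payload, pattern, mask))

# Matches: the manufacturer data carries the fixed field 6602
print(payload_matches("0201061AFF660200215BC601", "0201061AFF6602", "0000000000FFFF"))  # True

# Does not match: an iBeacon-style payload (company ID 4C00) is filtered out
print(payload_matches("0201061AFF4C000215AABBCC", "0201061AFF6602", "0000000000FFFF"))  # False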

Where you decide to filter doesn’t make that much difference but of course if you filter in your downstream application you’ll be sending a greater volume of data from the gateway. If this isn’t an issue it’s probably easier just to send everything from the gateway.

Another filtering option is to use the RSSI (received signal strength indicator) slider. Moving it further to the right will filter out weaker signals so signals from distant sources will be removed.

The BLE filters are configured in the Advanced tab in the gateway.

Advanced configuration tab

Time Stamping

To enable the gateway to add a timestamp to each message you must enable the NTP server in the System tab. This lets the gateway look up the current time from an online NTP server at regular intervals.

Enable ntp server
System tab with NTP settings

Once configured the gateway will begin collecting beacon data and forwarding it to the MQTT server. We haven’t set up the server yet so the messages will not be received, but you won’t see any errors in the gateway.
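
Once a broker is up and running, a quick way to check that the gateway really is publishing is a few lines of Python with the paho-mqtt client (pip install paho-mqtt) that subscribe to the sensor topic and print the raw reports. This is just a sanity check; the broker IP below is an assumption, so replace it with your own.

import paho.mqtt.client as mqtt

BROKER_IP = "192.168.1.118"  # assumption: the machine running your MQTT broker
TOPIC = "sensor"             # the topic configured in the gateway Application tab

def on_connect(client, userdata, flags, rc):
    print("Connected with result code " + str(rc))
    client.subscribe(TOPIC)

def on_message(client, userdata, msg):
    # Each message is a raw report forwarded by the gateway
    print(msg.topic, msg.payload.decode())

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect(BROKER_IP, 1883, 60)
client.loop_forever()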

Next Steps

Now that we have the sensor and gateway working, in the next part I’ll move on to reading the sensor data from the gateway using Node-RED.

Recovering Node-RED Flows After Changing Hostname

I recently changed the hostname on my Raspberry Pi and was rather surprised (and initially worried) when, the next time I started Node-RED, all my flows had disappeared 😱

It turns out that the flow files are tied to the computer hostname by their naming convention. Someone even filed a bug regarding this unexpected behavior.

The reason lies in the naming of the configuration files. If you take a look in the .node-red directory you’ll see something like this.

pi@thor:~ $ cd .node-red/
pi@thor:~/.node-red $ ls
flows_raspberrypi_cred.json  lib           package.json       settings.js
flows_raspberrypi.json       node_modules  package-lock.json

Here you can see two files that both contain the hostname (raspberrypi).

The fix is simple: just rename the files, replacing the old hostname with the new one.

pi@thor:~ $ mv flows_raspberrypi_cred.json flows_thor_cred.json
pi@thor:~ $ mv flows_raspberrypi.json flows_thor.json

Node-RED will need to be restarted to pick up the new configuration.

pi@thor:~ $ sudo systemctl restart nodered

Piping Bitcoin RPC Commands

If you want to get the block header of the latest block generated on the bitcoin blockchain using bitcoin-cli it’s a little tricky (and hard to say!). You need to first find the latest block number (height), then find the hash of that block and then get the header using the hash.

Since the getblockheader command expects a blockhash as a parameter I use pipes to feed the result of one command into the next.

The pipe runs the following commands in this order.

  • First get the chain height using getblockcount
  • Feed this result to getblockhash to get the hash
  • Feed this result to getblockheader
  • Result is the header of the latest block

The result is a one line command to get the latest block header!

$ bitcoin-cli getblockcount | xargs bitcoin-cli getblockhash | xargs bitcoin-cli getblockheader
{
  "hash": "00000000000000000001e372ae2d2bc91903bd065d79e126461cd2bf0bbe6b3d",
  "confirmations": 1,
  "height": 600417,
  "version": 545259520,
  "versionHex": "20800000",
  "merkleroot": "e58f963d486c0a626938851ba9bfb6e4886cabcf2302573f827ca86040f997a3",
  "time": 1571688192,
  "mediantime": 1571685251,
  "nonce": 1693673536,
  "bits": "1715a35c",
  "difficulty": 13008091666971.9,
  "chainwork": "000000000000000000000000000000000000000009756da038619f842bfff6b6",
  "nTx": 2577,
  "previousblockhash": "0000000000000000000dbd8aada824ee952e87ef763a862a8baaba844dba8af9"
}
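
If you’d rather do the same thing from Python, the three calls chain together just as neatly using the python-bitcoinrpc library. This is a rough sketch; the credentials are placeholders for the rpcuser/rpcpassword in your own bitcoin.conf, and it assumes the RPC port is reachable at 127.0.0.1:8332.

from bitcoinrpc.authproxy import AuthServiceProxy

# Placeholder credentials - replace with the rpcuser/rpcpassword from bitcoin.conf
rpc = AuthServiceProxy("http://USER:PASSWORD@127.0.0.1:8332")

height = rpc.getblockcount()             # chain height
block_hash = rpc.getblockhash(height)    # hash of the block at that height
header = rpc.getblockheader(block_hash)  # header of the latest block
print(header)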

IoT with Node-RED and Python

Raspberry Pi + Node-RED + Python + MQTT

Now that I have two Raspberry Pis running, one as a Bitcoin full node and the other mostly used as a dev/experimentation machine, I decided it’s time to put the dev machine to some use.

I’d also like to learn more about IoT (Internet of Things) devices and how they are wired together and communicate, so this is a great opportunity to ‘Learn by Doing’.

To this end I’ve started to experiment with the MQTT messaging protocol that is commonly used for IoT devices.

To start, what is MQTT?

MQTT (Message Queuing Telemetry Transport) is an ISO standard, lightweight, publish-subscribe network protocol that transports messages between devices.

MQTT on Wikipedia

This allows us to very easily send sensor data between devices without having to invent the communication medium ourselves. Most IoT gateways support MQTT out of the box and it’s widely supported across many programming languages (list here).

As a test I’ll create a Node-RED flow on my Raspberry Pi that will publish (send) messages to a local MQTT server; these messages will then be ‘read’ by a Python script running on my Windows laptop. I’ll also add a flow where the Python script on Windows publishes messages that are then read by the Node-RED flow.

Node-RED Flow

MQTT Node-RED flow

MQTT in and out nodes are included as part of the standard installation of Node-RED, so creating a flow is trivially easy. All the MQTT handling is contained in a single node, while the rest of the flow is just creating the message to send.

Publish Flow

MQTT Publish flow
Publish flow

The inject nodes are just there to manually trigger the flow. The true trigger causes the exec node to execute a command on the Raspberry Pi, in this case getting the system temperature. The result is then published to the MQTT server on the ‘iot’ topic.
The command to get the system temperature on a Raspberry Pi is shown here.

$ /opt/vc/bin/vcgencmd measure_temp

Topics in MQTT are just a way of keeping related messages together: if you publish to a specific topic then other clients subscribed to that topic will receive the messages.
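
As an aside, the same publish could be done straight from Python on the Pi with paho-mqtt instead of the exec and MQTT nodes. A minimal sketch, assuming a broker is listening on localhost:

import subprocess
import paho.mqtt.publish as publish

# Read the SoC temperature the same way the exec node does
temp = subprocess.check_output(["/opt/vc/bin/vcgencmd", "measure_temp"]).decode().strip()

# Publish a single message to the 'iot' topic on the local broker
publish.single("iot", temp, hostname="127.0.0.1", port=1883)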

Subscribe Flow

MQTT Subscribe flow
Subscribe flow

The lower two nodes are used to subscribe to a topic I’ve called ‘python’. This is triggered when the Python script publishes to the topic, and the message is output to the debug console in Node-RED.

Configuring the MQTT Nodes

By default the MQTT nodes use a local server on port 1883 that is already set up for you. Unless you want to use your own server or a remote server just leave these as-is. The topic is entirely up to you, just make sure you use the same topic in the client used to read the messages.

MQTT server configuration

MQTT Python Script

For the python client running on my laptop I’ll use the Eclipse Paho library. To install use:

pip install paho-mqtt

The full script looks like this.

import paho.mqtt.client as mqtt
import os

# The callback for when the client receives a CONNACK response from the server.
def on_connect(client, userdata, flags, rc):
    print("Connected with result code "+str(rc))

    # Subscribing in on_connect() means that if we lose the connection and
    # reconnect then subscriptions will be renewed.
    client.subscribe("iot")

# The callback for when a PUBLISH message is received from the server.
def on_message(client, userdata, msg):
    print("Topic: {} / Message: {}".format(msg.topic,str(msg.payload.decode("UTF-8"))))
    if(msg.payload.decode("UTF-8") == "Reply"):
        client.publish("python", os.environ.get('OS',''))

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message

# Use the IP address of your MQTT server here
SERVER_IP_ADDRESS = "0.0.0.0"
client.connect(SERVER_IP_ADDRESS, 1883, 60)

# Blocking call that processes network traffic, dispatches callbacks and
# handles reconnecting.
# Other loop*() functions are available that give a threaded interface and a
# manual interface.
client.loop_forever()

The code is well commented but essentially it creates a connection to the MQTT server (created by the Node-RED flow on my Pi). Replace the IP address with your local server or use 127.0.0.1 if the script runs on the same computer as the server.

The script then waits for messages on the ‘iot’ topic and, when one is received, prints the message to the console. If the message is ‘Reply’ then the script also publishes a message (the Windows OS version) to the ‘python’ topic, which will be picked up by the Node-RED flow and displayed there.

Putting it Together

To start sending and receiving messages, first deploy the Node-RED flow and then start the Python script. Running the Python script returns the following, showing that the script is now waiting for messages.

>python mqtt.py
Connected with result code 0

Injecting the ‘true’ node will query the Pi for the system temperature and send it to the ‘iot’ topic on the MQTT server, which the Python script will pick up and display as shown below. Here I ran the flow four times, so we get four messages with temperatures displayed in Python on my laptop.

Topic: iot / Message: temp=48.3'C
Topic: iot / Message: temp=48.3'C
Topic: iot / Message: temp=48.9'C
Topic: iot / Message: temp=48.3'C

If I now send the ‘Reply‘ message from Node-RED we see this in python.

Topic: iot / Message: Reply

In Node-RED we see a debug message with the message sent from the Python script to the ‘python’ topic we subscribed to in Node-RED ("Windows_NT").

Node-RED debug output

Testing from iOS

In the app store there are quite a few MQTT clients available. I tried a few but MQTTool was the most reliable for me. It allows you to connect to a server and both publish and subscribe to topics. Just connect to your MQTT server and test!

Next Steps

This was a trivial example of using MQTT to send and receive messages, but the next plan is to extend this with sensor data that can be sent to Node-RED running on a virtual server.

This way I can securely make sensor data available from the internet, as well as choosing to store the data in a database or cloud storage service.

Introduction to Image Classification using UiPath and Python

Image classification
A python!

After my previous post showing image classification using UiPath and Python generated many questions about how to implement it, I decided to expand upon the theme and give a more detailed description of how to achieve this.

My starting point was thinking how I might integrate UiPath with python now that it’s integrated within the platform.

I find thinking of potential solutions and use cases just as much fun as actually making the automations and it feeds the creative mind as well as the logical.

Python is also growing explosively right now and this leads to a vast array of possibilities.

To see just how Python is growing, see this article from Stack Overflow.

Programming language popularity
Python Growth!

The first thing to point out is that although I used UiPath as my RPA platform, this could in theory be any platform that supports Python. I also use Alteryx, and this could easily be integrated into an Alteryx workflow to programmatically return the classifications and confidence levels.

Note that these instructions cover installation on Windows but the python and Tensorflow parts could easily be done on Mac or Linux, just follow the instructions.

Basic Requirements

Python

Obviously this won’t work without Python being installed 🙄 I used Python 3.5 but any Python 2 or 3 version should work. Download Python for free and follow the installation instructions. There’s a useful guide to the installation process at realpython.com. This is easy and will take about 10 minutes to get going.

Tensorflow

This is the python library that does the heavy lifting of the image classification. Or in the words of Wikipedia:

TensorFlow is an open-source software library for dataflow programming across a range of tasks. It is a symbolic math library, and is also used for machine learning applications such as neural networks.

Wikipedia

It’s completely free to use and released by Google. I won’t go into the installation in detail since it’s well documented on the Tensorflow site, but the whole process took me another 10 minutes. Easy as 🥧

Python 3.X installs the pip package manager by default so in my case installing Tensorflow was as simple as typing the following command into the command line.

pip install --upgrade tensorflow
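
A quick way to confirm the install worked is to import Tensorflow from a Python prompt and print the version:

import tensorflow as tf

# If this prints a version number the installation is working
print(tf.__version__)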

UiPath

RPA (Robotic Process Automation) is also growing exponentially right now so it’s a great time to learn how it works and what benefits it can bring.

The RPA software market overall witnessed a growth of 92 to 97 percent in 2017 to reach US$480 million to $510 million. The market is expected to grow between 75 and 90 percent annually up to 2019.

If you don’t already have RPA software and want to integrate into an automated solution you can download and use the community version of UiPath for free (Windows only).

UiPath is very powerful and yet easy to use, but apart from the technology a major advantage they have is the large and growing community so solutions are often posted to their forums. On top of that they have free training available online. What’s not to like?

Download the Tensorflow Models

Let’s get into the details now starting with downloading the models we will use.

If you use git you can clone the Tensorflow models repository from this link: https://github.com/tensorflow/models

Otherwise you can use a browser and navigate to the above page and then download the models as a zip file using the link in the top right corner.

Clone from Github

Save the zip file to your computer and unzip it somewhere on your C: drive. The location isn’t important as long as you know where it is.

Download the Pre-trained Model

This can be done in one of two ways, either:

Method 1

Find the location of the unzipped models from the previous step and go into the following directory (in my case the root directory is called models-master): models-master > tutorials > image > imagenet

Once there open a command prompt and run the python file called classify_image.py

imagenet output
Your first classified image

If all goes to plan this downloads the pre-trained model to your C: drive, saving it to C:\tmp\imagenet; it also runs the classifier on a default image in the downloaded folder. As you can probably work out, the image is of a panda 🙂

giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89632)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00766)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00266)
custard apple (score = 0.00138)
earthstar (score = 0.00104) 
panda

If you got this far, well done, you’ve already done image classification using python and Tensorflow!

If you get warnings in the output, as I did, you can safely ignore them as long as the classifier still produces the output. They appear purely because we are using the default Tensorflow library, which is designed to work across as many CPUs as possible and so does not optimise for any CPU extensions.

To fix these you would need to compile Tensorflow from source, which is out of scope for this tutorial (see here for more info: https://stackoverflow.com/questions/47068709/your-cpu-supports-instructions-that-this-tensorflow-binary-was-not-compiled-to-u)

Method 2

Alternatively you can take a shortcut and download the pre-trained model directly from here: http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz

Extract the files (you’ll need something like 7-zip for that) and save them to C:\tmp\imagenet so it looks like this:

Modify the Image Classifier for Automation

The classify_image.py python script could easily be used directly with python but as it stands the script only prints the data to the command line and does not return any data from the function.

We could just change the script to write the output to a text file which UiPath could read, but it’s much cleaner and more efficient if we alter the code to return a list from the classifier function which can be converted into a .NET object in UiPath.

This also gives us the advantage that we can load the script just once into UiPath and then call the classifier function each time we need to classify an image saving considerable time and resources.

The modified python file (‘robot_classify_image.py‘) can be downloaded from my github repository, https://github.com/bobpeers/uipath, and placed somewhere where it can be called from your automation workflow.

To test the file works you can call it from a command line as follows.

C:\>robot_classify_image.py <full_path_to_image> <number_of_predictions>

For example this will return three predictions on the bike image.

C:\>robot_classify_image.py "C:\work\UiPath\Images\images\bike.jpg" 3

By default the script will not print the results to the console but if you wish to see them simply uncomment the print() line in the script:

for node_id in top_k:  
    human_string = node_lookup.id_to_string(node_id)
    score = predictions[node_id]
    #enable print for testing from command line
    #print('%s (score = %.5f)' % (human_string, score)) 
    returnValue.append("{0:.2%};{1}\n".format(score,human_string))

Note that if you saved the pre-trained model somewhere other than C:\tmp\imagenet you can edit the python to point to the saved location by replacing all the instances of ‘/tmp/imagenet/‘ with your path (be sure to keep the forward slashes).

UiPath Workflow

Most of the hard work is now done. It’s only left for us to integrate this into a UiPath workflow which is simple 😊

Use a Python Scope

All the python activities must be contained inside a Python Scope container.

Set the path to your Python installation path and the target to x64 for 64-bit systems or x86 for 32-bit. Leaving the version as auto will auto-detect your version.

Load the python script

First we load the python script into UiPath using a Load Python Script activity.

Set the result of this activity to a variable, in my case called pyScript.

Invoke the Python Method

Next we use the Invoke Python Method activity that will actually run the classification method.

In the Invoke Python Method activity enter the name of the function to call (‘main‘) along with the script object from above as ‘Instance’.

The function ‘main’ expects two arguments (‘Input Parameters’), the full path to the image file and the number of predictions required, sent as an array using variables in my case.

The function returns a Python Object called pyOut in my case.

Get Return Values

The Get Python Object activity takes the returned value (pyOut) from the previous activity and converts it into an array of strings (which is the return value from Python).

We can then loop through the array and extract each line from the prediction and use for further processing or display on a callout as I did in the video.

All finished, take a coffee on me 😅

Summary

Once the basics are set up, using the classifier is extremely easy and it returns values very quickly. As you look at more images you’ll also realise that sometimes the model is not certain about the results, so make sure you check the confidence level before continuing processing.

My suggestion would be to automate anything over 80-90% confidence, depending on the use case of course, and put everything else aside for manual handling.
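
Since each prediction comes back as a ‘score;label’ string (built with "{0:.2%};{1}" in the script), splitting on the confidence is straightforward. Here’s a rough Python sketch of the idea; in the actual workflow you would do the equivalent with UiPath activities.

# Example predictions in the 'score;label' format returned by robot_classify_image.py
predictions = [
    "89.63%;giant panda, panda, panda bear",
    "0.77%;indri, indris",
    "0.27%;lesser panda, red panda",
]

THRESHOLD = 0.80  # route anything below this to manual handling

def is_confident(prediction, threshold=THRESHOLD):
    score, _, label = prediction.partition(";")
    return float(score.rstrip("%")) / 100 >= threshold

automate = [p for p in predictions if is_confident(p)]
manual = [p for p in predictions if not is_confident(p)]
print("Automate:", automate)
print("Manual review:", manual)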

The classifier uses about 1,000 classes to identify objects, but you could always retrain the classifier on your own images. The Tensorflow documentation is here if you want a challenge: https://www.tensorflow.org/hub/tutorials/image_retraining

Have fun 🤖💪

How to use Amazon S3 from Node-RED

Node-RED + Amazon S3

Amazon S3 (Simple Storage Service) is a very commonly used object storage solution that’s cheap to use and highly reliable. Think of it as a file system in the cloud with enterprise features that you can use to store almost anything.

Amazon S3

This guide assumes you already have a working Amazon S3 account and you have created a storage bucket along with a user authorized to read and write to the bucket. You must also have the Key ID and Secret Key for the user so we can authenticate from Node-RED.

Node-RED Flow

Open Node-RED and add the node-red-node-aws palette. This will install nodes for reading, writing and watching for events in your bucket.

To test create a simple flow like below where you input some data using the inject node, append the data to a text file and then upload the file to your S3 bucket using the amazon s3 out node.

Node-RED test flow

The configuration of the amazon S3 out node should look like this:

  • AWS is where you enter your AccessKeyID and Secret Access Key
  • Bucket is the name of the S3 bucket you created
  • Filename is the name of the file you want to create in S3 including any folder path
  • Local filename is the file you wish to upload
  • Region is the AWS region your S3 bucket is located in
Amazon S3 Out Configuration

That’s all there is to it: when you deploy and run the workflow, the inject node will append the timestamp to the end of the upload.txt file and then upload the file to S3.
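
For comparison, the same upload done outside Node-RED with Python and the boto3 library looks roughly like this (the credentials, region, bucket name and paths are all placeholders):

import boto3

# Placeholder credentials and names - use your own Key ID, Secret Key, region and bucket
s3 = boto3.client(
    "s3",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
    region_name="eu-west-1",
)

# Upload the local file to the bucket under the given key (the path inside the bucket)
s3.upload_file("upload.txt", "your-bucket-name", "nodered/upload.txt")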

If you log into the S3 console you’ll see the file and contents.

Amazon S3 Console

Previewing the file contents in S3 shows the appended timestamps.

Amazon S3 file preview

Linux Command Line Calendar

I’ve used Linux for almost 20 years and somehow never knew you could get a calendar on the command line 🤯🤯

Just type ‘cal’ for the current month or cal followed by the year (‘cal 2019’ for example) to get a full year. See the man pages for details.

me@myserver:~$ cal 2019
                             2019
       January               February               March
 Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
        1  2  3  4  5                  1  2                  1  2
  6  7  8  9 10 11 12   3  4  5  6  7  8  9   3  4  5  6  7  8  9
 13 14 15 16 17 18 19  10 11 12 13 14 15 16  10 11 12 13 14 15 16
 20 21 22 23 24 25 26  17 18 19 20 21 22 23  17 18 19 20 21 22 23
 27 28 29 30 31        24 25 26 27 28        24 25 26 27 28 29 30
                                             31
    April                  May                   June
 Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
     1  2  3  4  5  6            1  2  3  4                     1
  7  8  9 10 11 12 13   5  6  7  8  9 10 11   2  3  4  5  6  7  8
 14 15 16 17 18 19 20  12 13 14 15 16 17 18   9 10 11 12 13 14 15
 21 22 23 24 25 26 27  19 20 21 22 23 24 25  16 17 18 19 20 21 22
 28 29 30              26 27 28 29 30 31     23 24 25 26 27 28 29
                                             30
     July                 August              September
 Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
     1  2  3  4  5  6               1  2  3   1  2  3  4  5  6  7
  7  8  9 10 11 12 13   4  5  6  7  8  9 10   8  9 10 11 12 13 14
 14 15 16 17 18 19 20  11 12 13 14 15 16 17  15 16 17 18 19 20 21
 21 22 23 24 25 26 27  18 19 20 21 22 23 24  22 23 24 25 26 27 28
 28 29 30 31           25 26 27 28 29 30 31  29 30
   October               November              December
 Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
        1  2  3  4  5                  1  2   1  2  3  4  5  6  7
  6  7  8  9 10 11 12   3  4  5  6  7  8  9   8  9 10 11 12 13 14
 13 14 15 16 17 18 19  10 11 12 13 14 15 16  15 16 17 18 19 20 21
 20 21 22 23 24 25 26  17 18 19 20 21 22 23  22 23 24 25 26 27 28
 27 28 29 30 31        24 25 26 27 28 29 30  29 30 31

Bitcoin RPC Commands over SSH Tunnel

SSH Port Forwarding Explained

If you’re running a Bitcoin full node and want to run RPC commands against the Bitcoin client from a remote machine the easiest and safest way to do this is using Port Forwarding over an SSH connection.

What is Port Forwarding used for?

Secure access to a port that is otherwise not listening on a public network interface. This is common with database servers like MySQL.
Encryption for services that may not natively use encrypted connections.

Port Forwarding – https://docs.termius.com/termius-handbook/port-forwarding

This also gives you the flexibility of using Python (or another language) from the remote machine without having to install it on the Bitcoin node.

In my case I’m going to use Python in a Jupyter Notebook to query the node, using Termius as the SSH client.

The Bitcoin node is running on my local network and does not accept RPC commands from the internet, but using port forwarding I’ll be able to query it from my laptop from any location.

Install an SSH Client

On Windows I recommend Termius as it’s very easy to use and has a nice graphical interface (it’s also available for Mac, Linux, Android and iOS), but you could use any SSH client (PuTTY for example).

First create an SSH host to the Bitcoin full node.

Termius hosts
Create an SSH host

Then create the forwarded port. On your local machine you can select any port that’s not in use, in my case I use port 10000.

When I connect to my local machine on port 10000, the connection is securely forwarded to the remote machine on port 8332, which is the port the Bitcoin RPC server listens on by default.

So 127.0.0.1:10000 becomes BITCOIN_NODE:8332

Termius port forwarding
Forward a local port to the Bitcoin RPC port (8332)

The configuration page should look something like this.

Configuring Port Forwarding
Configuration pane for port forwarding

Open the port by clicking connect.

Connect the port forwarding
Connect the forwarded port
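
Once connected, a quick sanity check from Python confirms that the local end of the tunnel is actually listening before we wire up the RPC client:

import socket

# 127.0.0.1:10000 is the local end of the SSH tunnel configured above
with socket.create_connection(("127.0.0.1", 10000), timeout=5) as sock:
    print("Tunnel is up:", sock.getpeername())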

Python Bitcoin Library

To use python with your Bitcoin node use the python-bitcoinrpc library. To install simply use:

pip install python-bitcoinrpc

Next, get the rpcuser and rpcpassword you added to your bitcoin.conf file.

rpcuser=thisismyuser
rpcpassword=DONT_USE_THIS_YOU_WILL_GET_ROBBED_ijfr84ur84uof94ur9r4

Once installed, create a connection to the node using these credentials. The IP will always be localhost (127.0.0.1) and the port is the same port you used for the forwarding, 10000 in my case.

from bitcoinrpc.authproxy import AuthServiceProxy, JSONRPCException
USERNAME = "******"
PASSWORD = "******"
IP = "127.0.0.1:10000"
    
rpc_connection = AuthServiceProxy("http://{}:{}@{}".format(USERNAME, PASSWORD, IP), timeout = 500)

Once connected we can query the node using regular RPC commands. Here I get the last 10 blocks and return the block height, timestamp, number of transactions in the block, difficulty and nonce.

bci = rpc_connection.getblockchaininfo()
maxBlock = bci["blocks"]
for i in range(maxBlock,maxBlock-10,-1):
    bbh = rpc_connection.getblockhash(i)
    bh = rpc_connection.getblockheader(bbh)
    print(bh["height"],bh["time"],bh["nTx"],bh["difficulty"],bh["nonce"])

Command output:

597577 1570042322 2866 12759819404408.79 872368408
597576 1570041887 2921 12759819404408.79 2413129693
597575 1570041233 3406 12759819404408.79 2989319068
597574 1570039252 2884 12759819404408.79 3248003543
597573 1570038909 3061 12759819404408.79 259424928
...

This command returns the network statistics of the node.

net = rpc_connection.getnettotals()
print(net)
{'totalbytesrecv': 9043069394, 'totalbytessent': 83507300429, 'timemillis': 1570047435410, 'uploadtarget': {'timeframe': 86400, 'target': 5242880000, 'target_reached': False, 'serve_historical_blocks': True, 'bytes_left_in_cycle': 3230191636, 'time_left_in_cycle': 55100}} 

For a full list of the currently available API calls see the Bitcoin Developer Reference.

Generating New Product Names using Neural Networks

So everyone knows Machine Learning / Artificial Intelligence / Cognitive Computing, call it what you will, is the new marketing catchphrase for people trying to sell their software products and services. You can be sure if it’s not already baked in then it’s in the roadmap for 2020.

It used to be ‘Big Data’, but we got tired of hearing that, so a few control+h presses later and, hey presto, Machine Learning (ML) has arrived.

Don’t get me wrong, I’m convinced ML will have a profound effect in the coming years, but like most technologies, we overestimate the short term effect and underestimate the long term.

As the saying goes, the future is already here; it’s just not very evenly distributed.

I read lots of articles on ML that seem fantastic but it’s hard to get a grasp on something when you haven’t really used it for yourself. I wanted to know if ‘ordinary’ people can use it, and what for? To satisfy my curiosity I decided to see if I could train a neural network to generate product names for clothing based on the product names we are already using in IC Group.

Getting Training Data

Data is the raw material for neural networks and the more data the better. If your data is already big then great! If not then don’t worry, you can still get interesting results.

To feed the network I extracted the entire history of style names of our three core brands, namely Peak Performance, Tiger of Sweden and By Malene Birger.

After cleaning the data to remove numbers and other ‘junk’ (for example Peak Performance often start style names with the abbreviation ‘JR’ for junior), the raw data consisted of the following number of style names.

  • Peak Performance: 7,590
  • Tiger of Sweden: 13,087
  • By Malene Birger: 15,419

Not a huge corpus of data to go on, but hopefully it should be enough to generate something of interest.

How Does This Thing Work?

The type of neural network I used is technically called a Recurrent Neural Network, or RNN for short. It essentially takes training data and ‘learns’ patterns in the data by feeding it through layers. It also has some ‘memory’ (called LSTM, or Long Short-Term Memory) so that, as well as the input to the layer having influence, it selectively remembers or forgets the result of previous iterations.

For text this means you can feed the network large passages of text and the network will ‘learn’ how to write new text without knowing anything about grammar, spelling or punctuation. If you feed it all of Shakespeare’s works and train enough it will generate text that looks like real Shakespeare but is completely new work!

It may sound pretty complicated (and it is), but as a user you don’t really need to know much to get started. There are ready-to-use scripts everywhere on the internet (GitHub + Google are your friends) that come with full instructions. It’s very much plug and play and took me about an hour to get started from scratch.

I’ve also included links at the bottom of the article pointing to the code I used.

Our Current Product Names (The Training Data)

To give you an idea of what types of product names we currently use, I selected a few at random to give you a taste. Note that they are all short names (no more than 10 characters) and are not always ‘real’ words or even names.

Product names
A sample of our current product names

The names tend to have a Brand ‘feel’, so for example By Malene Birger use softer, slightly exotic sounding names to fit their Brand image and target consumer. It will be fun to see if the Neural Network can get this detail right.

Training the Network

This process is surprisingly simple. Just feed the network a text file with all the current names, one file per brand, then run the training script, sit back and get a coffee or three.

Neural Network Training

Since the training data is fairly small this doesn’t actually take very long (it took me a couple of hours per brand using a virtual machine), but it is highly dependent on a handful of parameters that can be set, plus the capabilities of your computer. Probably the most important parameters are these:

  • Number of layers in the network
  • RNN size, this is the number of hidden units (or nodes) in the network
  • Training Epochs, basically how long to train the model for

Basically, more layers, more nodes per layer and longer training give better results, but they can take much longer and the benefit isn’t always worth the effort. Trial and error often works just as well!

Does This Thing Really Work?

After training the model we simply use it to generate new names. This is called sampling the model; you can generate samples using some starting text, but in my case I just let the model pick a random starting point.

So here’s a sample of the names generated per brand.

Neural network results
Names generated from the neural network

Bearing in mind that the network knows nothing about language, I think it did a remarkably good job of capturing the essence of the brands’ names.

To emphasise once again, the network doesn’t know anything about the constructs of words, what vowels are or anything else for that matter. It learns these patterns purely from the training data and then builds a model to generate new words using the same rules.

The model can be sampled over and over again so there’s an unlimited supply of names.

Can Neural Networks be Creative?

If we really want to play around we can change the parameters of the sampling to try and generate more creative names.

One of these parameters (called temperature) basically tells the network how confident it should be about the name (actually how confident it should be about the next letter in the generated word). If we turn up the temperature the model becomes more aggressive and suggests ‘wilder’ names.

Neural network generated names
Some more exotic examples

I would definitely buy a blazer from Tiger of Sweden called JUGOMAR or maybe my girlfriend would like a dress from By Malene Birger called CIBBAN or some Peak Performance ski pants called RANDEN.

Of course, if we turn up the creativity too much then it starts to generate some nonsense!

Crazy neural network generated names
It’s starting to go crazy!

But even in the weirdness we get names like FLAURELAYKS and KAWLAN, which I think sound like great product names 😃
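
For the curious, the temperature trick itself is only a small amount of code: the scores for the next letter are divided by the temperature before being turned into probabilities, so a higher temperature flattens the distribution and unlikely letters get picked more often. Here’s a toy numpy sketch of that idea (illustrative only, not the torch-rnn code):

import numpy as np

def sample_with_temperature(scores, temperature=1.0):
    """Sample an index from raw scores; higher temperature = wilder picks."""
    scores = np.asarray(scores, dtype=float) / temperature
    probs = np.exp(scores - scores.max())  # softmax, shifted for numerical stability
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

letter_scores = [2.0, 1.0, 0.1]  # pretend scores for the next letter being 'a', 'b' or 'c'
for t in (0.5, 1.0, 2.0):
    picks = [sample_with_temperature(letter_scores, t) for _ in range(1000)]
    print("temperature", t, "letter counts:", np.bincount(picks, minlength=3))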

Summing Up

This was of course all done for fun, but it shows that these types of networks are not impossible to use and someone with decent computer skills can get these up and running in a matter of hours.

If ML really is going to explode in the coming years then it will need to be easier to interact with than it is today. There will never be enough data scientists to satisfy demand, so just like spreadsheet programs made everyone a numbers whizz, I expect user interfaces and APIs will be developed so less skilled users can create, train, and deploy ML models into production.

It Almost Makes Sense

As a final challenge I tried making new product descriptions by training the model on current descriptions. It almost makes sense but could maybe do with a bit more training 😉

This is one for Peak Performance!

Stylish Mid feel shortany ski town, it with a shell is a fixent windproof, comfortable, keeping this fit delivers the wicking, breathable Joad.

References If You Feel Inspired To Try Yourself!

If you feel like reading more or even trying for yourself then the code for the RNN is available to download here.

https://github.com/jcjohnson/torch-rnn

And more general reading on generating text using an RNN is here.

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Getting Database and Table Sizes in Postgres

Total Database Size

This SQL simply gets the total size of the database in a human readable format.

SELECT pg_size_pretty(pg_database_size('postgres')) as db_size

List all Tables

This lists all the tables in the database public schema.

SELECT tablename FROM pg_catalog.pg_tables WHERE schemaname = 'public'

Search Schema for Column Name

I often need to search all the tables to find which ones contain a particular column. Replace ‘COLUMN_NAME’ with your column in the query below.

SELECT t.table_schema,t.table_name
FROM information_schema.tables t
INNER JOIN information_schema.columns c 
      ON c.table_name = t.table_name 
      AND c.table_schema = t.table_schema 
WHERE c.column_name = 'COLUMN_NAME'
      AND t.table_schema not in ('information_schema', 'pg_catalog')
      AND t.table_type = 'BASE TABLE'
ORDER BY t.table_schema;

In this case I searched for all columns containing the word ‘order’.

Table Sizes

Retrieve the size per table in the public schema from largest to smallest.

SELECT nspname || '.' || relname AS "table_name",
        pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
     WHERE nspname = 'public'
     AND C.relkind <> 'i'
     AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC

Full Schema

SELECT * FROM information_schema.columns WHERE table_schema = 'public'