KNIME Database SQL Filter using the IN Operator

Sometimes when using KNIME it would be nice to create a DB row filter based on a list of items. Outside the DB nodes you could use the Reference Row Filter, but this cannot be used with DB connections. The other standard method would be the DB Row Filter, but this only supports a few operators such as = and LIKE.

DB Row Filter options

If you want to filter by a long list that doesn’t follow a pattern it’s a lot of work to create the filter, plus it’s not dynamic.

Create a Dynamic “IN” Filter

My solution is to create a dynamic string that can be injected into the SQL using a variable and then used as part of an “IN” filter.

First create a list of the items you wish to use in the filter. In this case I’m using a static list, but the items could equally come from a dynamically generated list.

Filter list

Next use a GroupBy node to concatenate the items into a single row using ‘,’ as the delimiter (note this is not just a comma but a comma wrapped in single quote marks). Do NOT group the items (leave the group columns empty) so everything is concatenated into a single row.

Grouping

Now we have a long list of our items looking something like this; note we are missing the opening and closing quotes at the moment.

item1','item2','item3','item4','item5

To close the quotes use a String Manipulation node with this configuration.
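
The exact expression depends on your column name, but assuming the concatenated column is called items, something like the following in the String Manipulation node simply adds the missing opening and closing quotes:

join("'", $items$, "'")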

Our list is now in the correct format to inject into the SQL.

Filter string

To add this to a DB query, first generate a flow variable from the table row (for example with a Table Row to Variable node).

Generate variable

To use it in the query, connect this node to the DB Query node and create an SQL statement using the IN operator as shown below.

SQL IN Operator

This will generate an SQL query something like this, with the list we created inserted into the statement.

SELECT * FROM (SELECT * FROM "dbo"."Sales Header"
) AS "table" where "No_" in ('SO12345','SO12356','SO45633','SO95959','SO38475','SO84737','SO94836','SO94847','SO77363','SO99884')
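
For reference, the statement typed into the DB Query node itself might look something like the line below, where #table# is the placeholder for the incoming table and the flow variable (assumed here to be named filter_list) is injected using KNIME’s flow variable syntax:

SELECT * FROM #table# AS "table" WHERE "No_" IN ($${Sfilter_list}$$)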

The full flow looks like this when put together.

Full DB Filter flow using IN Operator

Default Include or Exclude New Columns in KNIME

It’s common that after a workflow is finished the source data changes. The most common scenario is being connected to a database table or view that gets extra columns added.

In this scenario you need to make sure that your workflow doesn’t break when the new data is introduced.

The way to control this is through the Enforce exclusion and Enforce inclusion settings found in many configuration dialogs.

Enforce exclusion

This means that the fields you add to the Exclude list will always be removed from the flow, but any new columns that appear will automatically be added to the Include list and passed through the flow.

Enforce Inclusion

This is basically the opposite. Only fields in the Include list will be passed through the flow; any new columns that appear will automatically be added to the Exclude list and removed from the flow.

Default Include or Exclude?

In general I want the data coming from a data source, whether it’s a database or a file, to be static so new fields are not automatically fed through my workflow. If I want them to propagate I’d rather control this myself. For this reason I set this to Enforce inclusion for the initial nodes directly after loading the data.

For nodes later in my workflow I do want changes to propagate, so if I make a new column it will automatically pass through to downstream nodes. Therefore I set this to the default of Enforce exclusion for all the downstream nodes after initial data load.

The KNIME default for new nodes is Enforce exclusion which in many cases is fine but just be aware what effect this will have on your flow if your data source changes to include extra fields.

Getting OP_RETURN Bitcoin Messages

This is mainly a note to myself as every time I want to do this I end up searching through my history doing something like this.

history | grep hexyl

So here’s the code to extract the OP_RETURN message from the bitcoin blockchain. This needs to be run on the machine running your full node.

bitcoin-cli getblockhash 0|xargs -I '{}' bitcoin-cli getblock '{}' 0| xxd -r -p | hexyl -n 256
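
For future reference: getblockhash 0 returns the hash of the Genesis block, getblock <hash> 0 returns the raw block as hex, xxd -r -p converts that hex back to binary, and hexyl -n 256 pretty-prints the first 256 bytes.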

The output looks like this for the Genesis block.

OP_RETURN for the bitcoin Genesis block

Compound Improvements

The easiest way to earn money, improve a product or learn new skills is very simple. Compounding.

If you invest money today in an index fund and just leave it for 30 years you’ll probably be shocked how little it increases in the first 20-25 years and how much it increases after that. Even a 5% return per year, reinvested into the initial investment, will give huge returns given long enough.

It’s the same for skills.

This chart shows what a 1% improvement every day for a year looks like. After 365 days you have an almost 38x improvement! But see how the first 200 days don’t feel so special; this is when it’s easy to give up, as progress is hard and improvement comes slowly.

This is the same chart for a 1% improvement per week for a year. After 52 weeks you have a 1.7x improvement, still not bad! If someone told you that if you improve 1% a week for a year you’ll be 70% better at the end of the year I think you’d be happy.

And finally 1% improvement per month for 12 months. This gives a 13% improvement after a year, still not a bad payoff for being 1% better at something per month.
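
If you want to check the numbers, the maths is just simple compounding. A quick sanity check in Python:

# Quick check of the compounding figures quoted above
for periods, label in [(365, "day"), (52, "week"), (12, "month")]:
    print(f"1% per {label} for a year: {1.01 ** periods:.2f}x")

# 1% per day for a year: 37.78x
# 1% per week for a year: 1.68x
# 1% per month for a year: 1.13x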

Common Data Mistakes

Wrong way, if you see these mistakes turn back!

This isn’t meant as an exhaustive list, just the most common errors I’ve seen over the years.

  • Taking the average of values that are already an average (see the example after this list)
  • Trying to join database tables on fields with different data types
  • Processing data you will later discard, do filtering early on
  • Using the average (mean) when the median or mode is better (the average person has 1.9999 legs)
  • Forgetting seasonality in data
  • Accidentally filtering out data by incorrect combining of AND/OR statements
  • Errors caused by fixed-length strings in databases (often space padded like ‘DataPoint ‘)
  • Accidentally missing data by being provided data from a source that already has hidden filters applied
  • Losing data when joining tables in SQL due to missing data in one table (use outer joins)
  • Poor data quality (countries stored as UK, GB, England etc. in the same field, phone numbers with 00 and + prefixes)
  • Null data breaking report filters and SQL where statements
  • Misusing null in Boolean fields (1 is true, 0 is false and null is unknown)
  • Storing data as generic data types (dates as text, numbers as text, Boolean as integers and so on)
  • Storing the same data point in different tables with different data types
  • Being inconsistent in naming conventions (my_table, MyTable), pick one and stick to it
  • Using cryptic names for fields instead of names that make sense (max_temp is OK, m_tmp is ambiguous)
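
To illustrate the first point, here’s a quick Python example of why averaging averages goes wrong when the groups have different sizes:

# Two stores with very different numbers of sales
store_a = [10] * 99   # 99 sales of 10
store_b = [100]       # 1 sale of 100

avg_of_avgs = (sum(store_a) / len(store_a) + sum(store_b) / len(store_b)) / 2
true_avg = sum(store_a + store_b) / len(store_a + store_b)

print(avg_of_avgs)  # 55.0
print(true_avg)     # 10.9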

Money As Data

Transferring RMB to digital RMB on a smartphone in China

Current ‘Digital’ Money

As much as many people use banking apps and send money via their smartphones, money isn’t really digital yet.

What do I mean by ‘not digital’? Of course it doesn’t mean you need to physically transfer paper money and metal coins to buy a coffee. You can of course do this digitally on your phone.

The digital experience today is mostly a UX layer of digitisation over legacy underlying systems. Think about sending money from your home country to a friend or relative abroad. You open your mobile banking app, enter their IBAN and a few other details along with the amount and currency, and that’s it!

Seems great? Not really. Nothing has actually happened apart from you giving an instruction to your bank to move an amount X from bank A to bank B. The transaction was really an *instruction*, with no guarantee of being carried out in any particular timeframe, or even of being carried out at all.

The other major difference to real digital money is that without a bank account tied to your phone you simply can’t transfer money. I can’t send money to a vending machine for example without the machine being owned by a company that runs a bank account on behalf of the machine.

This doesn’t mean that digital money isn’t coming soon. Just recently a major report was released, produced by seven Central Banks together with the Bank for International Settlements (the Bank of Canada, European Central Bank, Bank of Japan, Sveriges Riksbank, Swiss National Bank, Bank of England and the Board of Governors of the Federal Reserve), that discusses the key requirements of a true digital currency issued by Central Banks.

PayPal has also recently announced native support for cryptocurrencies on its platform.

Real Digital Money

Digital Money

What I call real digital money doesn’t need a bank account, it just needs a wallet to store the digital currency. The account number is the wallet address.

Of course this is how cryptocurrencies work using the blockchain, but digital currencies don’t have to use blockchains. They could also be run by a Central Bank and issued or redeemed 1:1 for the national currency. Think of a Euro and a Digital Euro both of equal value but one is way more flexible than the other.

Digital currencies issued by a Central Bank would allow direct payments to be made from Governments to citizens without going through the traditional banking system. It would also be much easier to include the entire population, not just those with bank accounts, since only a smartphone would be required.

When you send real digital money it is actually transmitted and settled in real time, not just a promise to do so at some point in the future. It’s funny to think that in the year 2020 you can’t settle payments outside banking hours, on weekends or on public holidays 🤯

Money is just ones and zeroes

Money as Data

To my mind one of the biggest advantages is that money can finally be transmitted and processed just like data or any other digital information. This also means it can be programmed and automated and no longer needs a separate flow from the rest of your processes.

Your data flow and your payment flow can be combined.

What do I mean by that?

Imagine you have a process where you extract some data for a client, process the data in some way and write the data back to some destination. Classic ETL stuff. Imagine also that you invoice the customer based on the volume of work done.

As it stands today you would build the data process and, in parallel, some reporting on the process itself, so you can send the details to finance every month in order to create an invoice for the service provided.

The analytical process may write some data to a database every second and the payment flow starts by aggregating the data on a monthly basis to see how much work you did last month.

But there’s no reason why in future the process can’t also invoice itself and receive payment itself. It’s just another (or the same) interface into the client’s systems: request payment along with an itemised invoice of the work done.

Processing and payment occur together

The customer’s system could validate the invoice against the work done and make the payment automatically. Testing the payment process would just be another part of the implementation of the ‘normal’ data process.

You have a full audit of the transaction, get payment immediately and avoid involving finance at all after the initial set up. No need to reconcile payments weeks or months after the actual work was done.

This is the start of the machine to machine (M2M) economy, which will also allow IoT to better fulfil its promise: remote sensors that can not only measure but also make and receive payments. For example:

  • Vending machines ordering exact stock levels from vending services in order to be filled up, selecting different suppliers based on most competitive prices.
  • Your retail POS could order and make payment on new stock when stock levels drop below a predetermined level.
  • A delivery drone could pay a number of last mile delivery drones based on best price / speed / payload size etc.
  • Cars could invoice you based on the actual distance driven, so car leasing agreements could work more like car sharing apps
  • Your laptop / phone could connect to WiFi routers around it and pay for Internet access. If paid by byte then it could automatically connect to multiple providers at the same time to improve speed.
  • APIs that you use today but are billed by subscription every month or year could be truly pay-as-you-use, with billing an integral part of the API

While other areas of tech have raced ahead, money is still very much stuck in the 1970s. Yes we have nice apps and user interfaces, but the fundamentals underneath all this are largely unchanged. This will inevitably change in the coming years and digital currencies will play a large part in it.

How to Make Decisions 🤔

For those that like a nice process for making decisions here’s a Decision-Making Tree.

Decision-Making Tree

It may seem lighthearted but most decisions are not actually that important. Most of the time it’s probably better just to choose a path and move forward. If it’s wrong but trivial to fix later why bother delaying over the choice?

“Some decisions are consequential and irreversible or nearly irreversible – one-way doors – and these decisions must be made methodically, carefully, slowly, with great deliberation and consultation. If you walk through and don’t like what you see on the other side, you can’t get back to where you were before. We can call these Type 1 decisions. But most decisions aren’t like that – they are changeable, reversible – they’re two-way doors. If you’ve made a suboptimal Type 2 decision, you don’t have to live with the consequences for that long. You can reopen the door and go back through. Type 2 decisions can and should be made quickly by high judgment individuals or small groups.”

Jeff Bezos

Delete Files Using Alteryx

Yet another custom tool for Alteryx to fill a hole in the file handling capabilities without resorting to using the Run Command tool.

Available to download and install from Github (https://github.com/bobpeers/Alteryx_SDK_Delete) or you can grab a sample workflow including link to the tool from the Alteryx Public Gallery (https://gallery.alteryx.com/#!app/File-Delete/5f4770d8826fd309c4f9e427)

The tool uses the standard Python libraries and installs to the File System tab.

File Delete tool on the Alteryx File System tab

This tool accepts a single input. The input field should be mapped to the full path of the files to delete, usually provided by placing a Directory tool before this tool.

WARNING
Files deleted using this tool will be permanently deleted and are not recoverable. Use with care. You have been warned.

Successful operations will be output to the O-Output. If the file could not be deleted (most likely due to file locking issues) the output will be sent to the E-Output along with the error reason.
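
The heavy lifting is just Python’s standard file handling. This isn’t the actual tool code, but the core idea boils down to something like this:

import os

def try_delete(path: str):
    """Try to delete a file, returning (success, error message)."""
    try:
        os.remove(path)
        return True, ""
    except OSError as err:  # e.g. the file is locked or does not exist
        return False, str(err)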

Here’s an example of the workflow in use.

Alteryx workflow using the File Delete tool

Happy Deleting 🗑😅

Credentials – Custom Alteryx Tool

Why Saving Credentials is Hard 🔐

Saving credentials for use in an Alteryx workflow is a very common use case. Maybe you use an API and need to save client keys or secrets, and getting these credentials into a workflow without hard coding them is tricky.

It’s common to either read from a text file located in a protected location or use an Environment variable to save them. Not ideal especially when there’s a better solution.

Windows has a built-in credentials manager called, not surprisingly, Windows Credential Manager, but Alteryx has no native way of accessing the credentials saved there.

The Solution

To solve this issue I’ve created a custom Alteryx tool using the Python SDK that allows you to read saved credentials at runtime into the workflow. The tool simply accepts the credentials name to retrieve and returns the user name and password.

Example workflow using Credential Tool

The credentials need to be saved in the Generic section of Credential Manager.

Generic Credentials section
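
If you want to do the same thing in plain Python outside Alteryx, the keyring package reads the same Generic Credentials store on Windows. This is only a sketch of the idea, not the tool’s actual code:

import keyring  # pip install keyring; uses Windows Credential Manager on Windows

def get_credentials(name: str):
    """Look up a Generic credential by name and return (username, password)."""
    cred = keyring.get_credential(name, None)
    if cred is None:
        raise ValueError(f"No credential named {name!r} found")
    return cred.username, cred.password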

Installing

The tool can be downloaded from my GitHub page.

GitHub Repository for Credentials Tool

Alternatively you can also download this workflow from the Alteryx Public Gallery and install using the link, https://gallery.alteryx.com/#!app/Credentials/5f3ff1160462d715e4d43869

Just double click the yxi installer file and the tool will appear in the Developer Toolbar.

Tool installed into the Developer Toolbar

I hope you find it useful and as always if you find any bugs please let me know using the comment form 🐞