Looking Up Offset Rows in PostgreSQL

A common task when either reporting or doing data analysis is to extract data from a database table or view and then lookup corresponding values from either previous or next rows.

A good example of this was a recent KPI report I made for E-commerce where the KPI depended not only on the total daily volumes of orders received but also the the total on each of the previous two days. Therefore I needed two extra columns with the previous days order volume (p1dayorders) and the order volume from two days previously (p2dayorders).

Using the Postgres LAG() function it’s easy to achieve as you can see below. The interesting part is highlighted in red.

Postgres LAG() function

The text formatted version that can be copied is available below.

SELECT osh."Division","CreationDateTime"::date,count(distinct osh."OrderNumber") as dailyorders,
	LAG(COUNT(DISTINCT osh."OrderNumber"),1)
		OVER (PARTITION BY osh."Division"
		ORDER BY "CreationDateTime"::date) as p1dayorders,
	LAG(COUNT(distinct osh."OrderNumber"),2)
		OVER (PARTITION BY osh."Division"
		ORDER BY "CreationDateTime"::date) as p2dayorders
FROM "OrderSyncHeader" osh
WHERE "osh"."CreationDateTime" >= '2019-07-01'::date
GROUP BY osh."Division",
	 osh."CreationDateTime"::date

The resulting output shows that we now have the daily order volumes in the dailyorders column, total from the previous day in the p1dayorders column while p2dayorders shows total from two days back.

Output from the LAG() function

Note the [null] values in the first two rows. This is caused by the data being outside of the window since we do not have previous days data for the first record returned. If you wish to return another value instead of NULL this is also possible by using the optional default argument. This code will return 0 for missing values.

LAG(COUNT(DISTINCT "osh"."OrderNumber"),1,0::bigint) 

The LEAD() function as you might guess does a similar task but instead of looking back looks forward. The syntax is otherwise identical.

Postgres LEAD() function

Gives this output with n1dayorders and n2dayorders being the following one and two days order counts.

Output from the LEAD() function

The Right Tool for the Job

Arguably the wrong tool for the job

During my career I’ve heard this countless times and to some extent it’s just taken as granted that you should always use the right tool for the job. Isn’t this obvious after all? Why would you knowingly choose the wrong tool?

But the conversations often miss the realities surrounding the choice. Decisions are not made in isolation. What does best even mean? Fastest, easiest to implement, doesn’t require consultants? Instead there’s a myriad of influencing factors:

  • Currently available skills
  • Currently available tools
  • Time frames and deadlines
  • Availability and pricing of new tools
  • Learning curve for new tools
  • Cost vs benefit / ROI
  • Expected lifetime of the solution

In this day and age there’s an almost unlimited number of ways to solve a particular problem. You want to extract data from a database, process it and present the results somewhere? If you’re an analyst you might do this in Excel, a programmer might use python, a business user might use a robot and so on. All of these are possible.

Tableau Prep

A growing trend also seems to be that tools and platforms increasingly have overlapping functionality making the decision even less clear.

Tableau has Prep which is basically an ETL-lite tool, robots can pull data directly from databases, almost everything can send mails, parse JSON and XML and connect to APIs.

This often comes up when talking about Robotic Process Automation, or RPA. I’ve used many ETL tools like Alteryx, KNIME and RapidMiner and they are great at extracting and processing data, but RPA can also be used to move and process data between systems. I wouldn’t recommend using RPA to pull millions of rows of data but it could be done.

Similarly a programmer might point out that a user interface could be automated using python and selenium at a fraction of the cost of an Enterprise RPA solution. This is technically true but if you don’t have a dedicated team of python experts in your organisation what does that help?

There is always a grey area when the problem could be solved in many different ways using an array of tools and platforms each with their own trade-offs.

In my experience from business the main limitations are both financial ones and available skill sets.

Let’s say your company has invested a significant sum in RPA. You have a direct connection to the database and need to extract and process a large volume of data. You know that an ETL platform would be better but how are you going to persuade your manager to invest? If you do invest do you then have right skills to get the full value from your investment? How long will the ‘best’ solution take to get up and running?

Instead of asking if this is the best tool for the job, ask whether compared to using the ‘right’ tool:

  • Are there significant risks to success
  • What functionality will I potentially miss
  • How would the process differ
  • Can it be easily maintained/extended
  • Will it last the expected lifetime of the process

Life is full of compromises and you often need to make do with what you have. When you only have access to a hammer everything needs to look like a nail 🔨🔨

WordPress File Size Limits on Nginx

So my WordPress is up an running and inevitably I hit a few roadblocks on the way. While trying to upload a video I encounter the classic ‘HTTP error’ that seems to be almost always due to file size limits on the server.

Wordpress http error on file upload
WordPress upload error

In my case the fix was simple but requires changing both setting for PHP and the Nginx web server. First fix PHP by logging into the server and running these commands.

cd /etc/php/7.2/fpm/
sudo nano php.ini

Find and edit these lines in the file, feel free to pick your own limits. Close (CTRL+X) and save (Y+ENTER) the file.

upload_max_filesize = 96M
post_max_size = 96M 

Then restart the PHP service.

sudo systemctl restart php7.2-fpm

Next we need to change the Nginx web server settings. Instead of doing this globally for the entire server I did this for my WordPress site only by editing the server block for Ngnix. Note that on my server I have the server block in the /etc/nginx/sites-available/ directory and use a symlink in the /etc/nginx/sites-enabled/ directory pointing to it. Replace your_server_name with your server name 🤔

cd /etc/nginx/sites-available/
sudo nano your_server_name

Simply add the line client_max_body_size 96M; to the server section of the file, close and save.

server {
         root /var/www/creativedata;
         index index.php index.html index.htm index.nginx-debian.html;
         server_name creativedata.stream www.creativedata.stream;
         client_max_body_size 96M;
         ...

Restart Nginx to load the change.

sudo systemctl restart nginx

Now when you upload media in WordPress you will see the new file size limit. Done!

WordPress file size limit

Sixteen Years of Learning

One of the greatest things about the internet is that nothing is forgotten. Of course this has also turned into one of it’s greatest risks with the rise of social media.

I used to run my own website starting in 2003 until around 2013, when I removed the site. It was written by myself in PHP with a MySQL database. Everything was hand coded from scratch including all the HTML, CSS and PHP. That’s what you can do before you have kids!

Fortunately the Wayback Machine has cached copies of almost the whole site so it’s easy to look back and see what I was playing around with back then. A virtual trip down memory lane.

Archived view of bobpeers.com

I was using Fedora Core 6 back then (I started on Fedora Core 4 if I remember correctly) which came on either as a DVD iso or spread across 6 CD iso files. You can still download it from their archives although I wouldn’t recommend it.

Fedora Core 6 archive repository
Fedora Core 6 archive

I was heavily into Linux at the time and had many pages on very specific Linux issues, mounting external logical volumes, installing Linux, installing VNC and SSH. Really nerdy stuff 🤓

There was also lots of general low level stuff. Connecting to IMAP and POP mail boxes using the command line. Not something you need to do every day. I also spent quite a bit of time compiling my own Linux kernels with the main aim being to decrease the boot time on my laptop. I got it down to about 15 seconds in the end 🔥🔥

I don’t spend as much time with the details these days and often choose products that ‘just work’.

I’ve got older and my time is more valuable now so I feel the need to focus more on what really gives value to learn.

The key is that these years spent ‘playing around’ taught me an enormous amount and gave me a much deeper understanding around technology. This has been immensely valuable to me in my career even if that was not the prime driver at the time.

Hello world!

This image sums up how things are going to be around here. It won’t be narrowly focused, maybe not even focused at all. More of an experiment in ‘Doing fun things with technology’™.

The main purpose of this site is to have a place to store my own content. Sounds really retro, blogs are so 00’s right? But as LinkedIn, Medium, Facebook etc. made sharing content free and easy they also made it theirs. You write for them and you live by their rules. If they decide to remove, hide or edit your content there’s nothing you can do. And if they one day cease to be then it’s goodbye to all your hard work.

This site is self hosted using WordPress on my own virtual server. My site, my rules, my content.

Welcome and enjoy!