Thursday, May 30, 2024

Binary Search

I am rewriting an old app I made where people can record measurements over time (days or weeks). One log has many measurements. The last time I did this, I used PostgreSQL, and I made a one-to-many relationship between the logs table and the measurements table. So there was a foreign key in the measurements table referencing the id column of the logs table.

The way the app worked, it pull 100% of the measurements for a log when presenting the log to the user on the Log "show page". I think this was a slow way to retrieve measurement data. Using an index to lookup individual, small, measurements seems very slow. Doing a scan of the measurements table instead of using the index also seemed slow.

Besides collecting the measurements, I asked the Postgres sort them by date. I needed (wanted) them to be in sorted order to render them in a visual graph.

This time around, I am still using PostgreSQL, but I am taking a no-SQL approach. I am storing all the measurements on the log in a JSONB column in the logs table.

To make this approach work, I have to keep the measurements sorted. There is probably a way to have Postgres sort by a field in the JSON but I don't want to bother with that, neither its programming support nor its computational cost. It just seems way simpler to maintain order among the measurements every time I write a log record to disk.

This means that when I add a measurement, I have to do a search to find the measurement's insertion point. Preferably a binary search.

I didn't have to do this when I was depending on SQL to do all the work for me. But aspiring for what I've stated is better performance and simpler data flow implied figuring this out for myself.

I wrote an implementation of all of this. I found bugs in my program. I initially tested the binary search and it looked good. But my graph was not rendering correctly at all. I investigated things on the front end and discovered that the measurements were not being added in sorted order. This was despite my early database unit tests that suggested that things were working. I invested things on the backend and found that I was miscalculating the insertion index. I investigated the insertion index calculation and found the bug. But how to fix it? This algorithm, which isn't really complicated in theory, was turning into a 4+ hour endeavour across two days, and it reminded me of how stupid I can be. I eventually, after some hacking and paperwork, found a simpler algorithm than the one I originally came up with, and a correct algorithm.

Shortly after the above, I had to implement binary search on the measurements array on the front end. Having the backend approach as relevant experience helped.

No comments:

Post a Comment