Embracing Impermanence (or how to check my sbt build works)

Stable trading relationships with nearby countries. Basic human rights. A planet capable of sustaining life. What do these three things have in common?

The answer is that they are all impermanent. One moment we have them, the next moment – whoosh! – they’re gone.

Today I decided I would embrace our new age of impermanence insofar as it pertains to my home directory. Specifically, I wondered whether I could configure a Linux installation so that my home directory was mounted in a ramdisk, created afresh each time I rebooted the server.

Why on earth would I want to do something like that?

The answer is that I have a Scala project, built using sbt (the Scala Build Tool), and I thought I’d clear some of the accumulated cruft out of the build.sbt file, starting with the configured resolvers. These are basically the repositories which will be searched for the project’s dependencies – there were a few special case ones (e.g. one for JGit, another for MarkLogic) and I strongly suspected that the dependencies in question would now be found in the standard Maven repository. So they could probably be removed, but how to check, since all of the dependencies would now exist in caches on my local machine?

A simple solution would have been to delete the caches, but that involves putting some effort into finding them, plus I have developed a paranoid streak about triggering unnecessary file writes on my SSD. So I had a cunning plan – build a VirtualBox VM and arrange for the home directory on it to be a ramdisk, thus I could check the code out to it and verify that the code will build from such a checkout, and this would then be a useful resource for conducting similar experiments in the future.

Obviously this is not quite a trivial undertaking, because I need some bits of the home directory (specifically the .ssh directory) to persist so I can create the SSH keys needed to authenticate with GitHub (and our internal GitLab). Recreating those each time the machine booted would be a pain.

After a bit of fiddling, my home-grown solution went something like this:

  • Create a Virtualbox VM, give it 8G memory and a 4G disk (maybe a bit low if you think you’ll want docker images on it; I subsequently ended up creating a bigger disk and mounting it on /var/lib/docker)
  • Log into VM (my user is daniel), install useful things like Git, curl, zip, unzip etc.
  • Create SSH keys, upload to GitHub / GitLab / wherever
  • Install SDKMAN! to manage Java versions
  • Create /var/daniel and copy into it all of the directories and files in my home directory which I wanted to be persisted; these were basically .ssh for SSH keys, .sdkman for java installations, .bashrc which now contains the SDKMAN! init code, and .profile
  • Save the following script as /usr/local/bin/create_home_dir.sh – this wipes out /home/daniel, recreates it and mounts it as tmpfs (i.e. a ramdisk) and then symlinks into it the stuff I want to persist (everything in /var/daniel)
#!/bin/bash
DIR=/home/daniel

mount | grep $DIR && umount $DIR

[ -d $DIR ] && rm -rf $DIR

mkdir $DIR

mount -t tmpfs tmpfs $DIR

chown -R daniel:daniel $DIR

ls -A /var/daniel | while read FILE
do
  sudo -u daniel ln -s /var/daniel/$FILE $DIR/
done
  • Save the following as /etc/systemd/system/create-home-dir.service
[Unit]
description=Create home directory

[Service]
ExecStart=/usr/local/bin/create_home_dir.sh

[Install]
WantedBy=multi-user.target
  • Enable the service with systemctl enable create-home-dir
  • Reboot and hope

And it turns out that this worked; when the server came back I could ssh into it (i.e. the authorized_keys file was recognised in the symlinked .ssh directory) and I had a nice empty workspace; I could git clone the repo I wanted, then build it and watch all of the dependencies get downloaded successfully. I note with interest that having done this, .cache is 272M and .sbt is 142M in size. That seems to be quite a lot of downloading! But at least it’s all in memory, and will vanish when the VM is switched off…

STEM Ambassadors In the Field

(Or a fun way to introduce local kids to programming)

A previous employer encouraged me to join the STEM Ambassador program at the end of 2017 (https://www.stem.org.uk/stem-ambassadors) and I willingly joined, wanting to give something back to society. The focus of the program is to send ambassadors into schools and local communities, to act as role models and to demonstrate to young people the benefits and rewards that studying STEM subjects can bring. I approached my local primary school (at the time my daughter was a pupil there) about the possibility of setting up an after-school computing club, and they jumped at the chance.

I started the club unsure what to expect, but with a lot of hope and some amount of trepidation. I took on groups of 10 or so KS2 pupils, teaching them the basics of loops, events, variables and functions, largely using Scratch (https://scratch.mit.edu/) and an eclectic mix of programmable robots that I’d acquired over the years (I have a few from Wonder Workshop https://www.makewonder.com/robots/ and also a pair of Lego Boost robots https://www.lego.com/en-gb/product/boost-creative-toolbox-17101). Running the club was extremely rewarding. Some of the kids were brilliant, and will no doubt have a great future ahead of them. Others mainly wanted only to drive the robots around – but I figured that as long as they were having fun then their and my time was well spent.

Then, in March 2020, the Covid-19 pandemic hit. Kids were sent home for months, and all clubs were cancelled, with no knowing when they might start up again. The pandemic has obviously been tough for everyone, but one of the hidden effects has been the impact on the education of our children. It will take years, probably, to know exactly what effect two years of lockdown has had on the attainment opportunities and mental heath of young people. Many of them missed out not only on in-person schooling, but also on all the additional extra-curricular opportunities like school visits, and also things like the STEM Ambassador program.

So now, two years and a change of jobs later, I thought it was about time I got myself back in the field, and start up my STEM activities again.

My first opportunity has been to run a “retro games arcade” stall at the school’s summer fair. This involved commandeering a tiny wooden cabin plonked the wrong-way-round on the edge of the school field, next to one of the temporary classrooms. To turn this into a games arcade I needed to black out the windows to make it dark enough inside to see a computer screen, then to run an extension lead out of the window of the classroom, and to quietly steal a few chairs and tables upon which to set up my “arcade consoles”. Blacking out the windows was achieved by covering them up with garden underlay and sticking drawing pins around the edges (much to the detriment of my poor thumbs).

The field and cabin in which I did my STEM Ambassadoring, with the (mostly willing) assistance of my daughter

For the arcade machines, I wrote two games in Scratch based around the classic arcade games “Defender” and “Frogger”. I set up two laptops to run these games, covering over all but the arrow keys, trackpad and spacebar with shiny card. My aim was to write games that the students could replicate themselves, if they wished. I wanted games simple enough that a small child could play, but would also be fun for an older child or a parent to play as well. The gameplay should ideally last for 1-4 minutes, and the player should be able to accumulate a high score. As the afternoon progressed I kept track of the highest two scores in each game so that the players with these scores could win a prize at the end of the afternoon.

If you’re interested in seeing these games then you can have a look here:

Defender: https://scratch.mit.edu/projects/711066827/

Frogger: https://scratch.mit.edu/projects/317968991/

Of course the afternoon in question was one of the hottest days of the year. I spent 3 hours diving into and out of the tiny sweltering cabin, caught between managing the queue, taking the 50p fee, handing out Pokémon cards to the players (I got a stack of them and gave one out to every player), explaining to the kids how to play the games, and keeping track of the ever-changing high scores. I did have willing help from my daughter (who especially liked taking the money) and my husband (who seemed adept at managing the queue). At some point I managed to eat a burger and grab a drink, but it was a pretty frenetic afternoon.

67 Bricks agreed to give me £50 to pay for prizes. I bought Sonic and Mario soft toys, a Lego Minecraft set, and a large pack of assorted Pokémon cards. I also washed up a Kirby soft toy that I found in my daughter’s “charity shop” pile and added that to the prize pool. Throughout the afternoon I kept track of the top two highest scores in both games, using the incredibly high-tech method of a white-board and dry-wipe marker. The hardest part was figuring out how to spell everyone’s name, and in moving the first-place score to second place every time a high-score was beaten. Oh, and making sure the overly-enthusiastic children didn’t wander off with poor Mario before the official prize giving ceremony.

As the afternoon progressed I encountered some kids who aced the games, and actively competed with each other to keep their place at the top of the leader board. Other children struggled to control the game and I had to give them a helping hand (quite literally – I said I would control the cursor keys while they controlled the space bar). And then there was the dad who was determined to win a prize for his child, and kept returning to make sure of his position on the leader board. But eventually the last burger was eaten, the arcade was closed, and the prizes announced. Four children went home happily clutching their prizes and the rest their collection of assorted Pokémon cards.

For the next step in my STEM Ambassador journey, I have agreed to start up the computing club again in September. I’m hoping to teach the children the skills to write their own arcade games in Scratch. Watch this space.

Things Customers Don’t Understand About Search

Note, this post is based on a dev forum put together by Chris.

Full-text search is a common feature of systems 67 Bricks build. We want to make it easy for users to find relevant information quickly often through a faceted search function. Understanding user needs and building a top notch user experience is vital. When building faceted search, we generally use either ElasticSearch (or AWS’s OpenSearch) or MarkLogic. Both databases offer very similar feature sets when it comes to search, though one is more targetted towards JSON based documents and the other, XML.

Search can seem magical at first glance and do some amazing things but this can lead to situations where customers (and UX/UI designers) assume the search mecahnism can do more than it can. We frequently find a disconnect between what is desired and what is feasable with search systems.

There are 2 main categories of problems we often see are:

  1. Customers / Designers asking for things that could be done, but often come with nasty performance implications
  2. Features that seem reasonable to ask for at first glance, but once dug into reveal logical problems that make developing the feature near impossible

Faceted Search

Faceted search systems are some of the most common systems we build at 67 Bricks. The user experience typicallly starts with a search box that they enter a number of terms into, hit enter and then be presented with a list of results in some kind of relevancy order. Often there is a count of results displayed alongside a pagination mechanism to iterate through the results (e.g. showing results 1-10 of 12,464). We also show facets, counts for how many results fit into different buckets. All this is handled in a single query that often takes less than 100ms which seems miraculous. Of course, this isn’t magic, full-text search systems use a variety of clever indexes to make searching and computing the facet counts quick.

Lets make a search system for a hypothetical website cottagesearch.com. Our first screen will present the user with some options to select a location, the date range they want to stay and how many guests are coming. We perform the search and show the matching results. How should we display the results and more importantly, how do we show the facets?

Let’s say we did a search for 2 bedroom cottages. We’ve seen wireframes for a number of occassions where the facet count for all bedroom numbers are displayed. So users see the number of results applicable to each bedroom count they would get if they didn’t limit the search to just 2 bedrooms (i.e. there aren’t that many 2 bed options, but look at how many 3 bed options are available). At first glance, this seems like a sensible design, but fundmanetally breaks how search systems work with faceting, they will return counts, but only for the search just done.

We could get around this by doing 2 searches, 1 limited by bedrooms and one that does not to retrieve the facet counts. This may seem like a sensible idea when we have 1 facet, but what do we do when we have more? Do we need to do multiple searches, effectively making an N+1 problem? How to we display numbers? Should the counts for the location facet include the limit of bedrooms or not? As soon as we start exploring additional situations we start to see the challenges the original design presents.

This gets harder when we consider non-exclusionary facets. Let’s say our cottage search system lets you filter by particular features, such as a wood burner, hot tub or dishwasher. Now, if we show counts of non-selected facets, what do these numbers represent? Do they include results that already include the selected facet or not? Here, the logic starts to break down and becomes ever more confusing to the end user and difficult to implement for the developer.

Other Complex Facet Situations

A common question we need to ask with non-exclusionary facets: Is it an AND or an OR search? The answer is very domain dependant, but either way we suggest steering away from facets counts in these situations.

Date ranges provide an interesting problem, some sites will purposefully search outside of the selected range so as to provide results near the selected date range. This may be a useful or annoying depending on what the user expects and is trying to achieve. Some users would want exact matches and would have no interest in results that do not meet the selected date range.

Ordering facets is also a questions that may be overlooked. Do you order lexographically or do you order by descending number of matches? What about names, year ranges or numeric values? Again, a lot of what users expect and would want comes down to the domain being dealt with and the needs of the users.

When users select a new facet, what should the UI do? Should the search immediately rerun and the results and facets update or should there be a manual refresh button the user has to select before the search is updated? An immediate refresh would be slower, but let users narrow down carefully, while a manual update would reduce the number of searches done, but then users may be able to select a number of facets in such a way that no results would be returned.

Hierarchies can also prove tricky. We often see taxonomies being used to inform facets, say subjects with sub categories. How should these be displayed? Again there are many solutions to pick from with different sets of trade-offs.

Advanced Search

Advanced search can often be a bit like a peacocks tail – something that looks impressive, but doesn’t contribute a fair share of value based on how much effort it takes to develop. A lot of designers and product owners love the idea of it but in practice, it can end up being somewhat confusing to use and many end users end up avoiding it.

Boolean builders exist in many systems where the designer of advanced search will insist on allowing users to build up some complex search with lots of boolean AND/OR options, but displaying this to users in a way they can understand is challenging. If a user builds a boolean search such as: GDP AND Argentina OR Brazil do we treat it as (GDP AND Argentina) OR Brazil or should it be interpreted as GDP AND (Argentina OR Brazil). We could include brackets in the builder, but this just further complicates the UI.

We frequently get bugs and feedback on advanced search, some of this feedback can amount to different users having contradictory opinions on how it should work. We would ask product owners to carefully consider “How many people will use it?” Google has a well build UI for advanced search that does away with the challenges of boolean logic by having separate fields for ANDs ORs and NOTs.

Google advanced search UI

An advanced search facility can introduce additional complexity when combined with facets. If an advances search lets you select some facets before completing the search, does this form part of the string in the search box? We have had mixed results with enabling power users to enter facets into search fields (e.g. bedrooms:3), but this can be tricky, some users can deal with it, but others may prefer a advanced search builder while others will rely on facets post search.

Summary

In conclusion, we have 3 main takeaways

  • Search is much more complex than it first appears
  • Facets are not magic, just because you can draw a nice wireframe doesn’t make it feasable to develop
  • Advanced search can be tricky to get right and even then, only used by a minority of users

We’ve build many different types of search and have experimented with a number of approaches in the past and we offer some tried and tested principles:

  • Make searches stateless – Don’t add complexity by trying to maintain state between facets changes, simply treat each change as a fresh search. That way URLs can act as a method of persistence and bookmarking common searches.
  • Have facets only display counts for the current search and do not display counts for other facets once one has been selected within that category.
  • Only use relevancy as the default ordering mechanism – You may be tempted to allow results to be ordered in different ways, such as published date, but this can cause problems with weakly matching, but recent results appearing first.
  • Don’t build an advanced search unless you really need to and if you have to, use a Google style interface over a boolean query builder.
  • Check that search is working as expected – Have domain experts check that searches are returning sensible results and look into using analytics to see if users are having a happy journey through the application (i.e. run a search and then find the right result within the first few hits).
  • Beware of exhaustive search use cases – As many search mechanisms work on some score based on relevancy to the terms entered, having a search that guarantees a return of everything relevant can be tricky to define and to develop.