The First 67 Bricks Architectural Kata

On the 13th of October, 67 Bricks held its first Architectural Kata as an opportunity for developers to practice and experience architectural design. Ted Neward, the original creator of architectural katas, puts forward the argument for them quite succinctly:

  “So how are we supposed to get great architects, if they only get the chance to architect fewer than a half-dozen times in their career?”

Ted Neward

The kata exercise was quite simple to prepare for and run. Beforehand, I picked a number of interesting katas from Neal Ford’s list, suitably anglicised them and prepared handouts for both online and in-person teams. Then on the day, it was a case of assigning everyone to teams and explaining the steps we’d be going through:

  1. Gather into teams and read through the assigned case study
  2. Discuss for the next 50 minutes and come up with an architecture, making sure to document any assumptions. Teams could ask me questions about the case study and I’d happily provide extra detail
  3. Each team would then take turns presenting their architecture to the other teams and fielding questions from them
  4. Finally a voting phase where everyone else gets the chance to show thumbs up/down/sideways to indicate how well they thought the presenting team did

Drivers

As a technical consultancy, we are often involved in architectural design of systems we create or take part in creating. Each client is individual; they seek different opportunities, are subject to different constraints and have different technical strategies. It’s critical for 67 Bricks’ success to have skilled architects able to influence and design suitable systems. 

But how can we develop these skills? As Ted Neward identified, most architects only get half a dozen tries at it over their career. While books, online courses and talks help, knowledge needs to be applied and feedback loops closed to truly improve. We’d never tried an event like a Kata before, and I was interested to see what we could learn from it.

Sharing ideas and learning from one another effectively can be a challenge for any organisation, and it has been made all the harder by COVID-19 prompting a rapid move to remote working. The exercise looked like a great opportunity for participants to meet and work with others they might not otherwise have the chance to.

Kata Afternoon

Initially I hoped to get some of the more experienced technical leads to kick off the afternoon by talking about how they architect systems. This proved difficult. Those I spoke to either thought they didn’t know that much or didn’t feel they could speak well to an audience about how to do architecture. As such, these initial talks didn’t happen, which raised a lot of questions about how to arm participants with enough know-how to feel comfortable tackling the task. I fear this may have left some teams struggling too much to learn and make effective progress.

Splitting teams took some thought: I wanted to be sure each team had a mix of experience, and aimed for teams of individuals who hadn’t worked together before. I’m glad I spent the time to do this; the Kata would not have had the same impact if everyone had been in their usual teams (one of the Kata rules is to try to break up regular teams).

Running the actual Kata was reasonably straightforward, if a little awkward thanks to having to manage both online teams and an in-person group. Some online teams got stuck waiting for me to join their Zoom room to answer their questions, whereas it was easy to spot when the in-person group needed a question answered or a nudge in the right direction.

Although I tried to prepare some broader scope for each case study before the event, in anticipation of the questions that would be asked, I quickly found myself having to improvise. I actually found this surprisingly fun. Some teams may have made things harder for themselves by asking too many detailed questions and trying to cover every single detail in their architecture.

The presentation phase had mixed results: some teams did a strong job, presenting their ideas well, answering questions clearly and coming up with really suitable architectures. Other teams struggled a bit more, both with coming up with a suitable architecture and with communicating it to everyone else.

Feedback and Lessons Learned

Feedback was overwhelmingly positive; lots of participants really enjoyed the experience. They liked working with different people and felt they learned a lot from one another about how to tackle this type of task.

I think aiming for mixed team compositions was good, and it allowed individuals to interact with colleagues they wouldn’t normally work with. I’ve had feedback a number of times that, with the move to remote working, developers have felt a lot more insular within their teams, and I hoped this gave them a chance to break out of those groups.

Doing both in-person and remote teams did make running the Kata a little challenging. I felt that I couldn’t effectively keep an eye on all the teams, making it tough to know when to nudge a team that was getting a little stuck or to answer a pressing question.

Next time I would look at including publishing consultants to act as clients. I perhaps enjoyed being a fickle client a little too much. Having a dedicated individual to represent the client creates opportunities for more of a discussion, simulating something closer to the real world when we work with our clients. It would also make the event easier to manage, and our publishing consultants could gain some insight into the architecture process.

Conclusions

Given the positive response, I would happily organise another Kata in the future. If you are thinking about running one for your own organisation, I would highly recommend it. It’s a great chance to meet and learn from other devs. I certainly learned a lot simply observing the various teams taking different approaches to their case studies and the variety of solutions they came up with.

What have I been listening to?

A while ago, Tim suggested we could have a #now-listening channel in our company Slack, in which people could post details of what they were listening to. It occurred to me that it might be a fun challenge to try to figure out from what I’d posted on there who my favourite artist was, and which was my most-listened-to album. So I rolled up my sleeves and got to work. This is an account of what I did and my various thought processes as I went along…

Challenge: figure out how to get my posts from our #now-listening channel and do some statistics on them.

Session 1: After school run, but before work…

Start – there’s an API. https://api.slack.com

Read the documentation: https://api.slack.com/methods/search.messages looks useful – how do I call it?

I NEED A TOKEN! Aha – https://api.slack.com/apps – a “generate token” button…

Access token: xoxe.xoxp-blah-blah-blah. SUCCESS!

First obvious question: has someone done this already? Google knows everything: https://github.com/slack-scala-client/slack-scala-client

Create a new project: sbt new scala/scala-seed.g8 – add dependency on slack-scala-client, ready to rock! In such a hurry; I can’t even be bothered to set up a package, just hijack the Hello app that came in the skeletal project…

From docs:

val token = "MY TOP SECRET TOKEN"
implicit val system = ActorSystem("slack")
try {
  val client = BlockingSlackApiClient(token)
  client.searchMessages(WHAT TO PUT HERE?)
} finally {
  Await.result(system.terminate(), Duration.Inf)
}

… maybe something like…?:

val ret = client.searchMessages("* in:#67bricks-now-listening from:@Daniel", sort = Some("timestamp"), sortDir = Some("asc"), count = Some(5))

RUN IT

Fails. Because HelloSpec fails (I mentioned I just hijacked the OOTB Hello app). Fix with the delete key.

RUN IT

[WARN] [11/26/2021 08:40:37.418] [slack-akka.actor.default-dispatcher-2] [akka.actor.ActorSystemImpl(slack)] Illegal header: Illegal 'expires' header: Illegal weekday in date 1997-07-26T05:00:00: is 'Mon' but should be 'Sat'
Exception in thread "main" slack.api.ApiError: missing_scope
at slack.api.SlackApiClient$.$anonfun$makeApiRequest$3(SlackApiClient.scala:92)

:-​(

Google: “missing_scope” and interpret results

The token used is not granted the specific scope permissions required to complete this request.

:-​( :-​(

Maybe I have to create an app and add it to the workspace? I’ll try that.

Created, figured out how to add the user token scope “search:read” – and I got a new token!

Token= xoxp-blahblahblah

Rerun: I got a response!

{
  "ok":true,
  "query":"* in:#67bricks-now-listening from:@Daniel",
  "messages": {
    "total":0,
    "pagination": {
      "total_count":0,
      "page":1,
      "per_page":5,
      "page_count":0,
      "first":1,
      "last":0
    },
    "paging": {
      "count":5,
      "total":0,
      "page":1,
      "pages":0
    },
    "matches": []
  }
}

:-​(

Let’s just search in the channel without specifying a name…?

val ret = client.searchMessages("in:#67bricks-now-listening", sort = Some("timestamp"), sortDir = Some("asc"), count = Some(5))

Gives:

{
  "ok": true,
  "query": "in:#67bricks-now-listening",
  "messages": {
    "total": 7113,
    "pagination": {
      "total_count": 7113,
      "page": 1,
      "per_page": 5,
      "page_count": 1423,
      "first": 1,
      "last": 5
    },
    "paging": {
      "count": 5,
      "total": 7113,
      "page": 1,
      "pages": 1423
    },
    "matches": [
      {
        "username": "daniel.rendall",
        "other": "field_here"
      }
    ]
  }
}

Aha! My username is daniel.rendall, let’s try that:

val ret = client.searchMessages("in:#67bricks-now-listening from:@daniel.rendall", sort = Some("timestamp"), sortDir = Some("asc"), count = Some(5))

Gives:

{
  "ok": true,
  "query": "in:#67bricks-now-listening from:@daniel.rendall",
  "messages": {
    "total": 3213,
    "pagination": { ... etc }
  }
}

Success! Also – 3213 messages – sounds plausible. This is looking good… but sort direction seems wrong…? Try switching to “desc” => same result.

(Time spent so far: about half an hour – better stop or will miss the morning call!)

Session 2: Re-run – still works (hooray!)

Copy and paste output and save as response.json, fix up with jq so I can examine it:

cat response.json | jq '.' > response_tidied.json

And now:

"pagination": {
  "total_count": 3234,
  "page": 1,
  "per_page": 5,
  ... etc

Number has gone up – I’m still listening to things!

So, I could parse the responses to work out what the next page should be, or I could just loop – with pages of size 100 (if the API will return them) there should be 33. So we will loop and save these as 1.json, 2.json etc. First rule of scraping – aim to do it just once and save the result locally.

Horrible quick and dirty code alert!

val outDir = new File("/home/daniel/Scratch/slack/output")
def main(args: Array[String]): Unit = {
  outDir.mkdirs()
  implicit val system = ActorSystem("slack")
  try {
    val client = BlockingSlackApiClient(token)
    (1 to 33).foreach { pageNum =>
      try {
        val ret = client.searchMessages("in:#67bricks-now-listening from:@daniel.rendall",
sort = Some("timestamp"),
sortDir = Some("desc"),
count = Some(100),
page = Some(pageNum))
        Files.write(new File(outDir, "" + pageNum + ".json").toPath, ret.toString().getBytes(StandardCharsets.UTF_8), StandardOpenOption.CREATE)
        println(s"Got page $pageNum")
      } catch {
        case NonFatal(e) =>
          println(s"Couldn't get page $pageNum - ${e.getMessage}")
      }
      Thread.sleep(1000)
    }
  } finally {
    Await.result(system.terminate(), Duration.Inf)
  }
}

… prints out a reassuring list “Got page 1” => “Got page 33” and no (reported) errors!

Second rule of scraping – having done it and got the data, zip it up and put it somewhere just in case you destroy it…

Tidy it all (non essential, but makes it easier to look at):

mkdir tidied
ls output | while read JSON ; do cat output/$JSON | jq '.' > tidied/$JSON ; done

On scanning the data, it looks plausible; I can’t see an obvious “date” field, but there’s a cryptic “ts” field (sample value: “1638290823.124400”) which is maybe a timestamp? A problem for another day…
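
If that hunch is right, the “ts” value is just seconds-since-epoch with a fractional part, and decoding it is a one-liner with java.time. A quick sketch for when that day comes (nothing Slack-specific here, and the assumption about the format is mine):

import java.time.Instant

// Sketch: assumes "ts" is seconds since the Unix epoch with a fractional part.
def tsToInstant(ts: String): Instant =
  Instant.ofEpochMilli((BigDecimal(ts) * 1000).toLong)

// tsToInstant("1638290823.124400") comes out as an Instant on 30 November 2021,
// which at least looks plausible for a recent message.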

(Time spent this session: about 20 minutes)

Session 3: I can haz stats?

Need to load it in. A new main method in a new object…

val outDir = new File("/home/daniel/Scratch/slack/output")
def main(args: Array[String]): Unit = {
  val jsObjects = outDir.listFiles().map { f =>
    Json.parse(new FileInputStream(f))
  }
  println(jsObjects.head)
}

Prints something sensible. Now need to get it in a useful form: define simplest class that could possibly work.

case class Message(iid: UUID, ts: String, text: String, permalink: String)
object Message {
  implicit val messageReads: Reads[Message] = (
  (__ \ "iid").read[UUID] and
  (__ \ "ts").read[String] and
  (__ \ "text").read[String] and
  (__ \ "permalink").read[String]
  ) (Message.apply _)
}

Not sure if I need the ID, but I like IDs. Looks like a UUID.

… oh, also some classes to wrap the whole result with a minimum of faff (their Reads omitted for brevity):

case class SearchResult(messages: Messages)
case class Messages(total: Int, matches: Seq[Message])
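
(For completeness, the omitted Reads are nothing exciting; they might look something like this, following the same play-json pattern as Message:)

// A plausible shape for the omitted Reads - same play-json imports as for Message
// (play.api.libs.json._ and play.api.libs.functional.syntax._).
object Messages {
  implicit val messagesReads: Reads[Messages] = (
    (__ \ "total").read[Int] and
    (__ \ "matches").read[Seq[Message]]
  ) (Messages.apply _)
}

object SearchResult {
  // Only one field, so no need for the "and" combinator here.
  implicit val searchResultReads: Reads[SearchResult] =
    (__ \ "messages").read[Messages].map(SearchResult.apply)
}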

Go for broke:

val jsObjects: Array[JsResult[Seq[Message]]] = outDir.listFiles().map { f =>
  Json.parse(new FileInputStream(f)).validate[SearchResult].map(_.messages.matches)
}

Unpleasant type signature alert – Array[JsResult[Seq[Message]]]. Let’s assume nothing will go wrong and just use “.get” and “.flatMap”:

val messages: Seq[Message] = outDir.listFiles().flatMap { f =>
  Json.parse(new FileInputStream(f)).validate[SearchResult].map(_.messages.matches).get
}.toList

That gives me 3234 Message objects, which is reassuring. They include top-level messages, and responses to threads. As far as I can see, the thread responses include a ?thread_ts parameter in their permalink, therefore filter them out – leaves 1792 remaining.

val filtered = messages.filterNot(_.permalink.contains("?thread_ts"))
filtered.take(10).map(_.text).foreach(println)

…and voila:

The things I’m looking for will all have the format “Artist – Album”. Regex time!

val ArtistAlbumRegex = "(.*?) - (.*)".r("artist", "album")

Wait, what…? “@deprecated(“use inline group names like (?<year>X) instead”, “2.13.7”)”

Didn’t know that had changed. Ho hum…

val ArtistAlbumRegex: Regex = "(?<artist>.*?) - (?<album>.*)".r

val artistsAndAlbums = messages.filterNot(_.permalink.contains("?thread_ts")).map(_.text).collect {
  case ArtistAlbumRegex(artist, album) => ArtistAndAlbum(artist, album)
}
artistsAndAlbums.take(10).foreach(println)

Even more promising:

Getting there! Now, there are bound to be loads of duplicates. So I guess the most obvious thing to do is count them. Let’s see if I can find the albums I’ve listened to the most, and their counts. I’m going to define a canonical key for grouping an ArtistAndAlbum just in case I’ve not been completely consistent in capitalisation.

case class ArtistAndAlbum(artist: String, album: String) {
  val groupingKey: (String, String) = (artist.toLowerCase, album.toLowerCase)
}

Then we should be able to count by:

val mostCommonAlbums = artistsAndAlbums.groupBy(_.groupingKey)
.view.map { case (_, seq) => seq.head -> seq.length }.toList.sortBy(_._2)
mostCommonAlbums.take(10).foreach(println)

(The Bob Lazar Story – Vanquisher,1)
(The Pretenders – Pretenders (152),1)
(Saxon – Lionheart,1)
(Leprous – Tall Poppy Syndrome,1)
(Tom Petty and the Heartbreakers – Damn the Torpedoes (231),1)
(2Pac – All Eyez on Me (436),1)
(Bon Iver – For Emma, Forever Ago (461),1)
(James – Stutter,1)
(Elton John – Honky Château (251),1)
(Ice Cube – AmeriKKKa’s Most Wanted,1)

Oops – wrong way – and also the numbers in brackets need to be removed. Not sure there’s a nicer way to invert the ordering than explicitly passing the Ordering that I want to use…

val mostCommonAlbums = artistsAndAlbums.groupBy(_.groupingKey)
.view.map { case (_, seq) => seq.head -> seq.length }.toList.sortBy(_._2)(Ordering[Int].reverse)
mostCommonAlbums.take(10).foreach(println)

(Benny Andersson – Piano,8)
(Meilyr Jones – 2013,7)
(Brian Eno – Here Come The Warm Jets,4)
(Richard & Linda Thompson – I Want To See The Bright Lights Tonight,4)
(Admirals Hard – Upon a Painted Ocean,4)
(Neuronspoiler – Emergence,4)
(Global Communication – Pentamerous Metamorphosis,4)
(Steely Dan – Countdown To Ecstasy,4)
(Pole – 2,4)
(Faith No More – Angel Dust,4)

That looks plausible, actually. I like Piano. I’m guessing there are loads of other “4” albums…

But who is my most listened-to artist? I have a shrewd idea I know who it will turn out to be – my prediction is that it will be a four-word band name with the initials HMHB. Use the fact that I defined my grouping key to start with the artist:

val mostCommonArtists = artistsAndAlbums.groupBy(_.groupingKey._1)
.view.map { case (_, seq) => seq.head.artist -> seq.length }.toList.sortBy(_._2)(Ordering[Int].reverse)
mostCommonArtists.take(10).foreach(println)

(Half Man Half Biscuit,28)
(Fairport Convention,19)
(R.E.M.,18)
(Steeleye Span,15)
(Julian Cope,15)
(Various,13)
(James,12)
(Cardiacs,11)
(Faith No More,9)
(Brian Eno,9)

Bingo! The mighty Half Man Half Biscuit in there at #1. One flaw is immediately apparent – this naive approach doesn’t distinguish between “listening to lots of albums by an artist as part of business-as-usual” and “listening to an artist’s entire back catalogue in one go” (which accounts for the high showings of Fairport Convention, R.E.M. and Steeleye Span). Worry about that some other time.

How many albums have I listened to?

val distinctAlbums = artistsAndAlbums.distinctBy(_.groupingKey)
println("Total albums = " + artistsAndAlbums.length)
println("Distinct albums = " + distinctAlbums.length)

Total albums = 1371
Distinct albums = 1255

… but that will be wrong, because I’ve listened to some albums in the context of e.g. working through the Rolling Stone or NME lists of top 500 albums, and in those cases I appended the number from the list to the album name, e.g. “Battles – Mirrored (NME 436)”. So chop that off the end of the album name:

val artistsAndAlbums = messages.filterNot(_.permalink.contains("?thread_ts")).map(_.text).collect {
  case ArtistAlbumRegex(artist, album) =>
    ArtistAndAlbum(artist, album.replaceAll("\\([^)]+\\)$", "").trim)
}

Distinct albums = 1191

This final session took about 50 minutes, so if my maths is correct, the total time spent on this was a little under 2 hours. TBH I’m slightly dubious about the results; after listing all of the albums I’ve listened to in alphabetical order I’m sure there are some missing (e.g. I tackled the entire Prince back catalogue, but there were only a handful of Prince albums in there, ditto for David Bowie). I suspect a bit more work and exploration of the Slack API might reveal what I’m missing. Or maybe my method for distinguishing main messages from responses is wrong (just had a thought; maybe a main message that begins a thread also gets the ?thread_ts parameter).  But it’s close enough for now, and appears to confirm my suspicion that Half Man Half Biscuit are my most listened to artist.

And now, what with it being the season of goodwill and all that, it’s time for my special Christmas Playlist

The Trials and Tribulations of a Working Parent

I’ve worked in the science and technology sector for my whole career, starting off by completing a PhD in Physics, then migrating into computer modelling, and then into software development.  I had my son during the final year of my PhD (oh how naïve I was about how easy that would be) and then immediately hit the dilemma of “how are we going to pay for this?”

Students are not entitled to any kind of maternity leave or pay – when I made enquiries about this I was advised to quit my studies, which would have made me eligible for various benefits.  My department was much more accommodating and gave me four months paid maternity leave – something they were under no obligation to do.  I also managed to claim Maternity Allowance  (https://www.gov.uk/maternity-allowance/how-to-claim), because I had done some part-time maths lecturing which made me eligible.

When I made my first forays into real paid employment, I had to tackle the thorny issue of childcare.  Childminder or nursery?  Enlisting the grandparents wasn’t an option, and neither was either myself or my husband becoming a stay-at-home parent.  We initially picked a nursery at my husband’s workplace, and then found one closer to home.  The fees were astronomical – larger than the mortgage – but we scraped by.  I had some comments (addressed to me, never to my husband) along the lines of “why bother having kids if you’re never going to see them?” which I shrugged off.

We timed having our second child so that we would not have two children in nursery at the same time for long – if nursery fees for one child were astronomical then double nursery fees were on a whole other level.  My employer at the time (not 67 Bricks, I should add) only offered the barest minimum maternity package, and so I could only afford to take four months’ maternity leave (my take-home pay went down to around £120 a week after the first six weeks of leave).  This was before the change in the law that would have allowed my husband to take some extended leave himself – when he requested to do so, his employer said something along the lines of “you can do that when men start giving birth to babies”.

Despite the deficiencies in that company’s maternity policy, my immediate line manager was wonderfully accommodating.  He allowed me to have an arrangement where I worked in the office for about 5 hours a day and then completed the rest of my hours at home.  This enabled me to reduce the nursery hours to 9-4 rather than 8-6, and save money while spending more time with the kids.  He also gave me considerable flexibility around school holidays, and working from home on days when I had various child-related errands such as school plays, parent appointments etc.

Speaking of school, nobody ever tells you that having a school-age child is actually harder to fit a job around than having a nursery age child.  The school day ends at 3pm – and who is ready to finish work at 3pm???  Also, there are 13 weeks a year of school holidays to somehow cover.  At the time, the school did not have an after-school club (they started one up in later years) and so I had to find a child-minder who could do after-school pickups.  There was a holiday club at a school in the next village which I used – it was heavily sports-related, which my son in particular did not like, but I told him there was no choice and he had to go.

Over the next decade I went through various child-care arrangements, including nursery, childminders, after-school clubs, holiday clubs, and various forms of flexible working arrangement for both myself and my husband.  Our days were organised with military precision.  Drop the kids at school at 8:40.  Drive like a maniac to work, never being able to arrive earlier than 9:30.  And then having to leave at 5pm on the dot so that I could once more drive like a maniac to pick the kids up by 6pm or risk being fined (typically an immediate fine of £15 and then £10 for every extra 15 minutes you were late).  And the stress of sitting in a traffic jam on the motorway, watching the clock tick, wondering exactly how late I would be.  I changed jobs and my new line manager was equally wonderful – I was never quite able to complete my full hours during office-time, but he was perfectly fine with me making up time in the evenings.  And he never once quibbled when I said that I could not get in any earlier than 9:30, or stay any later than 5pm.

My kids are now in secondary school and make their own way to and from school, and are old enough to be left alone during school holidays.  Therefore I finally no longer have to worry about pick-ups, or astronomical childcare fees, and I don’t have to rush around like a loon trying to pick them up by a specific time.

The key takeaways from all this are:

  • Parents need flexibility.  We have schedules to meet, parents evenings and school plays to attend, and sometimes sick children to tend to.  Having an understanding manager who doesn’t watch the clock, and allows you to complete your working hours according to whatever pattern works, makes our lives so much easier.
  • Child-rearing is expensive.  Attractive maternity packages will improve your staff retention and employee satisfaction no end. The 67 Bricks maternity policy is better than many (Employees who have been here 2 years get 12 weeks on full pay and 12 weeks on half pay).
  • Dads need flexibility too.  For every dad who you allow to leave early to do a school run, there is probably a grateful mum who is able to get on with her own job without worry.  The number of dads standing at the school gate is getting bigger year on year.  When I was a child my dad was the only one at the gate, but these days there are far more, and that is only a good thing.  I have seen male colleagues experience discrimination in previous jobs, for example expressions of incredulity when they state an intention to take paternity leave.

Sharing Failures Dev Forum: The most “interesting” bugs you’ve caused

Ely (n.)

The first, tiniest inkling you get that something, somewhere, has gone terribly wrong.

The Deeper Meaning of Liff – Douglas Adams & John Lloyd (Thanks to Stephen for pointing this quote out)

At 67 Bricks we value open and honest communication, and aim to treat each other with respect and humility. Normalising failure helps everyone understand that mistakes happen; they shouldn’t be used as a mechanism for punishment, but are instead an opportunity to learn and become more resilient to future failures. This week’s Dev Forum was on the topic of sharing failures, specifically talking about the most “interesting” bugs you have caused.

64 Bit vs 32 Bit Ubuntu

Our first fail came from a developer working on converting data from base64 as part of a side project. Developing on the 64-bit version of Ubuntu, they found the code was stable and the tests were green, and so they continued to build the application out. The application was intended to work in a number of different environments, including 32-bit systems, and this is where the first problems emerged.

The 64-bit OS zeroed out data after it was read, and the code was designed to rely on this; the 32-bit OS didn’t, and instead exhibited random test failures. Sometimes the expected values would match up, and other times they would be wrong by a character or two, typically at the end of the strings. Finding the root cause required a trial-and-error approach to eliminate possibilities, eventually getting a grasp on the problem and coming up with a solution.

Transactions and Logging

Working on a client application this developer made two perfectly reasonable design decisions:

  1. Use transactions with the SQL database to ensure integrity of data changes
  2. Write logs to a table to ensure they are persisted in a convenient location

This approach was perfectly fine while the system worked, but the mystery began when the client started to complain about errors with the system. The logs didn’t show any error messages that lined up with what the client was reporting. Why? The same transaction was used to store logs and to update the data in SQL. If an exception was thrown, the transaction was rolled back, preventing any data corruption problems – but it also rolled back all the logs!

The application was changed to use a separate transaction for logging, to ensure logs were persisted. With these logs available, the root cause of the client’s problem could be found and resolved, and a lesson was learnt about unintended consequences.
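
The fix boils down to giving the log writes their own connection, so they commit independently of the business transaction. A minimal sketch of the idea in JDBC terms (not the actual client code; the table and column names are invented):

import java.sql.Connection

// Sketch only: logs go to their own connection (and therefore their own
// transaction), so they survive a rollback of the business transaction.
def updateWithLogging(businessConn: Connection, logConn: Connection): Unit = {
  logConn.setAutoCommit(true) // each log row is persisted immediately

  def log(message: String): Unit = {
    val stmt = logConn.prepareStatement("INSERT INTO app_log (message) VALUES (?)")
    try { stmt.setString(1, message); stmt.executeUpdate() } finally stmt.close()
  }

  businessConn.setAutoCommit(false)
  try {
    log("Starting update")
    val update = businessConn.prepareStatement("UPDATE widgets SET price = price * 1.1")
    try update.executeUpdate() finally update.close()
    businessConn.commit()
    log("Update committed")
  } catch {
    case e: Exception =>
      businessConn.rollback()
      log(s"Update failed and was rolled back: ${e.getMessage}") // this log survives
      throw e
  }
}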

Overlogging

Another mystery around missing logs. A web application used a separate log-shipping application to take logs from the server and send them to a remote server. However, under heavy load the logs would become spotty and clear gaps appeared. The reason was that the sheer volume of logs the shipper had to deal with eventually became too great and caused it to crash.

The solution was to reduce the number of logs written so the shipper would continue to function even when the main application was under heavy load. This triggered an interesting conversation at the dev forum about how much should be logged. Should you write everything, including debug-level logs, to help with debugging faulty systems? Or should you write no logs whatsoever and only enable them when something starts going wrong?

Naturally we seemed to settle somewhere in the middle, but there were disagreements. Possibly a future dev forum topic?

The Wrong Emails

A client needed to receive important data daily via fax and the application did this by sending an email from a VB app to an address which would convert the body to a fax and send it on. The application needed testing and so had a separate test environment that was presumed to be safe to use for running tests.

It turned out that the test system was using the same email address as the live system, meaning the client was receiving both test and live data. Luckily the client caught on and asked why they were receiving two emails a day and the test system was quickly updated.

From then on, this developer has always used some form of override mechanism to prevent sending actual emails to end users. Others use apps like smtp4dev, which never send emails on (they just capture them so they can be inspected), or services like SES, which can be configured to run in a sandbox mode.
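
An override like this doesn’t need to be clever. A rough sketch of the sort of thing meant here (all names invented):

// Sketch: in non-production environments, every message is redirected to a
// safe inbox, so a test run can never email real users.
trait EmailSender {
  def send(to: String, subject: String, body: String): Unit
}

class OverridingEmailSender(underlying: EmailSender,
                            isProduction: Boolean,
                            safeInbox: String) extends EmailSender {
  override def send(to: String, subject: String, body: String): Unit =
    if (isProduction) underlying.send(to, subject, body)
    else underlying.send(safeInbox, s"[TEST - originally to $to] $subject", body)
}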

Hidden Database Costs

AWS can be very costly, especially if no one is watching the monthly spend. In this case, one database drove up costs to astronomical new highs due to a single lambda running regularly and reading the entire database. The lambda was supposed to delete old data, but had a bug which meant it never deleted anything. Several months of operating the system meant a lot of data had piled up and so did the costs. The fix was simple enough to apply and we added additional monitoring to catch cases of runaway costs.

Similar experiences have been had before with runaway costs on clustered databases like MarkLogic. Normally a well-built index will be able to efficiently query the right data on the right node, but if the index is missing, MarkLogic will pull all the data across from the other node to evaluate it. This can drive up some eye-wateringly high data transfer prices in AWS and as such we now always monitor data transfer costs to ensure we haven’t missed an index.

Caching Fun

Our next issue is about users appearing to be logged in as other users. The system in question used CloudFront CDN to reduce server load and response times. The CDN differentiated between authenticated and unauthenticated users so different pages could be served for users who were authenticated.

The system made use of various headers set by lambdas to differentiate between authenticated and unauthenticated users. The problem arose when session handling was changed and the identifier used was accidentally stored within the CDN. This caused an issue where a user could load a page with a set-cookie header that set the identifier used for a different user’s session.

The team solved this bug by tweaking the edge lambdas to ensure only non-user-specific data was cached. Caching in authenticated contexts can be challenging, and how it will behave needs to be considered very carefully if it is to be used reliably.

Deactivate / Activate

In this bug, the business asked for a feature whereby user accounts could have a deactivation time set in the future; when that time was reached, the user would be considered inactive and unable to access the system. The feature was implemented with a computed field in SQL Server which could be used to determine whether a user was active or not.
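
The intended semantics of that computed field are simple enough, roughly the following (a Scala paraphrase, not the actual SQL):

import java.time.Instant

// Sketch of the rule the computed field encodes: a user is active unless a
// deactivation time has been set and has already passed.
def isActive(deactivatedAt: Option[Instant], now: Instant): Boolean =
  deactivatedAt.forall(_.isAfter(now))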

As the system was already in use, migration scripts were developed to update the database. These needed to be applied in a certain order; however, deployment practices for this system meant that someone else applied the scripts, causing an error whereby all users ended up deactivated, preventing anyone from accessing the system. To restore service, the database was rolled back, and ultimately the feature was abandoned by the business as too risky.

Some viewed this as an example of why services should be responsible for their own database state and handle database migrations automatically.

Creating Bugs for Google to Index

Magic links can be a very useful feature for making authenticated access to a website easy, as long as the URLs remain private to the correct user. In one case a URL, complete with its authentication token, got cached by Google, meaning anyone who could find the link would be able to access the authenticated content! This was fixed by asking Google to remove the URL, invalidating the token in the database, and adding metadata to the appropriate pages to prevent bots from indexing them.

Another Google-based bug next: after building a new system, the old one needed to be retired, and part of this involved setting up permanent redirects to the new system. However, Google continues to serve up the old site’s URLs rather than the new system’s, and a fix is still being worked out. A lesson learned about how important it is to consider carefully how search engines will crawl and store web sites.

As we see, failures can come from any number of sources, including ourselves. Bugs are a perfectly normal occurrence and working through them is an unavoidable part of building a robust system. By not fearing them, we can become more adept at fixing them and build better systems as a result.

Session musician programmers

As a software consultancy, we are always in the business of trying to recruit good developers. One of the more annoying phrases that has cropped up in the industry lexicon of late is “rockstar programmer”, as both an ideal to which developers are assumed to aspire, and a glib description of the kind of programmer that software publishers are assumed to be desperate to employ.

In my very humble opinion, the software industry has no need for rockstar programmers (or rockstars generally, as it happens). If we are going to use a musical analogy, a better kind of programmer could be termed a “session musician programmer”. Session musicians are the (usually) fairly anonymous musicians who are relied upon for actually getting records recorded. Attributes of a good session musician would include at least some of the following.

  • Courtesy and professionalism towards colleagues and clients.
  • Playing, or being willing to learn, a number of different instruments.
  • Playing in a range of styles depending on what is required.
  • Being able to read music, in order to play someone else’s arrangement.
  • Ability to improvise, if the arrangement just calls for “a solo”.
  • Ability to work with other musicians to come up with a workable arrangement for a piece if there isn’t one.
  • Some understanding of music production, and an understanding of how what is played will translate to what is eventually heard on the recording.

Translated roughly into development terms, with a lot of artistic license, these might be equivalent to the following desirable attributes in a developer:

  • Courtesy and professionalism towards colleagues and clients.
  • Having some knowledge of a number of different languages.
  • Being able to fit one’s code into the “style” of the project being worked on.
  • Willingness to follow direction from a technical lead and implement a precisely written specification, where necessary.
  • Ability to understand and fill gaps in the technical description of a project.
  • Ability to work with other developers to come up with a plan for implementing something.
  • Some understanding of the tools used for building, testing and deploying code.

I am not sure that the concept of the “rockstar programmer” embodies these things. Rockstars exist to do one thing well, generally singing or playing guitar, and that often seems to come with a whole lot of tiresome baggage in the form of massive ego, tantrums, and demands for excessive amounts of money (yes, there are exceptions). It so happens that plenty of session musicians can sing, plenty of them can play guitar, and they are capable of generating far more in the way of recorded music in a year than most rockstars would manage in their careers.

My advice, therefore, is that aspiring software developers looking to the world of music for role models should aim to be part of The Wrecking Crew rather than Guns N’ Roses.

Dev meeting – debugging XQuery and XSLT – 8th January 2021

In the dev meeting today, we talked about debugging XQuery.

XQuery is a query language (primarily) for XML (in very loose terms, it’s like SQL for XML). Reece is the developer of an IntelliJ IDEA plugin that supports development of XQuery in IDEA. The latest changes he’s made to it allow it to be used for debugging XQuery in MarkLogic.

In IDEA, after installing his plugin, you need to set up an XQuery Run/Debug configuration for the MarkLogic server you are debugging. Then, you can run an XQuery file from within IDEA against MarkLogic, and it will display the output in the MarkLogic console.

Using standard IDEA breakpoints, you can add a breakpoint to an XQuery expression. Then, you can debug, which shows the stack frame, and the current variables and their values.

There is some complexity in working out the stack frames for eval expressions, which dynamically execute XQuery code in strings. Inigo expressed the opinion that this was generally a bad idea anyway.

At the moment, it’s not possible to debug into X-Ray XQuery tests, but Reece is doing further work to make it possible to run X-Ray tests easily from IDEA, which would also include work to make X-Ray tests debuggable.

Reece is also working on debug support for Saxon, which will support debugging in XQuery and in XSLT via Saxon.

Clarity and XQuery – dev meeting fun with blockchains and Intellij IDEA

Alex talked about Clarity, a language for running on the Blockstack blockchain. He recently worked on a hackathon for it. One of the motivations was to improve his functional programming – Clarity is very Lisp-like, and so being forced into a functional mindset was useful for learning. Another motivation was that the Clarity hackathon was offering cash prizes. Clarity is pretty new – and things like the test framework and code syntax highlighting are very new.

Alex wrote a high score system that runs in Clarity – that will maintain the high scores for a game long-term, rather than needing to be maintained by a particular game developer on their own server.

There are various limitations to Clarity because of the nature of running on the blockchain – like an inability to perform loops, because it’s not possible to prove that they resolve in a reasonable length of time, due to the halting problem. And some primitives cost money, like storing and retrieving variables in the blockchain.

There is a built-in testing framework, that allows you to test your code on your local machine without writing to the blockchain. It’s incomplete at the moment, because it does not support transactions.

Native lists cannot be unbounded, for similar reasons of cost. So Alex also wrote an “endless list” that is essentially a linked list of bounded lists, so you can treat it as an unbounded list but you are only acting on the portions of list that you are currently looking at.
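
As a rough sketch of the idea in Scala terms (not Clarity code): an “endless list” is a chain of fixed-size chunks, and you only ever operate on the chunk you are currently looking at.

// Loose Scala analogy for the "endless list": each node holds a bounded chunk,
// and following `next` reaches the rest of the list, so the structure as a
// whole has no fixed bound even though every individual chunk does.
final case class Chunk[A](items: Vector[A], next: Option[Chunk[A]]) {
  def prepend(a: A, maxChunkSize: Int): Chunk[A] =
    if (items.size < maxChunkSize) copy(items = a +: items)
    else Chunk(Vector(a), Some(this)) // this chunk is full: start a new one in front
}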

He then wrote code that can be used for redistributing money from a group of donors to a large group of recipients without having to use existing infrastructure like Paypal – because organizations like Paypal don’t like it when individuals send money to lots of people. This was inspired by some work that was done based on US stimulus cheques – individuals got cheques from the government, but not all of them needed them, and plenty of people needed more than the stimulus cheque provided. This allowed a group of donors to store money in the contract itself, in a way that is transparent to everyone, and then for it to be distributed out to a set of recipients.

There’s more information at https://community.blockstack.org/clarity-winners.

——–

Reece talked about the latest improvements to his IntelliJ IDEA XQuery plugin. It now shows inline parameter names on function calls, and shows the function structure. It also displays documentation for functions from MarkLogic or from the standard XQuery and XPath libraries – it will download the documentation.

It will also show a query log in a tab for the various MarkLogic log files, so you can see the latest log messages without viewing a separate file. This works for both local log files, and for remote MarkLogic servers.

It will parse MarkLogic rewriter XML files, and display the results as a list of REST endpoints. In the upcoming version, it will automatically retrieve all the registered MarkLogic schemas, so for the rewriter.xml it provides auto-complete for the various structural elements and options in the rewriter XML. It also parses RestXQ APIs as used in systems like BaseX.

It will handle relative imports correctly if you set an XQuery directory as a source root – so “/myLibraries/whatever.xqy” will resolve to the right place. In the upcoming version, it will be able to recognize these automatically.

When you’re executing queries locally using a configured query processor (e.g. if you’ve configured MarkLogic or another system so you can do the local equivalent of using the MarkLogic console), then you can pretty-print the output and you can also see profiling information for the code that you execute.

Reece is currently working on debug support, so you can set breakpoints in your XQuery code.

Inigo stated that Reece’s plugin is great, and that everyone using XQuery or MarkLogic should be using it.

Dev meeting – 6th March 2020

Chris talked about a taxonomy management service that we’d created for a publisher. We created a system that allowed for a large number of taxonomies to be stored and updated, so they could be used by the client for categorizing their content. When a new version of a taxonomy arrived (for example a medical taxonomy like SNOMED-CT), then it would be stored inside the data store. We would often want to provide a subset of that taxonomy, because the ontologies were too large and sprawling to be useful to individual consumers.

Rich talked about SNOMED-CT, which is an ontology of medical terms. It describes tools, procedures, drugs, devices and many other things. It’s important that two different medical systems use the same terms for what they’re referring to, like “paracetamol”. By using the same identifier from SNOMED-CT for each, it becomes easier to correlate the data between different systems.

Reece talked about a project for a customer that uses a financial ontology to manage financial documents. Those documents discuss a range of financial topics – so the system we developed uses the ontology to classify the sections within each document according to the terms matching those sections. Hence, the customer can find documents, and sections within them, that are relevant to the specific topics they are interested in.

We briefly discussed the differences between term lists, taxonomies, thesauri, and ontologies. Inigo expressed the heretical view that none of the definitions really matter.

We sometimes care about classifying content against geographical regions, and understanding the hierarchy of those regions. For example, we might classify a document as discussing a legal case based in Paris. However, the user is interested in legal cases of that sort occurring in the EU. Because our ontology tells us that Paris is in France, and that France is in the EU, we should be able to surface the document to the user by inferring this additional information for it. Loic talked about this, and also about the need to make subsets of ontologies to a certain depth only – we don’t necessarily care about small villages, and including them may harm performance as well as producing false positives.
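
The sort of inference involved can be sketched very simply (a toy example, not our production ontology code): if the ontology gives each region’s broader parent, then a classification against “Paris” implicitly classifies the document against all the broader regions too.

// Toy sketch of broader-region inference; the map stands in for the ontology.
val broader: Map[String, String] = Map("Paris" -> "France", "France" -> "EU")

// Expand a classification to include every ancestor region.
def withAncestors(region: String): List[String] =
  region :: broader.get(region).map(withAncestors).getOrElse(Nil)

// withAncestors("Paris") == List("Paris", "France", "EU"),
// so a search filtered on "EU" also finds the Paris document.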

Nikolay talked about micro-frontends – a frontend composed of UIs provided from different servers. He talked about using view components in ASP.NET Core MVC, and how they allow you to compose the information displayed in a view in a slightly different way to using partials. This is done server-side. They retrieve their data via dependency injection – so the parent doesn’t need to change when the contained component changes. It helps you think in terms of features rather than pages.

“It’s full of stars” – Dev meeting 24th January 2020

Alex A talked about the “Astronomy Picture of the Day” – he has liked having this as his desktop background. But getting hold of all the images is hard – there’s a torrent but that’s very old, and you can download the site via wget but that leads to duplication and an odd file structure. They do have an API via https://api.nasa.gov/planetary/apod (and there are a lot of other NASA APIs) – but the parameters that the API describes don’t actually work! You can retrieve a chunk of JSON for an individual picture at a time – but if you repeatedly call it, then you get banned for too many accesses. It turns out that you can pass in a start_date and an end_date, and get a big chunk of JSON back, which resolves the “too many accesses” problem. He parsed this with the Newtonsoft JSON library (very useful for .NET JSON parsing), dealing with a few dirty data issues. One of the problems is that there are some specific dates that don’t work – and if they’re inside your date range, then the whole date range fails! So Alex wrote code to bisect the date range repeatedly to find the problematic date so it could be excluded from the range. There were also some broken images in the data. Alex has now downloaded all of these files, organized them by size, and put them in a torrent, so they are now freely available.
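
The bisection idea is roughly the following (a sketch, not Alex’s actual code; fetchRange stands in for calling the APOD API with start_date and end_date, and is assumed to report failure if any date in the range is broken):

import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Sketch of bisecting a date range to isolate the dates the API chokes on.
def findBadDates(start: LocalDate, end: LocalDate,
                 fetchRange: (LocalDate, LocalDate) => Boolean): List[LocalDate] =
  if (fetchRange(start, end)) Nil              // whole range is fine
  else if (start == end) List(start)           // narrowed down to a single bad date
  else {
    val mid = start.plusDays(ChronoUnit.DAYS.between(start, end) / 2)
    findBadDates(start, mid, fetchRange) ++ findBadDates(mid.plusDays(1), end, fetchRange)
  }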

Alex G talked about Powershell, and how he uses it. It uses an object-oriented scripting language, based on .NET (or .NET Core for Powershell Core), and also has ksh-style syntax. It supports basic calculation on the command line like addition and multiplication – and can do more complex maths with e.g. [Math]::Exp(2) – and it will autocomplete from within .NET packages too. It supports setting up aliases. “ogv” (Out-GridView) is a useful graphical tool for filtering text… but it only works on Windows. Because it’s acting on objects and passing objects between components, you can act on them as objects – e.g. “ls | select -Property length” will select the “length” property of each file. “ls | convertto-json” will create a JSON representation of the input, and convertfrom-json will convert it back again into an object that can be traversed via dot notation. Powershell Core is available on OS X and on Windows – and if you’re working on a mix of Windows and Unix environments, then it can be useful to use Powershell across each environment.

Inigo talked about Amazon Polly, a speech synthesis tool that we’ve been using for a client. The impressive thing about it is how well it reads out scientific text – while we’ve looked at speech synthesis before, it’s always been good at doing common words but has fallen over with technical text. Polly copes very well.

All the trees!

We planted 192 trees this month!

This is part of our ongoing plan to be carbon negative – we wrote earlier about our plan to be carbon neutral, but we upgraded this a few months ago to instead be carbon negative. So, after working out how many trees would be needed to offset our server hosting (our single biggest source of CO2), we then multiplied it by ten to make sure that we were carbon negative.

One of the areas where trees have been planted on our behalf is:

https://tree-nation.com/projects/la-pedregoza/updates