Dev meeting – Lightning Round! AWS Layers, Scala 3, becoming carbon neutral, and more

In our dev meeting this week, we had a “lightning round” where developers talked for up to 5 minutes each about a topic that interested them or that they had recently been working on.

Loic talked about “AWS Layers” – this is a way of sharing code across multiple lambdas. One advantage is that your turnaround cycle for lambdas can become faster, because you can just update the lambda, and the layer remains the same each time. This is very useful if you are pulling in lots of third-party libraries, such as in a Java library.

Velizar talked about Scala 3, which is due to be released in 2020. This is interesting to us because we use Scala 2 for a significant number of projects. Scala 3 has a nicer syntax. It removes implicits and replaces them with a number of different, more specific features. It will be “mostly” backwards-compatible, with the older syntax deprecated over time. The compiler will have LSP (Language Server Protocol) support built in, which means that support in VS Code becomes much easier.

Reece talked about using fixed headers in HTML tables. There are various hacky ways of managing this to make the table body scrollable separately. The new way of doing this is to apply the “position: sticky” CSS rule to the th elements (for example via a class), which allows the browser to sort out all of the problems. You need to be careful with z-indexes and transparent backgrounds to make sure that the header and body display correctly. Older browsers will not support this, but will gracefully degrade to a traditional non-scrolling header.

Nikolay talked about Razor parameters – if a parameter is set to null, then the entire parameter is removed, which is handy for setting attributes on HTML elements.

Dan talked about testing one of our client applications which is a Scala and JavaScript application for managing drugs. We have been updating an application that we wrote a few years ago to a newer system. To make sure that the migration is successful, Dan has added instrumentation code to the old system to quickly and easily generate test data. This instrumentation code makes it easier to carry out various developer-friendly actions in the old system, and then to download JSON from it. This is an ad-hoc way of creating 300 additional unit tests.

Bart talked about the AWS CloudFront CDN that we have been using for a client. We have been implementing a third-party user behaviour tool on top of the platform that we have created for them. In the test environment, this is tricky to test when working from home, because a home IP address is not included in the list of whitelisted IPs that are allowed access. It would be possible to configure our VPN so it forwarded all traffic from a home computer to CloudFront – but this affects all home users. The CDN itself has a very large number of IP addresses, so it’s not easy to list just the IPs that should be forwarded to it. He talked about setting up the default gateway on his Linux installation. Ultimately, he resolved the problem by whitelisting his home office IP address, because it’s relatively stable.

Rich talked about recruitment and recruiting developers better. He talked about becoming carbon neutral as a company, and how we should be socially responsible as a company. We have recently been talking internally about our company values, and being carbon neutral is a thing that we are all enthusiastic about (Sam, one of the directors, is currently with his children at Extinction Rebellion). We also talked about the work that we do for charities, and that we should perhaps advertise it more.

Finally, Ian talked about .NET Core 3, which is being launched next week. It has a number of interesting new features. However, it is not an LTS release; LTS support will arrive with version 3.1. Then Simon talked about Java 13, which was released yesterday.

Dev meeting – the Lead Dev conference

Chris went to the Lead Dev conference a few weeks ago. It is a single-track conference; there are many short talks, and there’s no need to choose between them.

There are four main areas that Chris focussed on: misc things; making teams work well; diversity and hiring; and operations. These are some of the talks he went to.

Nickolas Means told stories – this time about the building of the Eiffel Tower, and how it is relevant to software development. He talked about how to get things done – you need to do networking, self-promotion, and negotiation – better thought of as “making friends, telling stories, and co-operating”. He thinks that team leads shouldn’t be umbrellas entirely protecting a team from what’s going on in a business, but should instead be heat shields that reduce the impact of these things.

James Birnie talked about Quantum Cryptography, and how we should all be worried that governments can record encrypted network traffic now that can be decrypted in a few years.

Lara Hogan talked about dealing with friction in teams – “Forming”, “Storming”, “Norming” and “Performing”. When you feel under attack, you become defensive and you don’t make good decisions. Instead, think about people’s core needs – Belonging, Improvement, Choice, Equality, Predictability, and Significance. If there are problems in a team, then it may be because one or more of these needs isn’t being met. Lara has several blog posts on these topics.

Paula Kennedy talked about Silence Isn’t Golden – how to deal with distributed teams. They had a weekly standup to discuss things inside and outside work; a monthly retrospective to discuss team functioning; and a regular “coffee break” meeting.

Bethan Vincent talked about increasing diversity in hiring. She discussed the issues with a fully anonymized process, which didn’t work very well for various reasons. She talked about the importance of plenty of information in job listings, and the issues of mandatory take home tests for people who have limited time available. We discussed this more, because this is an area where we’ve been putting in effort internally.

Kate Beard talked about 10% time at the FT, working on side projects, and its relevance to diversity. Without that, people who have limited free time outside work are not able to do side projects, which limits their learning opportunities and career development.

Ola Sitarska talked about diversity again. She discussed code review as an interview technique.

Steve Williams talked about teams performing in crisis situations. Companies don’t plan for business crises as well as they should. He discussed his experience volunteering for the RNLI. It’s important to think about appropriate roles and responsibilities, rituals to follow, and carrying out exercises. He also talked about “SMEAC” – Situation/Scenario – Mission – Execution – Admin – Comms, where you’re thrown into a situation and then need to deal with it (also known as Five Paragraph Order). We talked about how you could arrange a communication process in the event of a disaster, and how to manage disaster recovery processes.

All of the conference sessions are available online.

Arrows, and outrageous fortune

Tim talked about arrow functions in JavaScript, and how “this” works with arrow functions. The behaviour of “this” differs between ES6 arrow functions, and traditional ES5 and before JS functions.

He suggested some rules of thumb:

  • Don’t use arrow functions as methods in object literals, because they do not pick up the scope from the object literal that you might expect
  • Don’t use arrow functions as methods in classes – it does work, but it’s more verbose than using the concise class method syntax
  • Do use arrow functions as function parameters – e.g. arr.map(el => this.doSomething(el)) is much clearer than using bind or assigning this to another variable outside the method
  • Be careful using “this” with an arrow function if you’re using jQuery – because jQuery reassigns “this”
  • Don’t use them in lifecycle functions in Vue – because Vue reassigns “this” as well
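
A minimal TypeScript sketch of the first and third rules above. The Scaler class, its doSomething method and the counter object are hypothetical names used only for illustration; they are not taken from Tim's talk.

```typescript
class Scaler {
  constructor(private readonly factor: number) {}

  doSomething(el: number): number {
    return el * this.factor;
  }

  // Arrow function as a callback: `this` is inherited from scaleAll,
  // so it still refers to the Scaler instance.
  scaleAll(arr: number[]): number[] {
    return arr.map(el => this.doSomething(el));
  }

  // The older alternative: a traditional function gets its own `this`,
  // so it has to be bound explicitly.
  scaleAllWithBind(arr: number[]): number[] {
    return arr.map(
      function (this: Scaler, el: number): number {
        return this.doSomething(el);
      }.bind(this)
    );
  }
}

// In an object literal, the concise method syntax receives the object as `this`.
// An arrow function here would instead capture `this` from the surrounding scope.
const counter = {
  count: 0,
  increment(): number {
    this.count += 1;
    return this.count;
  },
};

const scaled = new Scaler(3).scaleAll([1, 2, 3]); // [3, 6, 9]
counter.increment();
const countAfterTwoCalls = counter.increment(); // 2
```

Both scaleAll and scaleAllWithBind give the same result; the arrow version simply avoids the extra ceremony.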

We also talked about lambdas in TypeScript and in other languages.

Then, Ed talked about the past and future of programming, based on a talk he’d recently attended by Uncle Bob. He talked about the history of programming, using mercury delay lines for memory, and how the profile of what a “programmer” is has changed over time. Half of programmers have fewer than five years’ experience – so passing information on is harder than it used to be, and understanding the importance of disciplines such as TDD is less common. Uncle Bob thinks that the combination of a lack of experience and programmers managing increasingly important systems means that there will be a disaster created by poor software in the near future. Uncle Bob suggests that more attention towards craftsmanship will help prevent this – the alternative being greater regulation of programmers.

We expressed many strong opinions on these topics.

Observability and hot reloading

Chris talked about “Observability”, based on an article he had been reading from Martin Fowler’s site (https://martinfowler.com/articles/domain-oriented-observability.html). This covers topics such as logging, analytics, and metrics. In the modern era of cloud servers, this is more complicated and more important than in old-style server setups. These are cross-cutting concerns, and you do not want them to obscure your domain logic. Domain level logging and metrics are the interesting topic – low-level server level events are easy to handle – and if the domain level metrics are covered, then the low-level metrics are less important.

One approach for improving this is to pull out the domain logic into a focussed domain class, and then have a separate instrumentation class to deal with the metrics – a “domain probe” pattern. Testing the instrumentation is important, but is easy to ignore and hard to test if the logic for it is scattered around your code. Testing is easier if you break out the instrumentation into a separate class.

When you’re doing instrumentation, you typically care about other metadata apart from the domain values that are being passed around – such as request IDs, versions, etc. This can be wrapped up in an execution context – which you can perhaps retrieve via a factory. An alternative is an event-based approach, whereby the domain code emits events whenever something of interest happens.
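
As a rough TypeScript sketch of the “domain probe” idea with an execution context: the names here (ExecutionContext, DiscountProbe, RecordingDiscountProbe, ShoppingCart and the SAVE10 code) are all assumptions made for illustration, not code from the article or from our projects.

```typescript
// Metadata that matters to instrumentation but not to the domain logic.
interface ExecutionContext {
  requestId: string;
  appVersion: string;
}

// The domain probe: instrumentation expressed in domain terms.
interface DiscountProbe {
  discountApplied(code: string, amountOff: number): void;
  discountRejected(code: string): void;
}

// One possible probe. It records events in memory, which also makes the
// instrumentation easy to assert on in tests. A real probe might forward
// to a logging or metrics library instead.
class RecordingDiscountProbe implements DiscountProbe {
  readonly events: string[] = [];

  constructor(private readonly context: ExecutionContext) {}

  discountApplied(code: string, amountOff: number): void {
    this.events.push(`${this.prefix()} discount ${code} applied: ${amountOff}`);
  }

  discountRejected(code: string): void {
    this.events.push(`${this.prefix()} discount ${code} rejected`);
  }

  private prefix(): string {
    return `[${this.context.requestId} v${this.context.appVersion}]`;
  }
}

// The domain class contains no logging or metrics calls of its own.
class ShoppingCart {
  private static readonly knownDiscounts = new Map<string, number>([["SAVE10", 10]]);

  constructor(private total: number, private readonly probe: DiscountProbe) {}

  applyDiscount(code: string): number {
    const amountOff = ShoppingCart.knownDiscounts.get(code);
    if (amountOff === undefined) {
      this.probe.discountRejected(code);
    } else {
      this.total -= amountOff;
      this.probe.discountApplied(code, amountOff);
    }
    return this.total;
  }
}
```

Because the probe is passed in, a test can supply the recording version and check both the domain result and the instrumentation in one place.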

The article suggests that AOP isn’t the right tool to use for this, because AOP is typically at the method level, and the domain level importance of activities typically doesn’t match up totally with the method code structure. It also adds more magic, which is bad.

We discussed that this could be done with a decorator pattern, and we discussed the value of decorator patterns in general. In some projects we have used a similar approach in a decorator-like fashion. This does have the same issue as AOP that the important domain logic might not match up with the method level granularity, but doesn’t involve magic. We also discussed that the execution context could be passed around via implicits when using Scala, and agreed this was useful.

Alex talked about live reloading of Java and JavaScript. We want two things – that changing Java source code leads to immediate updates, and that changes to JavaScript lead to immediate page code updates. The approach he has been using is to run “gradle build --continuous” to continuously rebuild the source; and a separate gradle process to run the development server using those classes. Then, for the JavaScript, there’s a separate webpack instance that rebuilds the JavaScript and runs on a separate port with hot reload. There are CORS issues too that need to be overcome.

We also talked about using a similar approach here with Spring Boot devtools. We also discussed how to achieve the same results in Scala Play.

Dev meeting – 25th Jan – TCR, NativeScript, Tentai Show and XQuery for IDEA

Chris talked about “Test-Commit-Revert”, an idea from Kent Beck building on TDD. Kent Beck was using a process of “Test, Commit every time that all the tests pass”, so you are never left with a state where the tests used to work and you’re not quite sure why they stopped working. But the extreme extension to that is “Test-Commit-Revert” – commit when the tests pass, but revert all your changes if the tests fail! This forces you to have a very short commit cycle, making tiny incremental changes each time. It’s even more interesting if you are doing it in a team, and everyone is doing this simultaneously, as you each do tiny commits and then build on top of each other’s changes. Chris (and Kent Beck) aren’t suggesting that we do this all the time, but it’s an interesting idea that makes you think about how your development cycle works.
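
The control flow of Test-Commit-Revert is small enough to sketch in a few lines of TypeScript. This is only an illustration: the TcrSteps interface and the testCommitRevert function are hypothetical names, and the three actions are injected so that the sketch assumes nothing about the test runner or version control tool in use.

```typescript
type Outcome = "committed" | "reverted";

// The three actions are supplied by the caller, for example as wrappers
// around a test runner and a version control tool.
interface TcrSteps {
  runTests(): boolean; // true when every test passes
  commit(): void;
  revert(): void; // discard all uncommitted changes
}

function testCommitRevert(steps: TcrSteps): Outcome {
  if (steps.runTests()) {
    steps.commit();
    return "committed";
  }
  steps.revert();
  return "reverted";
}
```

Running this after every small change is what forces the very short cycle: a failing test costs you everything since the last commit.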

Alex talked about NativeScript – a way of writing code for mobile devices using VueJS. It isn’t just a web browser shell – it’s very fast. It doesn’t use HTML – instead, it uses “NativeXML” which has a lot more control over Android-style layout. There is a very good short project as the introduction, which is a great place to start and walks through all the basics. Alex showed us a very simple chunk of Vue code and NativeXML that produced a simple application. There is code in his GitHub account at https://github.com/xmakina/CrisisMH. He recommends it highly for mobile development (the only issues are with the general complexity of the Android ecosystem). We discussed how easy it would be to convert an existing VueJS app to NativeScript – not that easy, since it doesn’t use HTML.

Chris also talked about some code that he wrote over Christmas to solve a Japanese puzzle called “Tentai Show” – this is apparently a pun around astronomy and rotational symmetry. There is a grid of squares, containing stars that are in a corner or in the centre; and you need to divide up this grid into jigsaw pieces that have rotational symmetry. There is an example at http://www.nikoli.co.jp/en/puzzles/astronomical_show.html. Chris wrote a solver for this in Scala, on his Github at https://github.com/nespera/tentaishow.

Reece then talked about his XQuery plugin for IntelliJ IDEA. This is available from the JetBrains plugin repository at https://plugins.jetbrains.com/plugin/8612-xquery-intellij-plugin. It supports XQuery 1.0, 3.0, and 3.1, with additional support for MarkLogic and BaseX extensions. At present, he’s working on splitting out the XPath support from XQuery – so the features of his plugin can also be used in XSLT. He has already got Run Configurations working for a range of processors including Saxon and MarkLogic, so you can specify which query processor it uses. It’s already the best XQuery plugin available – but there is also a huge set of additional features that he is working on!

Nullable references in C#

Tim B went to see Jon Skeet at the recent Oxford .NET talk. Jon Skeet is the most prolific and highly voted Stack Overflow contributor ever. He was talking about the plans for C# 8.

Jon Skeet mostly talked about nullable reference types – what will happen, and his opinions about them. C# has value types and reference types, similarly to Java. C# currently has nullable value types – so you can have int? to represent a nullable integer. In C# 8, there will be “nullable reference types”. This seems a little odd, since reference types are already nullable – but the feature changes the default behaviour, so that a reference type is treated as non-nullable unless it is declared with a ? (for example string?), and the compiler will warn if null is assigned to a non-nullable reference type. This is only a language level change, not a CLR change.

The compiler will attempt to identify when a possibly-null reference is dereferenced – and it tries to do this cleverly, by recognizing checks like “xx != null” which mean that a null reference cannot be accessed in the code that follows.

There are other changes coming around pattern matching in switch statements – similar to the way that this works in Scala.

We discussed the way that the C# language is evolving compared to the way that Java is evolving. We agreed that the C# language maintainers were much more willing to make significant language changes, whereas Java has been much more focussed on backwards compatibility and a much slower evolution.

Following this, we discussed Kubernetes, and our experience with it.

Testing Test Driven Development

In our dev meeting, we discussed Test Driven Development. So, of course, we did it TDD-style: we started off by asking whether everyone knew what TDD is; and then testing that; and then asking more questions about TDD until we got a failing test – i.e. several people who didn’t know. Then we updated our knowledge of TDD until we didn’t have a failing test, and iterated…

We agreed that there are many benefits to using TDD, of which “ending up with a working test suite” is only the most obvious:

  • Allows incremental development, so you are able to run the code as you’re going along
  • Provides documentation for how the code works
  • Helps you to get a feel for how the code will work as you’re writing the test
  • Provides safety, confidence, and a greater ability to refactor
  • Writing a test beforehand means that you are forced to have a failing test
  • Forces a different mindset – because you’re thinking about your design and API as you write
  • Helps you concentrate on what’s relevant to the task at hand, rather than doing unnecessary work
  • Encapsulates features nicely – encourages better organization
  • Helps you stay focussed on the problem and keep the solution simple, rather than solving unimportant problems

But, we concluded that while many of us used TDD for some of the time, it wasn’t true that all of us were using TDD for all of the time. So, given all these great things about TDD, why aren’t we using TDD for everything?

  • We’re lazy
  • It takes longer to get to a result – longer term benefits but a short term cost
  • Not as good if you’re exploring
  • Some types of test are hard to write
  • Slow feedback loop for some types of test – e.g. some integration tests
  • Limited framework support
  • Working in an existing codebase that has been developed without TDD
  • Working with frameworks that have been written without testing in mind – particularly older frameworks
  • Working in very data-heavy projects, based on mutable source data
  • TDD is a bit less useful when the language is more focussed on the problem domain (e.g. C#) rather than more focussed on technical issues (like C++)

We discussed how “pure” we were in our application of TDD – whether we were always using a strict, Kent Beck approved “write the most minimal test, write a dumb implementation that just passes that test, then add more tests and refactor”. We concluded that we mostly weren’t – the range of TDD-like approaches that we use is:

  • Pure
  • Use a hybrid approach – write some minimal tests, write a real implementation, and extend the tests to cover edge cases
  • Write TDD-style when it will actively speed up implementing a feature due to better turnaround time
  • Write out the specs first, and then implement the code based on that
  • Write tests to help the code reviewer understand what’s been written: sometimes before writing the code, and sometimes after

We discussed a potential pitfall we had sometimes hit with TDD – that sometimes it is tempting to keep hammering at the code until the tests pass, rather than to step back from it to reconsider the overall strategy.

In conclusion – we use TDD a lot of the time, but we tend to use hybrid approaches rather than a strict pure approach.

Nothing to fear except hackfear itself

Nikolay talked about the #HACKFEAR hackathon that he had recently been to. It was organized by Karen Palmer, a film maker and parkour practitioner. She is interested in fear – parkour is at its essence about fear management. She discussed a future in which technology has gone bad – for example, if you are wanted by the police and you get into an automated car, it will take you straight to the police station. Instead, she wants to use technology to help guide and empower people.

She has an art piece called “Riot” – which is a webcam watching you, while watching a video, and attempting to identify the emotion that you’re portraying using a neural network. If you show “appropriate” responses for the situation, then the video progresses; if you show inappropriate responses, then the video ends and you have as many attempts as you need.

At the hack, most of the hacking was about concepts, rather than fully working products. Nikolay’s group looked at fear of public speaking; taking the technology from Riot to analyze your posture and speech (e.g. how much you say ‘um’) to help provide feedback on your speaking.

Another team used a VR system to analyze your emotions and show you “scary” things, as exposure therapy to them. Other teams tackled management of memory loss, fear of self-expression, fear management through journals, and improvement of communication for early school-leavers.

We also discussed other types of fear, such as writers’ block, and acrophobia; and the difference between climbing a ladder, versus jumping off a cliff with a paragliding harness attached. While the latter should be scarier, it’s not, because you know that you have a harness.

Karen Palmer has TED talks about this topic.

Lead Developer Conference 2018

I attended the Lead Developer conference in London a couple of weeks ago. I enjoyed it and came back with lots of ideas buzzing around in my head. It’s a single track conference, which is good because you don’t have to make decisions about what to see and what to miss, but also you get to see some things you might not have chosen just based on the title. Many of the speakers have given longer versions of the talks elsewhere, or have written articles on the subject, so if particular topics are of interest it is possible to go and dig in further. You can think of it like a taster menu at a fancy restaurant.

Photo by White October Events

I talked about some of the talks I had seen at our developer meeting on Friday. I couldn’t cover all of them (23 in total I think), so concentrated on a few that had particularly resonated. The full set of conference videos is available to view on YouTube, so go and check them out. Here are some details of the handful of talks I discussed with the team:

Alex Hill – Giving and receiving code reviews gracefully

Alex has written up a longer form in this blog post while the video of her talk is on YouTube.

This talk was about the psychology of code reviews and how to take that into account to get the best outcomes. People sometimes feel defensive about code reviews as it feels as if they are being criticized rather than the code under review.

She talks about dividing up code review comments into 4 quadrants along 2 axes: High vs Low Conflict and High vs Low Reward. The Low Reward, High Conflict things tend to be preferences like where to put brackets and so on. The best way to handle these things is to agree code format standards and automate them away. The Low Conflict things, such as obvious bugs (in the High Reward area) and debug statements (in the Low Reward area), don’t cause problems between team members because they are non-contentious. It’s the High Reward, High Conflict things that are tricky. She suggests considering Conflict Resolution Archetypes: Avoiding vs Yielding vs Competing vs Collaborating. We are aiming for collaboration and she has some suggestions on how to achieve that.

These include: Doing more pair programming and having more discussion before implementing a feature. Ensuring everyone reviews and is reviewed, so there is a level playing field. Using “we” rather than “you” or the passive voice to keep the whole tone of the review more neutral. Asking questions rather than making demands. Just being positive rather than negative or confrontational.

As the receiver of the review, say thank you, and also think about how someone else would respond.

Adrian Howard – Points don’t mean prizes

There is a longer version of this, in video form from the ACE conference while the short version from Lead Dev is on YouTube.

Adrian works in the intersection between development, UX and product helping companies build the right things. This talk was about various dysfunctions he sees in the way people think about Scrum, Agile and requirements.

The default scrum model that people use is kind of broken. Someone comes up with the vision that everyone is heading to. Someone comes up with the user journeys to get to that place, that gets split up into stories. Those stories are given to the developers and everyone lives happily ever after. But that’s a lie.

Problems arise because the different stories are different sizes. So it’s hard to put them into fixed sprint-sized boxes or to get flow in a Kanban approach. So break them up into smaller ones and we get smoother flow. Give those to the developers and we’re done. Again that’s a lie.

Stories focus on size and effort not on actual value. So we may have split up the story and actually delivered little value. So think about:

  • Bin – can we discard or postpone a story?
  • Thin – can we deliver less and still get value?
  • Split – can we break up a story and still get value from the pieces?

Once that is done, give the stories to the development team and we are done. Once again, it’s a lie.

The problem is that often the people who want to follow this approach don’t have the authority to make it happen.

Adrian recommends User Story Mapping as a way to get good stories and keep the big picture in mind. He particularly likes the book that describes it, because if you give someone a book, it has much more weight than just, “hey try this technique”. The output is a map rather than a flat backlog. People tend to do this at the start, but it’s best to keep refining. Some of the ideas of this approach are described in Jeff Patton’s blog post that predates the book.

Nickolas Means – Who destroyed Three Mile Island?

The final talk I discussed from Day One of the conference was about the nuclear reactor meltdown at Three Mile Island. I recommend watching the video of this as he is a good story teller and I am not going to retell it in detail here.

He first outlined the events that led to the partial meltdown occurring and then discussed the ideas of the “first story” and “second story” as described in Sidney Dekker’s book “Field Guide to Understanding Human Error”. The “first story” is written with hindsight and outcome bias and generally seeks to blame someone for the results. The “second story” seeks to look at what happened through the eyes of those who were there and what they knew at the time. The idea is to start with the assumption that everyone was doing the best they could with the information they had at the time, so human error is never the cause of the event. This leads into the idea of blame-free post-mortems as a way to discover and fix systemic problems rather than seeking someone to blame.

Uberto Barbini – Legacy Code – Big Rewrite or Progressive Rejuvenation?

The first talk I discussed from the second day of the conference was this one about legacy systems. The video of this talk is on YouTube.

A legacy system is old, but it works and usually makes money for the company, or it would have been retired.  One of the options for dealing with such a system is to just keep patching it as changes are required. The downside to this is that the system slowly degrades as more and more changes are added.

Another option is the big rewrite. This rarely works out. The thing you are replacing was successful, so not as simple to replace as you might think. The old system contains quite a bit of knowledge that can be lost in the transition. Finally, data migration is nearly always harder than expected.

The best approach seems to be the “Strangler” pattern as described by Martin Fowler whereby the new application wraps the old one and then slowly replaces it over time. This has the advantage of showing results quickly and not requiring a risky “big bang” switchover.
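
One way to picture the “Strangler” pattern is as a facade that sits in front of the old system and hands over one area of functionality at a time. The TypeScript below is a minimal sketch under that assumption; StranglerFacade, Handler, Route and the example paths are hypothetical names, not something from the talk or from Martin Fowler’s description.

```typescript
type Handler = (path: string) => string;

interface Route {
  prefix: string;
  handler: Handler;
}

class StranglerFacade {
  private readonly migratedRoutes: Route[] = [];

  constructor(private readonly legacySystem: Handler) {}

  // Called each time a piece of functionality has been rewritten.
  migrate(prefix: string, handler: Handler): void {
    this.migratedRoutes.push({ prefix, handler });
  }

  handle(path: string): string {
    for (const route of this.migratedRoutes) {
      if (path.startsWith(route.prefix)) {
        return route.handler(path);
      }
    }
    // Everything not yet migrated still goes to the old system.
    return this.legacySystem(path);
  }
}

const facade = new StranglerFacade(path => `legacy:${path}`);
facade.migrate("/reports", path => `new:${path}`);
```

Callers only ever talk to the facade, so each migrate call moves a slice of traffic to the new code without a “big bang” switchover.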

Uberto Barbini has a similar technique which he calls “Alchemical Rejuvenation” – Turning legacy code into gold.  It has the following steps:

    • Seal with external tests. First of all you need some high-level assurance that the system is working after you make changes. These tests may be discarded later, once there is better testing in place.
    • Split into modules. Start improving the internal architecture to separate into logical pieces.
    • Clean the module you need to work in, adding tests as you go.
    • Repeat as needed

He had an interesting take on code quality – It’s not clean code, TDD, or patterns etc. Those are just tools to get code quality. The real test is if your application has been running for 10 years and you can still add features and fix bugs quickly, then you have high code quality.

Kevin Goldsmith – Using Agile to Build Inclusive Teams

The final talk I discussed was about using agile techniques to improve the way teams are run. The video for this talk is also available on YouTube.

He talked about using post-its with one of his reports to work out what they each expected of each other. This is similar to the idea of the “Manager Read Me”.

On a similar theme, he talked about mentoring a lead – again, working out where different responsibilities lie. Is the manager keeping it? Does the manager approve it? Does the new lead inform the manager of their decisions? Or does the new lead take full responsibility?

He also talked about improving team meetings. When it comes to making a decision he has two approaches: Polling – everyone gives their opinion, but in the end the manager decides. Voting – everyone votes. In the end the Manager has to accept and defend the decision. He talked about having a collaborative team meeting agenda in a shared Google Doc. For larger groups he recommends the Lean Coffee approach.

Finally he talked about having more inclusive meetings. The lead needs to resist talking as other people will yield to them. He also suggested having an observer who points out interruptions, people not getting credit etc. This role should be rotated though to avoid people not contributing.

Release It – 2nd edition – part 2!

Chris talked again about Release It 2nd edition.

Last time, Chris talked about “Creating Stability” – things that can go wrong, and how to prevent that.

The next section “Living in Production” is about how a system works in production. Part of this is physical (networks, IPs, etc.). There can be clock problems, particularly with VMs. It covers “12 factor apps”, which we’ve discussed before in the context of microservices. Coming from the microservices ideas, this is all about making the app not depend on things on the box.

We discussed “Stucco apps” – where if you install subsequent versions 1, 2, 3 of an app on a box, then there will be bits of version 1 and 2 left over – so the app isn’t exactly any of those versions. Instead, you should rebuild from scratch each time (you could use Nix and NixOS for this…). We also discussed configuration – getting environment-appropriate configuration onto each box.

We had a digression about ambient sound from services – like putting microphones in the JET torus – so you can tell whether the system is running normally. Because humans are good at recognizing unusual noises or unusual changes in noise patterns, this can let you pick up on patterns of behaviour that aren’t otherwise obvious.

We talked about setting up logging to demonstrate that the high-level goals of the system are being met. For example, in some systems, it might be really important if page loads have become slow, or users cannot log in, or if the number of purchases per hour has significantly dropped; and these are the important business needs rather than just whether a box is up.

We briefly discussed the merits of the Unix command “uniq -c” for monitoring services; for example, to find the counts of unique IP addresses. It is typically run after “sort”, since uniq only collapses adjacent duplicate lines. This is very useful for spotting patterns in your logs.
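
For logs that are already being processed in code rather than in a shell, the same counting can be sketched in TypeScript. This is an illustrative equivalent of a pipeline such as “sort | uniq -c | sort -rn”; the countOccurrences function and the sample addresses are made up for the example.

```typescript
// Count how often each value occurs, most frequent first.
function countOccurrences(values: string[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const value of values) {
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}

// Hypothetical client addresses pulled from a log file.
const addresses = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1", "10.0.0.2"];
const ranked = countOccurrences(addresses);
// ranked[0] is ["10.0.0.1", 3]
```

An address that suddenly jumps to the top of this list is the kind of pattern that is easy to miss when reading raw log lines.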

When upgrading data in SQL databases, the upgrade path is typically straightforward – you migrate it all via a migration, or the apps don’t work. In NoSQL databases, there is no schema, there may be multiple clients using the data imposing their own restrictions, and so it’s not so straightforward. The author suggests a “trickle then batch” approach of first converting the high priority items, then after a while, converting all the other items.
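
A rough TypeScript sketch of “trickle then batch” follows. It assumes that the high priority items are the documents that are actually being read, so they are upgraded on access, and that a batch job later converts whatever is left. StoredDoc, DocStore, the tags field and the version numbers are all hypothetical and exist only for illustration.

```typescript
const CURRENT_VERSION = 2;

// Assumption for this sketch: version 1 stored tags as a comma-separated
// string, and version 2 stores them as an array.
interface StoredDoc {
  id: string;
  schemaVersion: number;
  tags: string | string[];
}

function upgrade(doc: StoredDoc): StoredDoc {
  if (doc.schemaVersion >= CURRENT_VERSION) {
    return doc;
  }
  const tags = typeof doc.tags === "string" ? doc.tags.split(",") : doc.tags;
  return { ...doc, schemaVersion: CURRENT_VERSION, tags };
}

class DocStore {
  private readonly docs = new Map<string, StoredDoc>();

  put(doc: StoredDoc): void {
    this.docs.set(doc.id, doc);
  }

  // Trickle: upgrade a document whenever it is read, and save the result.
  read(id: string): StoredDoc | undefined {
    const doc = this.docs.get(id);
    if (doc === undefined) {
      return undefined;
    }
    const upgraded = upgrade(doc);
    this.docs.set(id, upgraded);
    return upgraded;
  }

  // Batch: later, convert everything that is still in the old format.
  // Returns the number of documents converted.
  migrateRemaining(): number {
    let migrated = 0;
    this.docs.forEach((doc, id) => {
      if (doc.schemaVersion < CURRENT_VERSION) {
        this.docs.set(id, upgrade(doc));
        migrated += 1;
      }
    });
    return migrated;
  }
}
```

The trickle phase means the new code has to cope with both formats for a while; the batch phase is what eventually lets that compatibility code be removed.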

We talked about API changes, and contract tests created by consumers, and versioning of APIs.

The final part of the book is about systemic problems – a grab bag of issues that didn’t come up elsewhere. Load testing scripts can be overly polite and well behaved, and then sites break when hit with real users that aren’t well behaved – so the load testing scripts should be more impolite. We talked about chaos – we’ve discussed chaos monkeys before, but there are various refinements to this idea. For example, a default “opt-in” for chaos monkeys, with the ability to opt-out if your service cannot tolerate chaos. Also, a “zombie apocalypse” – you send home a bunch of people, and see whether any of them are indispensable or not.