{"id":252,"date":"2021-12-24T08:10:00","date_gmt":"2021-12-24T08:10:00","guid":{"rendered":"http:\/\/blog.67bricks.com\/?p=252"},"modified":"2024-12-20T14:30:02","modified_gmt":"2024-12-20T14:30:02","slug":"what-have-i-been-listening-to","status":"publish","type":"post","link":"https:\/\/blog.67bricks.com\/?p=252","title":{"rendered":"What have I been listening to?"},"content":{"rendered":"\n<p>A while ago, Tim suggested we could have a #now-listening channel in our company Slack, in which people could post details of what they were listening to. It occurred to me that it might be a fun challenge to try to figure out from what I&#8217;d posted on there who my favourite artist was, and which was my most-listened-to album. So I rolled up my sleeves and got to work. This is an account of what I did and my various thought processes as I went along&#8230;<\/p>\n\n\n\n<p>Challenge: figure out how to get my posts from our #now-listening channel and do some statistics to them.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p><strong>Session 1<\/strong>: After school run, but before work&#8230;<\/p>\n\n\n\n<p>Start &#8211; there&#8217;s an API. <a href=\"https:\/\/api.slack.com\">https:\/\/api.slack.com<\/a><\/p>\n\n\n\n<p>Read the documentation: <a href=\"https:\/\/api.slack.com\/methods\/search.messages\">https:\/\/api.slack.com\/methods\/search.messages<\/a> looks useful &#8211; how do I call it?<\/p>\n\n\n\n<p>I NEED A TOKEN! Aha &#8211; <a href=\"https:\/\/api.slack.com\/apps\">https:\/\/api.slack.com\/apps<\/a> &#8211; a &#8220;generate token&#8221; button&#8230;<\/p>\n\n\n\n<p>Access token: xoxe.xoxp-blah-blah-blah. SUCCESS!<\/p>\n\n\n\n<p>First obvious question: has someone done this already? 
Google knows everything: <a href=\"https:\/\/github.com\/slack-scala-client\/slack-scala-client\">https:\/\/github.com\/slack-scala-client\/slack-scala-client<\/a><\/p>\n\n\n\n<p>Create a new project: <code>sbt new scala\/scala-seed.g8<\/code> &#8211; add dependency on slack-scala-client, ready to rock! In such a hurry; I can&#8217;t even be bothered to set up a package, just hijack the Hello app that came in the skeletal project&#8230;<\/p>\n\n\n\n<p>From docs:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">val token = \"MY TOP SECRET TOKEN\"\nimplicit val system = ActorSystem(\"slack\")\ntry {\n  val client = BlockingSlackApiClient(token)\n  client.searchMessages(WHAT TO PUT HERE?)\n} finally {\n  Await.result(system.terminate(), Duration.Inf)\n}<\/code><\/pre>\n\n\n\n<p>&#8230; maybe something like&#8230;?:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">val ret = client.searchMessages(\"* in:#67bricks-now-listening from:@Daniel\", sort = Some(\"timestamp\"), sortDir = Some(\"asc\"), count = Some(5))<\/code><\/pre>\n\n\n\n<p>RUN IT<\/p>\n\n\n\n<p>Fails. Because HelloSpec fails (I mentioned I just hijacked the OOTB Hello app). 
Fix with the delete key.<\/p>\n\n\n\n<p>RUN IT<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[WARN] [11\/26\/2021 08:40:37.418] [slack-akka.actor.default-dispatcher-2] [akka.actor.ActorSystemImpl(slack)] Illegal header: Illegal 'expires' header: Illegal weekday in date 1997-07-26T05:00:00: is 'Mon' but should be 'Sat'\nException in thread \"main\" slack.api.ApiError: missing_scope\nat slack.api.SlackApiClient$.$anonfun$makeApiRequest$3(SlackApiClient.scala:92)<\/pre>\n\n\n\n<p>:-\u200b(<\/p>\n\n\n\n<p>Google: &#8220;missing_scope&#8221; and interpret results<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>The token used is not granted the specific scope permissions required to complete this request.<\/p><\/blockquote>\n\n\n\n<p>:-\u200b( :-\u200b(<\/p>\n\n\n\n<p>Maybe I have to create an app and add it to the workspace? I&#8217;ll try that.<\/p>\n\n\n\n<p>Created, figured out how to add the user token scope &#8220;search:read&#8221; &#8211; and I got a new token!<\/p>\n\n\n\n<p>Token= xoxp-blahblahblah<\/p>\n\n\n\n<p>Rerun: I got a response!<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"json\" class=\"language-json\">{\n  \"ok\":true,\n  \"query\":\"* in:#67bricks-now-listening from:@Daniel\",\n  \"messages\": {\n    \"total\":0,\n    \"pagination\": {\n      \"total_count\":0,\n      \"page\":1,\n      \"per_page\":5,\n      \"page_count\":0,\n      \"first\":1,\n      \"last\":0\n    },\n    \"paging\": {\n      \"count\":5,\n      \"total\":0,\n      \"page\":1,\n      \"pages\":0\n    },\n    \"matches\": []\n  }\n}<\/code><\/pre>\n\n\n\n<p>:-\u200b(<\/p>\n\n\n\n<p>Let&#8217;s just search in the channel without specifying a name&#8230;?<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">val ret = client.searchMessages(\"in:#67bricks-now-listening\", sort = Some(\"timestamp\"), sortDir = Some(\"asc\"), count = 
Some(5))<\/code><\/pre>\n\n\n\n<p>Gives:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"json\" class=\"language-json\">{\n  \"ok\": true,\n  \"query\": \"in:#67bricks-now-listening\",\n  \"messages\": {\n    \"total\": 7113,\n    \"pagination\": {\n      \"total_count\": 7113,\n      \"page\": 1,\n      \"per_page\": 5,\n      \"page_count\": 1423,\n      \"first\": 1,\n      \"last\": 5\n    },\n    \"paging\": {\n      \"count\": 5,\n      \"total\": 7113,\n      \"page\": 1,\n      \"pages\": 1423\n    },\n    \"matches\": [\n      {\n        \"username\": \"daniel.rendall\",\n        \"other\": \"field_here\"\n      }\n    ]\n  }\n}<\/code><\/pre>\n\n\n\n<p>Aha! My username is daniel.rendall, let&#8217;s try that:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">val ret = client.searchMessages(\"in:#67bricks-now-listening from:@daniel.rendall\", sort = Some(\"timestamp\"), sortDir = Some(\"asc\"), count = Some(5))<\/code><\/pre>\n\n\n\n<p>Gives:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"json\" class=\"language-json\">{\n  \"ok\": true,\n  \"query\": \"in:#67bricks-now-listening from:@daniel.rendall\",\n  \"messages\": {\n    \"total\": 3213,\n    \"pagination\": { ... etc }\n  }\n}<\/code><\/pre>\n\n\n\n<p>Success! Also &#8211; 3213 messages &#8211; sounds plausible. This is looking good&#8230; but sort direction seems wrong&#8230;? Try switching to &#8220;desc&#8221; =&gt; same result.<\/p>\n\n\n\n<p>(Time spent so far: about half an hour &#8211; better stop or will miss the morning call!)<\/p>\n\n\n\n<p><strong>Session 2<\/strong>: Re-run &#8211; still works (hooray!)<\/p>\n\n\n\n<p>Copy and paste output and save as response.json, fix up with jq so I can examine it:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"bash\" class=\"language-bash\">cat response.json | jq '.' 
&gt; response_tidied.json<\/code><\/pre>\n\n\n\n<p>And now:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"json\" class=\"language-json\">\"pagination\": {\n  \"total_count\": 3234,\n  \"page\": 1,\n  \"per_page\": 5,\n  ... etc<\/code><\/pre>\n\n\n\n<p>Number has gone up &#8211; I&#8217;m still listening to things!<\/p>\n\n\n\n<p>So, I <em>could<\/em> parse the responses to work out what the next page should be, or I could just loop &#8211; with pages of size 100 (if the API will return them) there should be 33. So we will loop and save these as 1.json, 2.json etc. First rule of scraping &#8211; aim to do it just once and save the result locally.<\/p>\n\n\n\n<p><strong>Horrible quick and dirty code alert<\/strong>!<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">val outDir = new File(\"\/home\/daniel\/Scratch\/slack\/output\")\ndef main(args: Array[String]): Unit = {\n  outDir.mkdirs()\n  implicit val system = ActorSystem(\"slack\")\n  try {\n    val client = BlockingSlackApiClient(token)\n    (1 to 33).foreach { pageNum =&gt;\n      try {\n        val ret = client.searchMessages(\"in:#67bricks-now-listening from:@daniel.rendall\",\n          sort = Some(\"timestamp\"),\n          sortDir = Some(\"desc\"),\n          count = Some(100),\n          page = Some(pageNum))\n        Files.write(new File(outDir, \"\" + pageNum + \".json\").toPath, ret.toString().getBytes(StandardCharsets.UTF_8), StandardOpenOption.CREATE)\n        println(s\"Got page $pageNum\")\n      } catch {\n        case NonFatal(e) =&gt;\n          println(s\"Couldn't get page $pageNum - ${e.getMessage}\")\n      }\n      Thread.sleep(1000)\n    }\n  } finally {\n    Await.result(system.terminate(), Duration.Inf)\n  }\n}<\/code><\/pre>\n\n\n\n<p>&#8230; prints a reassuring list &#8220;Got page 1&#8221; =&gt; &#8220;Got page 33&#8221; and no (reported) errors!<\/p>\n\n\n\n<p>Second rule of scraping &#8211; having done it and got the data, zip it up and put it somewhere just in 
case you destroy it&#8230;<\/p>\n\n\n\n<p>Tidy it all (non essential, but makes it easier to look at):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"bash\" class=\"language-bash\">mkdir tidied\nls output | while read JSON ; do cat output\/$JSON | jq '.' &gt; tidied\/$JSON ; done<\/code><\/pre>\n\n\n\n<p>On scanning the data &#8211; it looks plausible, I can&#8217;t see an obvious &#8220;date&#8221; field but there&#8217;s a cryptic &#8220;ts&#8221; field (sample value: &#8220;1638290823.124400&#8221;) which is maybe a timestamp? A problem for another day&#8230;<\/p>\n\n\n\n<p>(Time spent this session: about 20 minutes)<\/p>\n\n\n\n<p><strong>Session 3<\/strong>: I can haz stats?<\/p>\n\n\n\n<p>Need to load it in. A new main method in a new object&#8230;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">val outDir = new File(\"\/home\/daniel\/Scratch\/slack\/output\")\ndef main(args: Array[String]): Unit = {\n  val jsObjects = outDir.listFiles().map { f =&gt;\n    Json.parse(new FileInputStream(f))\n  }\n  println(jsObjects.head)\n}<\/code><\/pre>\n\n\n\n<p>Prints something sensible. Now need to get it in a useful form: define simplest class that could possibly work.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">case class Message(iid: UUID, ts: String, text: String, permalink: String)\nobject Message {\n  implicit val messageReads: Reads[Message] = (\n  (__ \\ \"iid\").read[UUID] and\n  (__ \\ \"ts\").read[String] and\n  (__ \\ \"text\").read[String] and\n  (__ \\ \"permalink\").read[String]\n  ) (Message.apply _)\n}<\/code><\/pre>\n\n\n\n<p>Not sure if I need the ID, but I like IDs. 
Looks like a UUID.<\/p>\n\n\n\n<p>&#8230; oh, also some classes to wrap the whole result with the minimum of faff (and Reads, omitted for brevity):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">case class SearchResult(messages: Messages)\ncase class Messages(total: Int, matches: Seq[Message])<\/code><\/pre>\n\n\n\n<p>Go for broke:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">val jsObjects: Array[JsResult[Seq[Message]]] = outDir.listFiles().map { f =&gt;\n  Json.parse(new FileInputStream(f)).validate[SearchResult].map(_.messages.matches)\n}<\/code><\/pre>\n\n\n\n<p>Unpleasant type signature alert &#8211; <code>Array[JsResult[Seq[Message]]]<\/code>. Let&#8217;s assume nothing will go wrong and just use &#8220;.get&#8221; and &#8220;.flatMap&#8221;:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">val messages: Seq[Message] = outDir.listFiles().flatMap { f =&gt;\n  Json.parse(new FileInputStream(f)).validate[SearchResult].map(_.messages.matches).get\n}.toList<\/code><\/pre>\n\n\n\n<p>That gives me 3234 Message objects, which is reassuring. They include top-level messages, and responses to threads. 
As far as I can see, the thread responses include a ?thread_ts parameter in their permalink, therefore filter them out &#8211; leaves 1792 remaining.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">val filtered = messages.filterNot(_.permalink.contains(\"?thread_ts\"))\nfiltered.take(10).map(_.text).foreach(println)<\/code><\/pre>\n\n\n\n<p>&#8230;and voila:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Yellow Magic Orchestra &#8211; <a href=\"https:\/\/open.spotify.com\/album\/72noAkTZmKDGR5F2wSQNf0\">Yellow Magic Orchestra<\/a><\/li><li>Goldie Lookin Chain &#8211; <a href=\"https:\/\/open.spotify.com\/album\/4CvsCpaTJ3y4KQLfE4q5Oq\">Greatest Hits<\/a><\/li><li>M.I.A &#8211; <a href=\"https:\/\/open.spotify.com\/album\/2xoj2gYed3IYmGWn3owSfu\">Kala<\/a><\/li><li>Pink Floyd &#8211; <a href=\"https:\/\/open.spotify.com\/album\/7iLuEbxvxepyHp4yfVfiut\">Pulse<\/a><\/li><li>Pink Floyd &#8211; <a href=\"https:\/\/open.spotify.com\/album\/5c1ZTzT4oSkiiFS4wmEuOe\">Atom Heart Mother<\/a><\/li><li>Amon D\u00fc\u00fcl II &#8211; <a href=\"https:\/\/open.spotify.com\/album\/4ip7L3AtG2pLgZLhAON3va\">Yeti<\/a><\/li><li>The Charlatans &#8211; <a href=\"https:\/\/open.spotify.com\/album\/5cy4MGCvWdTaGp23Q7d79B\">Between 10th and 11th<\/a><\/li><li>Quite sweet that in Haydn&#8217;s era a surprise could be something as simple as having the orchestra play really quietly for a bit and then suddenly play really loudly.<\/li><li>Symphonies from Haydn, the Father of the Symphony<\/li><li>The Hold Steady &#8211; <a href=\"https:\/\/open.spotify.com\/album\/16XUMEdixzqRXVVPZsB3ak\">Thrashing Thru The Passion<\/a><\/li><\/ul>\n\n\n\n<p>The things I&#8217;m looking for will all have the format &#8220;Artist &#8211; Album&#8221;. Regex time!<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">val ArtistAlbumRegex = \"(.*?) - (.*)\".r(\"artist\", \"album\")<\/code><\/pre>\n\n\n\n<p>Wait, what&#8230;? 
<code>@deprecated(\"use inline group names like (?&lt;year&gt;X) instead\", \"2.13.7\")<\/code><\/p>\n\n\n\n<p>Didn&#8217;t know that had changed. Ho hum&#8230;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">val ArtistAlbumRegex: Regex = \"(?&lt;artist&gt;.*?) - (?&lt;album&gt;.*)\".r<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">val artistsAndAlbums = messages.filterNot(_.permalink.contains(\"?thread_ts\")).map(_.text).collect {\n  case ArtistAlbumRegex(artist, album) =&gt; ArtistAndAlbum(artist, album)\n}\nartistsAndAlbums.take(10).foreach(println)<\/code><\/pre>\n\n\n\n<p>Even more promising:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Yellow Magic Orchestra &#8211; <a href=\"https:\/\/open.spotify.com\/album\/72noAkTZmKDGR5F2wSQNf0\">Yellow Magic Orchestra<\/a><\/li><li>Goldie Lookin Chain &#8211; <a href=\"https:\/\/open.spotify.com\/album\/4CvsCpaTJ3y4KQLfE4q5Oq\">Greatest Hits<\/a><\/li><li>M.I.A &#8211; <a href=\"https:\/\/open.spotify.com\/album\/2xoj2gYed3IYmGWn3owSfu\">Kala<\/a><\/li><li>Pink Floyd &#8211; <a href=\"https:\/\/open.spotify.com\/album\/7iLuEbxvxepyHp4yfVfiut\">Pulse<\/a><\/li><li>Pink Floyd &#8211; <a href=\"https:\/\/open.spotify.com\/album\/5c1ZTzT4oSkiiFS4wmEuOe\">Atom Heart Mother<\/a><\/li><li>Amon D\u00fc\u00fcl II &#8211; <a href=\"https:\/\/open.spotify.com\/album\/4ip7L3AtG2pLgZLhAON3va\">Yeti<\/a><\/li><li>The Charlatans &#8211; <a href=\"https:\/\/open.spotify.com\/album\/5cy4MGCvWdTaGp23Q7d79B\">Between 10th and 11th<\/a><\/li><li>The Hold Steady &#8211; <a href=\"https:\/\/open.spotify.com\/album\/16XUMEdixzqRXVVPZsB3ak\">Thrashing Thru The Passion<\/a><\/li><li>The Hold Steady &#8211; <a href=\"https:\/\/open.spotify.com\/album\/1lLya6vgwjJahE3TMq7IfR\">Boys and Girls 
in America<\/a><\/li><li>New Fast Automatic Daffodils &#8211; <a href=\"https:\/\/open.spotify.com\/album\/0MwGMuxZvOVTHFEMUoEexu\">Pigeonhole<\/a><\/li><\/ul>\n\n\n\n<p>Getting there! Now, there are bound to be loads of duplicates. So I guess the most obvious thing to do is count them. Let&#8217;s see if I can find the albums I&#8217;ve listened to the most, and their counts. I&#8217;m going to define a canonical key for grouping an ArtistAndAlbum just in case I&#8217;ve not been completely consistent in capitalisation.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">case class ArtistAndAlbum(artist: String, album: String) {\n  val groupingKey: (String, String) = (artist.toLowerCase, album.toLowerCase)\n}<\/code><\/pre>\n\n\n\n<p>Then we should be able to count by:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">val mostCommonAlbums = artistsAndAlbums.groupBy(_.groupingKey)\n  .view.map { case (_, seq) =&gt; seq.head -&gt; seq.length }.toList.sortBy(_._2)\nmostCommonAlbums.take(10).foreach(println)<\/code><\/pre>\n\n\n\n<p>(The Bob Lazar Story &#8211; <a href=\"https:\/\/open.spotify.com\/album\/0u7Ud0VXD7hmliUsEbETwg\">Vanquisher<\/a>,1)<br>(The Pretenders &#8211; <a href=\"https:\/\/open.spotify.com\/album\/28Eu96aUziJU9iemBomWRs\">Pretenders<\/a> (152),1)<br>(Saxon &#8211; <a href=\"https:\/\/open.spotify.com\/album\/0TfijmN5YNZkPghOaGhJ4A\">Lionheart<\/a>,1)<br>(Leprous &#8211; <a href=\"https:\/\/open.spotify.com\/album\/2qdDgPLpiK0iY5ZqfJya8n\">Tall Poppy Syndrome<\/a>,1)<br>(Tom Petty and the Heartbreakers &#8211; <a href=\"https:\/\/open.spotify.com\/album\/708Whrc4abJEtqBINv9S2b\">Damn the Torpedoes<\/a> (231),1)<br>(2Pac &#8211; <a href=\"https:\/\/open.spotify.com\/album\/78iX7tMceN0FsnmabAtlOC\">All Eyez on Me<\/a> (436),1)<br>(Bon Iver &#8211; <a href=\"https:\/\/open.spotify.com\/album\/1r5JEclOv0s5S8GhFet0Wx\">For Emma, Forever Ago<\/a> (461),1)<br>(James 
&#8211; <a href=\"https:\/\/open.spotify.com\/album\/45LL6dxV381i17JsRbfSBt\">Stutter<\/a>,1)<br>(Elton John &#8211; <a href=\"https:\/\/open.spotify.com\/album\/2ei2X6ghPnw7YRwQtAH075\">Honky Ch\u00e2teau<\/a> (251),1)<br>(Ice Cube &#8211; <a href=\"https:\/\/open.spotify.com\/album\/3AI5kAUjgNtZBwFRi6opDc\">AmeriKKKa&#8217;s Most Wanted<\/a>,1)<\/p>\n\n\n\n<p>Oops &#8211; wrong way &#8211; also the numbers in brackets need to be removed. Not sure there&#8217;s a nicer way to invert the ordering than explicitly passing the Ordering that I want to use&#8230;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">val mostCommonAlbums = artistsAndAlbums.groupBy(_.groupingKey)\n  .view.map { case (_, seq) =&gt; seq.head -&gt; seq.length }.toList.sortBy(_._2)(Ordering[Int].reverse)\nmostCommonAlbums.take(10).foreach(println)<\/code><\/pre>\n\n\n\n<p>(Benny Andersson &#8211; <a href=\"https:\/\/open.spotify.com\/album\/3YGh4CN0JocLK0SwvJgMWc\">Piano<\/a>,8)<br>(Meilyr Jones &#8211; <a href=\"https:\/\/open.spotify.com\/album\/48IAjwtoTqbR7hIgFTBc9J\">2013<\/a>),7)<br>(Brian Eno &#8211; <a href=\"https:\/\/open.spotify.com\/album\/74jn28Kr29iyh8eZXSvnwi\">Here Come The Warm Jets<\/a>),4)<br>(Richard &amp; Linda Thompson &#8211; <a href=\"https:\/\/open.spotify.com\/album\/3vCMmrJEx8CBtW4Hh0ehdl\">I Want To See The Bright Lights Tonight<\/a>),4)<br>(Admirals Hard &#8211; <a href=\"https:\/\/open.spotify.com\/album\/0Yzvw2AgzzYUsHsnryrCdM\">Upon a Painted Ocean<\/a>,4)<br>(Neuronspoiler &#8211; <a href=\"https:\/\/open.spotify.com\/album\/3bFD7InbTHp14nZXhOvMiH\">Emergence<\/a>,4)<br>(Global Communication &#8211; Pentamerous Metamorphosis),4)<br>(Steely Dan &#8211; <a href=\"https:\/\/open.spotify.com\/album\/3VwMlhrc3Z0YON3UNV0VSC\">Countdown To Ecstasy<\/a>,4)<br>(Pole &#8211; <a href=\"https:\/\/open.spotify.com\/album\/1dzdJ9A6zQrnWRw9q7HeME\">2<\/a>,4)<br>(Faith No More &#8211; <a 
href=\"https:\/\/open.spotify.com\/album\/59GwovfBk0Kp2HJw1G7E5Q\">Angel Dust<\/a>,4)<\/p>\n\n\n\n<p>That looks plausible, actually. I like Piano. I&#8217;m guessing there are loads of other &#8220;4&#8221; albums&#8230;<\/p>\n\n\n\n<p>But who is my most-listened-to artist? I have a shrewd idea I know who it will turn out to be &#8211; my prediction is that it will be a four-word band name with the initials HMHB. Use the fact that I defined my grouping key to start with the artist:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">val mostCommonArtists = artistsAndAlbums.groupBy(_.groupingKey._1)\n  .view.map { case (_, seq) =&gt; seq.head.artist -&gt; seq.length }.toList.sortBy(_._2)(Ordering[Int].reverse)\nmostCommonArtists.take(10).foreach(println)<\/code><\/pre>\n\n\n\n<p>(<a href=\"https:\/\/open.spotify.com\/artist\/6hBQq083tyW3yrF1gdVt4Q\">Half Man Half Biscuit<\/a>,28)<br>(<a href=\"https:\/\/open.spotify.com\/artist\/2LIdnZDzySb04oH40be1fR\">Fairport Convention<\/a>,19)<br>(<a href=\"https:\/\/open.spotify.com\/artist\/4KWTAlx2RvbpseOGMEmROg\">R.E.M.<\/a>,18)<br>(<a href=\"https:\/\/open.spotify.com\/artist\/4OrrjMGltjy6ojX6034f8u\">Steeleye Span<\/a>,15)<br>(<a href=\"https:\/\/open.spotify.com\/artist\/5sMku8iI6FH3ypZTErz4kv\">Julian Cope<\/a>,15)<br>(Various,13)<br>(<a href=\"https:\/\/open.spotify.com\/artist\/0qLNsNKm8bQcMoRFkR8Hmh\">James<\/a>,12)<br>(<a href=\"https:\/\/open.spotify.com\/artist\/0lopEzYZq2mwBPDlpP4Bcw\">Cardiacs<\/a>,11)<br>(<a href=\"https:\/\/open.spotify.com\/artist\/6GbCJZrI318Ybm8mY36Of5\">Faith No More<\/a>,9)<br>(<a href=\"https:\/\/open.spotify.com\/artist\/7MSUfLeTdDEoZiJPDSBXgi\">Brian Eno<\/a>,9)<\/p>\n\n\n\n<p>Bingo! The mighty Half Man Half Biscuit in there at #1. 
One flaw is immediately apparent &#8211; this naive approach doesn&#8217;t distinguish between &#8220;listening to lots of albums by an artist as part of business-as-usual&#8221; and &#8220;listening to an artist&#8217;s entire back catalogue in one go&#8221; (which accounts for the high showings of Fairport Convention, R.E.M. and Steeleye Span). Worry about that some other time.<\/p>\n\n\n\n<p>How many albums have I listened to?<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">val distinctAlbums = artistsAndAlbums.distinctBy(_.groupingKey)\nprintln(\"Total albums = \" + artistsAndAlbums.length)\nprintln(\"Distinct albums = \" + distinctAlbums.length)<\/code><\/pre>\n\n\n\n<p>Total albums = 1371<br>Distinct albums = 1255<\/p>\n\n\n\n<p>&#8230; but that will be wrong because I&#8217;ve listened to some albums in the context of e.g. working through Rolling Stone&#8217;s or NME&#8217;s list of top 500 albums, and in those cases I appended the album&#8217;s position in the list, e.g. &#8220;Battles &#8211; Mirrored (NME 436)&#8221;. So chop that off the end of the album name:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java line-numbers\">val artistsAndAlbums = messages.filterNot(_.permalink.contains(\"?thread_ts\")).map(_.text).collect {\n  case ArtistAlbumRegex(artist, album) =&gt;\n    ArtistAndAlbum(artist, album.replaceAll(\"\\\\([^)]+\\\\)$\", \"\").trim)\n}<\/code><\/pre>\n\n\n\n<p>Distinct albums = 1191<\/p>\n\n\n\n<p>This final session took about 50 minutes, so if my maths is correct, the total time spent on this was a little under 2 hours. TBH I&#8217;m slightly dubious about the results; after listing all of the albums I&#8217;ve listened to in alphabetical order I&#8217;m sure there are some missing (e.g. I tackled the entire Prince back catalogue, but there were only a handful of Prince albums in there, ditto for David Bowie). 
I suspect a bit more work and exploration of the Slack API might reveal what I&#8217;m missing. Or maybe my method for distinguishing main messages from responses is wrong (just had a thought; maybe a main message that begins a thread also gets the ?thread_ts parameter). But it&#8217;s close enough for now, and appears to confirm my suspicion that Half Man Half Biscuit are my most-listened-to artist.<\/p>\n\n\n\n<p>And now, what with it being the season of goodwill and all that, it&#8217;s time for my special <a href=\"https:\/\/open.spotify.com\/playlist\/6MBCyHpLxEamCqKD8mDdQ2\">Christmas Playlist<\/a>&#8230;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A while ago, Tim suggested we could have a #now-listening channel in our company Slack, in which people could post details of what they were listening to. It occurred to me that it might be a fun challenge to try to figure out from what I&#8217;d posted on there who my favourite artist was, and &hellip; <a href=\"https:\/\/blog.67bricks.com\/?p=252\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;What have I been listening 
to?&#8221;<\/span><\/a><\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21],"tags":[26,27,25],"class_list":["post-252","post","type-post","status-publish","format-standard","hentry","category-scala","tag-hacking","tag-half-man-half-biscuit","tag-music"],"_links":{"self":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts\/252","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=252"}],"version-history":[{"count":29,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts\/252\/revisions"}],"predecessor-version":[{"id":865,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts\/252\/revisions\/865"}],"wp:attachment":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=252"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=252"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=252"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}