{"id":16,"date":"2008-03-19T10:20:29","date_gmt":"2008-03-19T09:20:29","guid":{"rendered":"http:\/\/67bricks.com\/blog\/2008\/03\/19\/bloom-filter-implementation-in-f\/"},"modified":"2021-12-16T10:52:20","modified_gmt":"2021-12-16T10:52:20","slug":"bloom-filter-implementation-in-f","status":"publish","type":"post","link":"https:\/\/blog.67bricks.com\/?p=16","title":{"rendered":"Bloom Filter implementation in F#"},"content":{"rendered":"<p>Further to my previous post on <a href=\"http:\/\/67bricks.com\/blog\/2008\/03\/15\/bloom-filters-for-efficient-ontology-querying-and-text-mining\/\">bloom filters for efficient ontology lookup<\/a>, I&#8217;ve made a simple implementation in F#. This is based on <a href=\"http:\/\/locut.us\/blog\/2008\/01\/12\/a-decent-stand-alone-java-bloom-filter-implementation\/\">a Java implementation by Ian Clarke<\/a>.<\/p>\n<p>The neat thing about Ian&#8217;s implementation is its use of Random to extend the hashcode provided by the object being stored into a hash of arbitrary length, suitable for use by the bloom filter algorithm. This will reduce the quality of the hash, but for an arbitrary passed-in object, it&#8217;s hard to do better. (For a specific application, like storing ontology labels, it would be better to use a more specific algorithm such as a <a href=\"http:\/\/burtleburtle.net\/bob\/hash\/doobs.html\">Jenkin&#8217;s Hash<\/a>).<\/p>\n<pre>#light\nopen System\n\/\/ Based on Java Bloom Filter implementation by Ian Clarke <ian@uprizer.com>\n\/\/ http:\/\/locut.us\/blog\/2008\/01\/12\/a-decent-stand-alone-java-bloom-filter-implementation\/\ntype BloomFilter(bitArraySize : int, expectedElementCount : int) =\n    let bitSet  = new System.Collections.BitArray(bitArraySize, false)\n    let bitArraySize = bitArraySize\n    let expectedElementCount = expectedElementCount\n    let k = (int) (Math.Ceiling( ((double) bitArraySize \/ (double) expectedElementCount) * Math.Log(2.0) ))\n    let bitSequence o =\n        let r = new Random( hash o )\n        Seq.init_infinite (fun n -&gt; r.Next(bitArraySize))\n    member b.expectedFalsePositiveProbability =\n        Math.Pow((1.0 - Math.Exp( -((double) k) * (double) expectedElementCount \/ (double) bitArraySize)), (double) k)\n    member b.add o =\n        let sq = bitSequence o\n        for x in 0 to k do bitSet.Set( Seq.hd sq , true)\n    member b.addAll os = Seq.iter b.add os\n    member b.clear = bitSet.SetAll(false)\n    member b.contains o =\n        let sq = bitSequence o\n        let isSet n = bitSet.Get( Seq.hd sq )\n        Seq.for_all isSet [0..k]\n    member b.containsAll os = Seq.for_all b.contains os\n<\/ian@uprizer.com><\/pre>\n<p>Using F# makes the code slightly neater than Ian&#8217;s Java version &#8211; I&#8217;ve been able to factor out the hash code into a sequence supplied by the bitSequence function, and to collapse some of the for loops into operations over lists instead. But, the basic structure of the code is still very similar.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Further to my previous post on bloom filters for efficient ontology lookup, I&#8217;ve made a simple implementation in F#. This is based on a Java implementation by Ian Clarke. The neat thing about Ian&#8217;s implementation is its use of Random to extend the hashcode provided by the object being stored into a hash of arbitrary &hellip; <a href=\"https:\/\/blog.67bricks.com\/?p=16\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Bloom Filter implementation in F#&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-16","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts\/16","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=16"}],"version-history":[{"count":1,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts\/16\/revisions"}],"predecessor-version":[{"id":292,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts\/16\/revisions\/292"}],"wp:attachment":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=16"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=16"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=16"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}