{"id":14,"date":"2008-03-15T12:36:58","date_gmt":"2008-03-15T11:36:58","guid":{"rendered":"http:\/\/67bricks.com\/blog\/2008\/03\/15\/bloom-filters-for-efficient-ontology-querying-and-text-mining\/"},"modified":"2021-12-16T10:50:54","modified_gmt":"2021-12-16T10:50:54","slug":"bloom-filters-for-efficient-ontology-querying-and-text-mining","status":"publish","type":"post","link":"https:\/\/blog.67bricks.com\/?p=14","title":{"rendered":"Bloom Filters for efficient ontology querying and text mining"},"content":{"rendered":"<p>One of the problems with large ontologies such as <a href=\"http:\/\/www.connectingforhealth.nhs.uk\/systemsandservices\/data\/snomed\">SNOMED Clinical Terms<\/a> is that they&#8217;re, well, large. So, it&#8217;s not typically possible to hold all of the ontology in memory at once, and queries against it require a database lookup. It&#8217;s possible to eliminate a number of database accesses, and thus speed up the query process, by using a Bloom filter.<\/p>\n<p>A <a href=\"http:\/\/en.wikipedia.org\/wiki\/Bloom_filter\">Bloom filter<\/a> is a memory-efficient probabilistic data structure that lets you test whether a particular item is a member of a set. It may return false positives, but not false negatives. So, by adding all of the terms in your ontology to a Bloom filter, you can do a fast, in-memory check to see whether an entered term definitely doesn&#8217;t exist in your ontology. If the Bloom filter reports that the term does exist, then you can confirm with a slower file or database query for that term.<\/p>\n<p>In an application where you expect to encounter many terms that aren&#8217;t in the ontology, such as automated metadata extraction from documents, and automated document classification, then this can potentially lead to large performance improvements.<\/p>\n<p>I think there are also interesting possibilities in using Bloom filters in environments where storing a whole ontology isn&#8217;t feasible. 
For example, a JavaScript implementation of a Bloom filter, initialized with a few hundred kilobytes of data, could test with fairly high accuracy whether a particular term exists in an ontology of half a million terms.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the problems with large ontologies such as SNOMED Clinical Terms is that they&#8217;re, well, large. So, it&#8217;s not typically possible to hold all of the ontology in memory at once, and queries against it require a database lookup. It&#8217;s possible to eliminate a number of database accesses, and thus speed up the query &hellip; <a href=\"https:\/\/blog.67bricks.com\/?p=14\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Bloom Filters for efficient ontology querying and text mining&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-14","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts\/14","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=14"}],"version-history":[{"count":1,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts\/14\/revisions"}],"predecessor-version":[{"id":290,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts\/14\/revisions\/290"}],"wp:attachment":[{"href":"https:\/\/blog.67bric
ks.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=14"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=14"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=14"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}