
Explore the power of Bloom Filters using Node.js and Redis

Sep 01, 2023 pm 10:53 PM


In the right use case, bloom filters look like magic. That's a bold statement, but in this tutorial we'll explore this strange data structure, how to best use it, and some practical examples using Redis and Node.js.

The Bloom filter is a probabilistic, one-way data structure. The word "filter" can be confusing in this context; it suggests an active thing, a verb, but it might be easier to think of it as storage, a noun. With a simple Bloom filter you can do two things:

  1. Add an item.
  2. Check if an item has not been added before.

These are important limitations to understand: you cannot remove items, nor can you list the items in a Bloom filter. Additionally, you cannot determine with certainty whether an item has been added to the filter in the past. This is where the probabilistic nature of Bloom filters comes into play: false positives are possible, but false negatives are not. If the filter is set up correctly, the chance of a false positive is very small.
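To make the two operations concrete, here is a minimal in-memory sketch of a simple Bloom filter. It is an illustration only, not the code of any particular library: the class name SimpleBloomFilter, the choice of SHA-256, and the double-hashing scheme used to derive bit positions are all assumptions.

```javascript
const crypto = require('crypto');

// Illustrative only: a fixed-size bit array plus several derived hash positions
class SimpleBloomFilter {
  constructor(sizeInBits, numHashes) {
    this.size = sizeInBits;
    this.numHashes = numHashes;
    this.bits = new Uint8Array(Math.ceil(sizeInBits / 8));
  }

  // Derive numHashes bit positions from one digest (assumed double-hashing scheme)
  positions(item) {
    const digest = crypto.createHash('sha256').update(String(item)).digest();
    const h1 = digest.readUInt32BE(0);
    const h2 = digest.readUInt32BE(4);
    const result = [];
    for (let i = 0; i < this.numHashes; i++) {
      result.push((h1 + i * h2) % this.size);
    }
    return result;
  }

  // Operation 1: add an item by setting all of its bits
  add(item) {
    for (const pos of this.positions(item)) {
      this.bits[pos >> 3] |= 1 << (pos & 7);
    }
  }

  // Operation 2: false means "definitely never added"; true means "possibly added"
  contains(item) {
    return this.positions(item).every(
      (pos) => (this.bits[pos >> 3] & (1 << (pos & 7))) !== 0
    );
  }
}

const filter = new SimpleBloomFilter(1024, 3);
filter.add('kyle');
console.log(filter.contains('kyle')); // true: an added item is always reported as present
```

Note that there is no remove or list method: once bits are set, nothing records which item set them.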

Variants of Bloom filters exist that add additional functionality such as removal or scaling, but they also add complexity and limitations. Before moving on to variations, it is important to first understand a simple bloom filter. This article only introduces simple Bloom filters.

With these limits, you get many benefits: fixed size, hash-based obfuscation of the stored items, and fast lookups.

When you set up a Bloom filter, you need to specify a size for it. This size is fixed, so whether the filter holds one item or a billion, it will never grow beyond the specified size. As you add more items to the filter, the likelihood of false positives increases. With a smaller filter, the false positive rate rises faster than with a larger one.
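The relationship between size, item count, and false positives can be estimated with the standard approximation (1 - e^(-kn/m))^k, where m is the number of bits, k the number of hash functions, and n the number of items added. The sketch below is a rough illustration; the function name falsePositiveRate and the sample values (1024 bits, 20 hashes) are assumptions chosen to show how quickly a small filter degrades.

```javascript
// Approximate false positive rate of a Bloom filter with m bits and
// k hash functions after n items have been added
function falsePositiveRate(m, k, n) {
  return Math.pow(1 - Math.exp((-k * n) / m), k);
}

// A deliberately tiny filter: watch the rate climb as items are added
for (const n of [10, 25, 50, 100, 200]) {
  console.log(n + ' items -> ' + falsePositiveRate(1024, 20, n).toExponential(2));
}
```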

Bloom filters are built on the concept of one-way hashing. Much like correctly storing passwords, Bloom filters use a hashing algorithm to determine the unique identifier of the item passed into it. A hash is essentially irreversible and is represented by a seemingly random string of characters. Therefore, if someone gains access to a bloom filter, it will not directly reveal anything.

Finally, Bloom filters are fast. A lookup involves far fewer comparisons than other methods, and the filter can easily be held in memory, avoiding performance-impacting database hits.

Now that you understand the limitations and advantages of Bloom filters, let's look at some situations where they can be used.

Setup

We will illustrate Bloom filters using Redis and Node.js. Redis is the storage medium for Bloom filters; it's fast, in-memory, and has specific commands (GETBIT, SETBIT) that make implementation more efficient. I assume you have Node.js, npm, and Redis installed on your system. Your Redis server should be running on the default port on localhost for our example to work properly.

In this tutorial, we will not implement a filter from scratch. Instead, we'll focus on the practical use of a pre-built module on npm: bloom-redis. bloom-redis has a very concise set of methods: add, contains, and clear.

As mentioned before, Bloom filters require a hashing algorithm to generate an item's unique identifier. bloom-redis uses the well-known MD5 algorithm, which works fine, although it is perhaps not the ideal fit for a Bloom filter (a bit slow, a bit overkill).

Unique username

Usernames, especially those that identify the user in the URL, need to be unique. If you build an application that allows users to change their username, you will likely want to make sure a new username has never been used before, to avoid confusion and impersonation.

Without Bloom filters, you would need to reference a table containing every username ever used, which can be prohibitively expensive at scale. Bloom filters allow you to add an item every time a user adopts a new name. When a user checks to see if a username is taken, all you need to do is check the Bloom filter. It can tell you with absolute certainty that a requested username has never been added. The filter may incorrectly report that a username has been taken when in fact it has not, but this just errs on the side of caution and causes no real harm (other than that a user may not be able to claim "k3w1d00d47").

To illustrate this, let's build a quick REST server using Express. First, create the package.json file and then run the following terminal commands.

npm install bloom-redis --save

npm install express --save

npm install redis --save

The default size option for bloom-redis is set to 2 MB. This errs on the side of caution, but it is quite large. Setting the size of the Bloom filter is critical: too large and you waste memory, too small and the false positive rate will be too high. The math involved in determining the size is complex and beyond the scope of this tutorial, but luckily there is a Bloom filter size calculator that does the job without having to crack a textbook.
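For reference, calculators of this kind are generally based on two standard formulas: for n expected items and a target false positive probability p, the number of bits is m = -n ln(p) / (ln 2)^2 and the number of hash functions is k = (m / n) ln 2. A minimal sketch follows; the function name bloomParameters is hypothetical, and rounding m up and k to the nearest integer is an assumption about how a calculator might round.

```javascript
// Standard Bloom filter sizing formulas for n items and false positive rate p
function bloomParameters(n, p) {
  const m = Math.ceil((-n * Math.log(p)) / (Math.LN2 * Math.LN2)); // bits
  const k = Math.round((m / n) * Math.LN2);                        // hash functions
  return { size: m, numHashes: k };
}

// 100,000 items at a one-in-a-million false positive rate
console.log(bloomParameters(100000, 1.0e-6));
```

The returned size is in bits, so divide by 8 to estimate the memory used in bytes.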

Now, create app.js as follows:

var
  Bloom         =   require('bloom-redis'),
  express       =   require('express'),
  redis         =   require('redis'),
  
  app,
  client,
  filter;

//setup our Express server
app = express();

//create the connection to Redis
client = redis.createClient();


filter = new Bloom.BloomFilter({ 
  client    : client, //make sure the Bloom module uses our newly created connection to Redis
  key       : 'username-bloom-filter', //the Redis key
  
  //calculated size of the Bloom filter.
  //This is where your size / probability trade-offs are made
  //http://hur.st/bloomfilter?n=100000&p=1.0E-6
  size      : 2875518, // ~350kb
  numHashes : 20
});

app.get('/check', function(req,res,next) {
  //check to make sure the query string has 'username'
  if (typeof req.query.username === 'undefined') {
    //skip this route, go to the next one - will result in a 404 / not found
    next('route');
  } else {
   filter.contains(
     req.query.username, // the username from the query string
     function(err, result) {
       if (err) { 
        next(err); //if an error is encountered, send it to the client
        } else {
          res.send({ 
            username : req.query.username, 
            //if the result is false, then we know the item has *not* been used
            //if the result is true, then we can assume that the item has been used
            status : result ? 'used' : 'free' 
          });
        }
      }
    );
  }
});


app.get('/save',function(req,res,next) {
  if (typeof req.query.username === 'undefined') {
    next('route');
  } else {
    //first, we need to make sure that it's not yet in the filter
    filter.contains(req.query.username, function(err, result) {
      if (err) { next(err); } else {
        if (result) {
          //true result means it already exists, so tell the user
          res.send({ username : req.query.username, status : 'not-created' });
        } else {
          //we'll add the username passed in the query string to the filter
          filter.add(
            req.query.username, 
            function(err) {
              //The callback arguments to `add` provides no useful information, so we'll just check to make sure that no error was passed
              if (err) { next(err); } else {
                res.send({ 
                  username : req.query.username, status : 'created' 
                });
              }
            }
          );
        }
      }
    });
  }
});

app.listen(8010);

To run this server: node app.js. Go to your browser and point it to: http://localhost:8010/check?username=kyle. The response should be: {"username":"kyle","status":"free"}.

Now, let's save that username by pointing your browser to http://localhost:8010/save?username=kyle. The response will be: {"username":"kyle","status":"created"}. If you return to http://localhost:8010/check?username=kyle, the response will be {"username":"kyle","status":"used"}. Similarly, returning to http://localhost:8010/save?username=kyle will result in {"username":"kyle","status":"not-created"}.

From the terminal you can see the size of the filter: redis-cli strlen username-bloom-filter.

Right now, with a single item, it should read 338622.

Now, go ahead and try to add more usernames using the /save route. You can try as many as you want.

If you check the size again, you may find that it has grown slightly, but not with every addition. Curious, right? Internally, the Bloom filter sets individual bits (1/0) at different positions in the string stored at username-bloom-filter. These positions are not contiguous, so if you set a bit at index 0 and then set a bit at index 10,000, everything in between will be 0. For practical purposes, it's not important to understand the precise mechanics of each operation at first; just know that this is normal and that you will never store more in Redis than the size you specified.
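The growth pattern follows from how Redis stores bits: SETBIT grows the string just enough to hold the highest bit offset written so far, padding with zero bytes. The sketch below models only that length rule in plain JavaScript, without touching Redis, and assumes the filter is written with SETBIT; the function name lengthAfterSetbit is hypothetical.

```javascript
// Models the byte length of a Redis string after SETBIT at `offset`:
// the string must hold floor(offset / 8) + 1 bytes, and it never shrinks.
function lengthAfterSetbit(currentLength, offset) {
  return Math.max(currentLength, Math.floor(offset / 8) + 1);
}

let length = 0;
length = lengthAfterSetbit(length, 0);     // 1 byte is enough to hold bit 0
length = lengthAfterSetbit(length, 10000); // grows to 1251 bytes
length = lengthAfterSetbit(length, 500);   // unchanged, already long enough
console.log(length); // 1251

// Upper bound for a 2875518-bit filter: the highest possible offset is size - 1
console.log(lengthAfterSetbit(0, 2875518 - 1)); // 359440
```

This is why strlen only changes when a new item happens to set a bit beyond the current end of the string, and why it can never exceed the configured size.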

Fresh content

Fresh content keeps users coming back to a website, so how do you show a user something new every time? Using a traditional database approach, you would add a new row to a table containing the user identifier and story identifier, and then query that table when deciding which piece of content to display. As you might imagine, your database will grow very quickly, especially as your users and content grow.

In this case, the consequences of a false positive (the filter reports content as seen when it was not, so an unseen piece of content is skipped) are very small, making Bloom filters a viable option. At first glance, you might think that each user needs their own Bloom filter, but we'll use a simple concatenation of a user identifier and a content identifier, and then insert that string into our filter. This way we can use a single filter for all users.
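A minimal sketch of the concatenation idea, assuming a ':' separator and using a plain Set as a stand-in for the Bloom filter so the example runs without Redis. The names seenKey and firstUnseen are hypothetical.

```javascript
// Build one filter entry per (user, content) pair
function seenKey(userId, contentId) {
  return userId + ':' + contentId;
}

// Return the first content id the user has not seen, or null if none remain.
// `contains` is any function answering "might this key have been added?"
function firstUnseen(userId, contentIds, contains) {
  for (const contentId of contentIds) {
    if (!contains(seenKey(userId, contentId))) {
      return contentId;
    }
  }
  return null;
}

const seen = new Set(); // stand-in for the single shared filter
const contentIds = ['pride-and-prejudice', 'alices-adventures-in-wonderland'];

seen.add(seenKey('kyle', 'pride-and-prejudice'));
console.log(firstUnseen('kyle', contentIds, (key) => seen.has(key)));  // alices-adventures-in-wonderland
console.log(firstUnseen('alice', contentIds, (key) => seen.has(key))); // pride-and-prejudice
```

If identifiers may themselves contain the separator, two different pairs could produce the same key, so it is worth picking a separator that cannot appear in either identifier.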

In this example, let's build another basic Express server that displays content. Each time you access the route /show-content/any-username (any-username is any URL-safe value), a new piece of content will be displayed until the site is empty of content. In the example, the content is the first line of the top ten Project Gutenberg books.

We need to install another npm module. Run from terminal: npm install async --save

Your new app.js file:

var
  async         =   require('async'),
  Bloom         =   require('bloom-redis'),
  express       =   require('express'),
  redis         =   require('redis'),
  
  app,
  client,
  filter,
  
  // From Project Gutenberg - opening lines of the top 10 public domain ebooks
  // https://www.gutenberg.org/browse/scores/top
  openingLines = {
    'pride-and-prejudice' : 
      'It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.',
    'alices-adventures-in-wonderland' : 
      'Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it'
  };

If you pay careful attention to the round-trip time in your browser's developer tools, you will find that the more times you request a single path with the same username, the longer it takes. While checking the filter takes a fixed amount of time, in this case we are checking for the presence of more and more items. Bloom filters are limited in what they can tell you, so you have to test for the presence of each item individually. In our example that is fairly simple, but testing hundreds of items would be inefficient.

Outdated data

In this example, we will build a small Express server that does two things: accept new data via POST, and display the current data (via a GET request). When new data is POSTed to the server, the application checks whether it is already in the filter. If it is not, we add it to a set in Redis; otherwise we return null. A GET request fetches the current data from Redis and sends it to the client.

This differs from the first two situations: here, false positives are not acceptable. We will use the Bloom filter as a first line of defense. Given the properties of Bloom filters, we can only be certain that something is not in the filter, so in that case we can go ahead and let the data in. If the Bloom filter reports that the data might be in the filter, we check against the actual data source.

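So, what do we gain? We gain the speed of not having to check the actual source every time. In situations where the data source is slow (external APIs, small databases, the middle of a flat file), the speed boost is really needed. To demonstrate the speed, we add a realistic delay of 150 ms in the example. We will also use console.time / console.timeEnd to log the difference between a Bloom filter check and a non-Bloom-filter check.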

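In this example, we will also use an extremely constrained number of bits: just 1024. It will fill up quickly. As it fills, it will produce more and more false positives, and you will see the response time increase as the false positive rate climbs.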

This server uses the same modules as before, plus body-parser (npm install body-parser --save), so set the app.js file to:

var
  async           =   require('async'),
  Bloom           =   require('bloom-redis'),
  bodyParser      =   require('body-parser'),
  express         =   require('express'),
  redis           =   require('redis'),
  
  app,
  client,
  filter,
  
  currentDataKey  = 'current-data',
  usedDataKey     = 'used-data';
  
app = express();
client = redis.createClient();

filter = new Bloom.BloomFilter({ 
  client    : client,
  key       : 'stale-bloom-filter',
  //for illustration purposes, this is a super small filter. It should fill up at around 500 items, so for a production load, you'd need something much larger!
  size      : 1024,
  numHashes : 20
});

app.post(
  '/',
  bodyParser.text(),
  function(req,res,next) {
    var
      used;
      
    console.log('POST -', req.body); //log the current data being posted
    console.time('post'); //start measuring the time it takes to complete our filter and conditional verification process
    
    //async.series is used to manage multiple asynchronous function calls.
    async.series([
      function(cb) {
        filter.contains(req.body, function(err,filterStatus) {
          if (err) { cb(err); } else {
            used = filterStatus;
            cb(err);
          }
        });
      },
      function(cb) {
        if (used === false) {
          //Bloom filters do not have false negatives, so we need no further verification
          cb(null);
        } else {
          //it *may* be in the filter, so we need to do a follow up check
          //for the purposes of the tutorial, we'll add a 150ms delay in here since Redis can be fast enough to make it difficult to measure and the delay will simulate a slow database or API call
          setTimeout(function() {
            console.log('possible false positive');
            client.sismember(usedDataKey, req.body, function(err, membership) {
              if (err) { cb(err); } else {
                //sismember returns 0 if a member is not part of the set and 1 if it is.
                //This transforms those results into booleans for consistent logic comparison
                used = membership === 0 ? false : true;
                cb(err);
              }
            });
          }, 150);
        }
      },
      function(cb) {
        if (used === false) {
          console.log('Adding to filter');
          filter.add(req.body,cb);
        } else {
          console.log('Skipped filter addition, [false] positive');
          cb(null);
        }
      },
      function(cb) {
        if (used === false) {
          client.multi()
            .set(currentDataKey,req.body) //unused data is set for easy access to the 'current-data' key
            .sadd(usedDataKey,req.body) //and added to a set for easy verification later
            .exec(cb); 
        } else {
          cb(null);
        }
      }
      ],
      function(err, results) {
        if (err) { next(err); } else {
          console.timeEnd('post'); //logs the amount of time since the console.time call above
          res.send({ saved : !used }); //returns if the item was saved, true for fresh data, false for stale data.
        }
      }
    );
});

app.get('/',function(req,res,next) {
  //just return the fresh data
  client.get(currentDataKey, function(err,data) {
    if (err) { next(err); } else {
      res.send(data);
    }
  });
});

app.listen(8012);

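Since POSTing to a server from a browser can be tricky, let's use curl to test.

curl --data "Your data goes here" --header "Content-Type: text/plain" http://localhost:8012/

A quick bash script can be used to show what filling up the entire filter looks like: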

#!/bin/bash
for i in `seq 1 500`;
do
  curl --data "data $i" --header "Content-Type: text/plain" http://localhost:8012/
done

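Watching a filter fill up, or looking at a full one, is interesting. Since this one is small, you can easily view it with redis-cli. By running redis-cli get stale-bloom-filter from the terminal between adding items, you will see the individual bytes increase. A full filter will be \xff for every byte. At that point, the filter will always return a positive result.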

Conclusion

Bloom filters are not a cure-all, but in the right situation they can provide a fast, efficient complement to other data structures.

