国产av日韩一区二区三区精品,成人性爱视频在线观看,国产,欧美,日韩,一区,www.成色av久久成人,2222eeee成人天堂

Home Database Mysql Tutorial Twilio incident and Redis

Twilio incident and Redis

Jun 07, 2016 pm 04:35 PM
and redis twilio

Twilio just released a post mortem about an incident that caused issues with the billing system: http://www.twilio.com/blog/2013/07/billing-incident-post-mortem.html The problem was about a Redis server, since Twilio is using Redis to stor

Twilio just released a post mortem about an incident that caused issues with the billing system:

http://www.twilio.com/blog/2013/07/billing-incident-post-mortem.html

The problem was about a Redis server, since Twilio is using Redis to store the in-flight account balances, in a master-slaves setup, with multiple slaves in different data centers for obvious availability and data safety concerns.

This is a short analysis of the incident, what Twilio can do and what Redis can do to avoid this kind of issues.

The first observation is that Twilio uses Redis, an in memory system, in order to save balances, so everybody will say "WTF Twilio! Are you serious with your data?". Actually Redis uses memory to serve data and to internally manipulate its data structures, but the incident has *nothing to do* with the durability of Redis as a DB. In fact Twilio stated that they are using the append only file that can be a very durable solution as explained here: http://oldblog.antirez.com/post/redis-persistence-demystified.html

The incident is actually centered around two main aspects of Redis:

1) The replication system.
2) The configuration.

I'll address they two things respectively.

Analysis of the replication issue
===

Redis 2.6 always needs a full resynchronization between a master and a slave after a connection issue between the two.
Redis 2.8 addressed this problem, but is currently a release candidate, so Twilio had no way to use the new feature called "partial resynchronization".

Apparently the master became unavailable because many slaves tried to resynchronize at the same time.

Actually for the way Redis works a single slave or multiple slaves trying to resynchronize should not make a huge difference, since just a single RDB is created. As soon as the second slave attaches and there is already a background save in progress in order to create the first RDB (used for the bulk data transfer), it is put in a queue with the previous slave, and so forth for all the other slaves attaching. Redis will just produce a single RDB file.

However what is true is that Redis may use additional memory with many slaves attaching at the same time, since there are multiple output buffers to "record" to transfer when the RDB file is ready. This is true especially in the case of replication over WAN. In the Twilio blog post I read "multiple data centers" so it is possible that the replication process may be slow in some case.

The bottom line is, Redis normally does not need to go slow when multiple slaves are resynchronizing at the same time, unless something strange happens like hitting the memory limit of the server, with the master starting to swap and/or problems with very slow disks (probably EC2?) so that creating an RDB starts to mess with the ability to write to the AOF file.

However issues writing to the AOF are a bit unlikely to be the cause, since during the AOF rewrite there is the same kind of disk i/o stress, with one thread writing a lot of data to the new AOF, and the other (main) thread logging every new write to the AOF. Everything considered memory pressure seems more probable, but Twilio engineers can just comment with details about what happened, this will be an useful real-world data point for sure.

From the Twilio side, what is possible to do to minimize incidents, is to understand exactly why the master is not able, with the current architecture, to survive without serious loss of performance to many slaves resynchronizing.

From the Redis side, well, we had to do our homework and provide partial resynchronization *long time ago* probably, we finally have it in Redis 2.8, and it is very good that a few days ago I pushed forward the 2.8 release skipping all the other pending features for this release that will be postponed for the next release. Now we have the first release candidate, in a few weeks this should be a release in the hands of users.

The configuration
===

The other obvious problem, probably the biggest one, was restarting the master with the wrong configuration.

Again I think here there was an human error that was "helped" by a Redis non perfect mechanism.

Basically up to Redis 2.6 you had CONFIG SET to change the configuration by hand, so it was possible for example to switch the system from RDB to AOF for more data safety with just:

redis-cli CONFIG SET appendonly yes

However you had to change the configuration file manually in order to ensure that the change will affect the instance after the next restart. Otherwise the change is only in the current in memory configuration and a restart will bring you back to the old config.

Maybe this was not the case, but it is not unlikely that Twilio engineers modified the wrong redis.conf file or forgot to do it in some way.

Fortunately Redis 2.8 provides a better workflow for on-the-fly configuration changes, that is:

redis-cli CONFIG SET appendonly yes
redis-cli CONFIG REWRITE

Basically the config rewriting feature will make sure to change the currently used configuration file, in order to contain the configuration changes operated by CONFIG SET, which is definitely safer.

In the end
===

I'll be happy to work with the Twilio engineers in the next weeks in order to understand the details and their requests and see how Redis can be improved to make incidents like this less likely to happen.

A real world test
===

I just tried to setup a master with AOF enabled, rotating disks, and a huge write load. Only trick is, it is bare metal entry-level hardware.

Then I put a steady load on it of 70k writes per second across 10 millions of keys.

Finally I tried to mass-resync four slaves form scratch multiple times.

Results:

$ redis-cli -h 192.168.1.10 --latency-history
min: 0, max: 26, avg: 0.97 (1254 samples) -- 15.00 seconds range
min: 0, max: 5, avg: 0.66 (1287 samples) -- 15.00 seconds range
min: 0, max: 2, avg: 0.62 (1290 samples) -- 15.00 seconds range
min: 0, max: 1, avg: 0.47 (1307 samples) -- 15.01 seconds range
min: 0, max: 10, avg: 0.48 (1306 samples) -- 15.00 seconds range
min: 0, max: 1, avg: 0.47 (1310 samples) -- 15.01 seconds range
min: 0, max: 3, avg: 0.45 (1311 samples) -- 15.01 seconds range
min: 0, max: 10, avg: 0.48 (1305 samples) -- 15.01 seconds range
min: 0, max: 23, avg: 0.49 (1306 samples) -- 15.01 seconds range
min: 0, max: 3, avg: 0.47 (1307 samples) -- 15.01 seconds range
min: 0, max: 36, avg: 0.86 (1255 samples) -- 15.00 seconds range
min: 0, max: 6, avg: 1.05 (1246 samples) -- 15.01 seconds range
min: 0, max: 21, avg: 0.52 (619 samples)^C

As you can see there is no moment in which the server struggles with this load. During the test the load continued to be accepted at the rate of 70k writes/sec.

This test is in no way able to simulate the Twilio architecture, but the bottom line here is, Redis is supposed to handle this well with minimally capable hardware so something odd happened, or there was a low memory condition, or there was the "EC2 effect", that is, some very poor disk performance allowed for memory pressure. Comments
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undress AI Tool

Undress AI Tool

Undress images for free

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Laravel8 optimization points Laravel8 optimization points Apr 18, 2025 pm 12:24 PM

Laravel 8 provides the following options for performance optimization: Cache configuration: Use Redis to cache drivers, cache facades, cache views, and page snippets. Database optimization: establish indexing, use query scope, and use Eloquent relationships. JavaScript and CSS optimization: Use version control, merge and shrink assets, use CDN. Code optimization: Use Composer installation package, use Laravel helper functions, and follow PSR standards. Monitoring and analysis: Use Laravel Scout, use Telescope, monitor application metrics.

How to use the Redis cache solution to efficiently realize the requirements of product ranking list? How to use the Redis cache solution to efficiently realize the requirements of product ranking list? Apr 19, 2025 pm 11:36 PM

How does the Redis caching solution realize the requirements of product ranking list? During the development process, we often need to deal with the requirements of rankings, such as displaying a...

What should I do if the Redis cache of OAuth2Authorization object fails in Spring Boot? What should I do if the Redis cache of OAuth2Authorization object fails in Spring Boot? Apr 19, 2025 pm 08:03 PM

In SpringBoot, use Redis to cache OAuth2Authorization object. In SpringBoot application, use SpringSecurityOAuth2AuthorizationServer...

Recommended Laravel's best expansion packs: 2024 essential tools Recommended Laravel's best expansion packs: 2024 essential tools Apr 30, 2025 pm 02:18 PM

The essential Laravel extension packages for 2024 include: 1. LaravelDebugbar, used to monitor and debug code; 2. LaravelTelescope, providing detailed application monitoring; 3. LaravelHorizon, managing Redis queue tasks. These expansion packs can improve development efficiency and application performance.

Laravel environment construction and basic configuration (Windows/Mac/Linux) Laravel environment construction and basic configuration (Windows/Mac/Linux) Apr 30, 2025 pm 02:27 PM

The steps to build a Laravel environment on different operating systems are as follows: 1.Windows: Use XAMPP to install PHP and Composer, configure environment variables, and install Laravel. 2.Mac: Use Homebrew to install PHP and Composer and install Laravel. 3.Linux: Use Ubuntu to update the system, install PHP and Composer, and install Laravel. The specific commands and paths of each system are different, but the core steps are consistent to ensure the smooth construction of the Laravel development environment.

Redis's Role: Exploring the Data Storage and Management Capabilities Redis's Role: Exploring the Data Storage and Management Capabilities Apr 22, 2025 am 12:10 AM

Redis plays a key role in data storage and management, and has become the core of modern applications through its multiple data structures and persistence mechanisms. 1) Redis supports data structures such as strings, lists, collections, ordered collections and hash tables, and is suitable for cache and complex business logic. 2) Through two persistence methods, RDB and AOF, Redis ensures reliable storage and rapid recovery of data.

How to configure slow query log in centos redis How to configure slow query log in centos redis Apr 14, 2025 pm 04:54 PM

Enable Redis slow query logs on CentOS system to improve performance diagnostic efficiency. The following steps will guide you through the configuration: Step 1: Locate and edit the Redis configuration file First, find the Redis configuration file, usually located in /etc/redis/redis.conf. Open the configuration file with the following command: sudovi/etc/redis/redis.conf Step 2: Adjust the slow query log parameters in the configuration file, find and modify the following parameters: #slow query threshold (ms)slowlog-log-slower-than10000#Maximum number of entries for slow query log slowlog-max-len

In a multi-node environment, how to ensure that Spring Boot's @Scheduled timing task is executed only on one node? In a multi-node environment, how to ensure that Spring Boot's @Scheduled timing task is executed only on one node? Apr 19, 2025 pm 10:57 PM

The optimization solution for SpringBoot timing tasks in a multi-node environment is developing Spring...

See all articles