If you’re using CloudFormation, this should save you a boatload of time. Amazon now parallelizes the creation of resources that have no declared or implicit dependencies on other resources, which means your stack creation time can drop by a significant percentage! My colleague Dave, our resident puppet master (no pun intended!), saw a reduction of over 30% in his stack creation time!
Shawn Smith, one of our most talented engineers, discusses why we chose Cassandra at Bazaarvoice.
Removing nodes from a Cassandra ring
This week I had the pleasant experience of accidentally over-provisioning one of our Cassandra rings. As part of the data infrastructure team, scaling Cassandra is one of the many things we do on a day-to-day basis. If you scan the internet, there’s some basic information about how to safely remove nodes from a Cassandra ring, but it’s fairly vague, and it’s hard to tell for sure what would happen if you simply execute nodetool decommission and hope for the best.
I’m going to walk you through the process of removing nodes from a Cassandra ring without downtime and without degradation of service.
The steps I’m about to describe apply to removing a healthy node from a Cassandra ring. If you’re trying to remove a dead node, look into the nodetool removetoken command instead; what to do when a node in your ring dies is beyond the scope of this post. This is for removing a healthy node only.
Removing a node
This is the easy part. Connect to one of the nodes in your ring and get a list of nodes.
# nodetool ring
Now, you can choose a node to decommission from here. If you have a specific node in mind, that’s fine; if you can’t decide, consider removing the node with both the smallest effective ownership and the smallest load (in gigabytes). This will reduce the time it takes to decommission the node because the node does not carry a significant percentage of replicated data, which must be migrated to the subsequent node during decommission. If removing multiple nodes, it’s always most efficient to remove the node nearest to the last node (the node with the highest token). If you were to remove nodes in any other order, you would simply be copying the same data from one node, to another, to another, which would waste time and network bandwidth.
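For reference, the ring output looks roughly like the following (hosts, loads, and tokens here are invented for illustration). In this example, 10.0.0.3 would be the natural candidate to remove, since it has both the smallest effective ownership and the smallest load:

Address    DC       Rack  Status State   Load       Effective-Ownership  Token
10.0.0.1   us-east  1a    Up     Normal  120.03 GB  25.00%               0
10.0.0.2   us-east  1b    Up     Normal  118.42 GB  25.00%               42535295865117307932921825928971026432
10.0.0.3   us-east  1c    Up     Normal  4.81 GB    12.50%               85070591730234615865843651857942052864
10.0.0.4   us-east  1a    Up     Normal  119.90 GB  37.50%               141784319550391026443072753096570088105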
Connect to the node you want to decommission and run nodetool decommission. Cassandra will offload all of the data on that node to the next node in the ring; it always (smartly) chooses that node because of the token assigned to the adjacent node. After running nodetool decommission, you may notice that you can’t really do anything except wait for the command to complete. You can suspend the process (Ctrl+Z) and run bg to push it to the background, but for safety, I like to open up another terminal and connect to a different node in the ring. Once I’ve connected, I’ll run the following command to watch all the data flow from the “leaving” node to the subsequent node in the ring:
# watch nodetool -h <hostname of node leaving> netstats
The nodetool netstats command shows the flow of data (replication, bootstrapping, etc.) to and from a node’s peers (generally the adjacent node during a decommission). It’s helpful here because you can see all of the data leave the node. Once it’s finished, the node will remove itself from the ring and report “DECOMMISSIONED.”
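While the decommission is running, the netstats output will look something like this (file names, hosts, and numbers are invented); the streaming section shrinks as data moves to the adjacent node:

Mode: LEAVING
Streaming to: /10.0.0.4
   /var/lib/cassandra/data/ks1/cf1/ks1-cf1-hf-42-Data.db sections=88 progress=1063256/2504988 - 42%
Not receiving any streams.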
Congratulations! You’ve removed the node. If you run nodetool ring on any of the other nodes, you’ll notice that the node you just decommissioned has gone from the “leaving” state to not showing up as part of the ring at all.
A note about Priam
If you’re using Netflix’s Priam, you will need to connect to Amazon SimpleDB and locate the entry for the node you decommissioned. When you find it, delete it so Priam doesn’t try any funny business (it shouldn’t, unless you restart the Priam service or add a node to the ring again).
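As a sketch of what that looks like with the AWS CLI: Priam keeps its instance registry in a SimpleDB domain (InstanceIdentity, if memory serves), and the instance ID and item name below are made up, so adjust for your own setup:

# aws sdb select --select-expression "select * from InstanceIdentity where instanceId = 'i-0123abcd'"
# aws sdb delete-attributes --domain-name InstanceIdentity --item-name <item name from the select above>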
Once you’ve removed the node successfully, you may consider running nodetool repair --partitioner-range (or nodetool repair -pr for short) on every node in the ring. But not at the same time! Do it one node at a time to avoid degrading the performance of the ring. This is a due-diligence task to ensure that each node has all the appropriate data for its primary range.
By default, nodetool repair repairs every range a node holds, including the ranges it only replicates; the -pr flag restricts the repair to the node’s primary range. You want to perform a repair to fill in any missing data in each node’s primary range, and since you’re repairing every node anyway, taking the replication strategy into account at this point would be a waste of computing power. Restricting the repair this way also reduces the impact of running it on each node. Performing a nodetool repair is not required, but it may be a good idea if you’re concerned about all your data being there or, more specifically, if you’re concerned with the accuracy of distributed deletes.
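If you’d rather script the repairs, a minimal sketch like the following works, assuming a nodes.txt file with one hostname per line. Because nodetool repair blocks until the repair finishes, the loop naturally runs one node at a time:

# Repair each node's primary range, strictly one node at a time.
while read host; do
  echo "Repairing ${host}..."
  nodetool -h "${host}" repair -pr
done < nodes.txt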
In addition to repair, you may wish to run nodetool cleanup on the remaining nodes to ditch any data that is no longer relevant to a particular node. (This doesn’t affect query results; it just frees up disk space on your nodes.) This isn’t required either, but if you are worried about disk space for any reason, it may help.
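Again as a sketch, reusing the same nodes.txt, and again one node at a time, since cleanup triggers compaction and is I/O-heavy:

# Drop data each node no longer owns; this only frees disk space.
while read host; do
  nodetool -h "${host}" cleanup
done < nodes.txt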
After removing a node, I recommend backing up the data on that node for safekeeping before you dispose of the server entirely. It’s data, after all.
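A decommission doesn’t delete the data files on the old node, so the backup can be as simple as archiving the data directory before the server goes away. A sketch, assuming the default data directory location and an invented S3 bucket name:

# Archive the old node's data files before terminating the server.
tar czf cassandra-data-$(hostname).tar.gz /var/lib/cassandra/data
# Copy the archive somewhere durable:
aws s3 cp cassandra-data-$(hostname).tar.gz s3://my-cassandra-backups/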
You might be wondering how to reassure your boss (and yourself) that no data was lost during the scale-down operation. The answer is that Cassandra doesn’t provide any specific tool or function for this. My recommendation is to count the row keys in each column family that’s important to you before and after the scaling operation, and verify that, at the very least, the number of row keys after the scale-down is equal to or greater than the number when you started. Depending on your schema, of course, this may not be particularly helpful.
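One rough way to do that count is through cqlsh, run before and after the decommission. The keyspace and column family names here are placeholders, and note that COUNT stops at the LIMIT, so set the limit higher than the row count you expect:

# echo "SELECT COUNT(*) FROM my_keyspace.my_cf LIMIT 100000000;" | cqlsh <hostname>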
Big data, big deal
We’ve been doing a lot with Cassandra at Bazaarvoice. Curious about scaling big data systems? Cassandra isn’t the only system we use for big data. Come by and find out.
Amazon’s security blog describes some basics about using a Bastion server to harden your EC2 instances against attack. This is great advice!
At Bazaarvoice, we follow this practice with our own self-contained internal application (which, by coincidence, we also call “the Bastion”).
The basic idea
Infrastructure lives in a VPC (Virtual Private Cloud), and anyone with access must connect to the Bastion host and forward their keys to subsequent hosts, since none of those hosts accept connections from anywhere other than the Bastion. As an added measure, the Bazaarvoice Bastion host lives within a private network itself, reachable only from inside that network or over a VPN connection if you’re connecting remotely, which constricts access even further. Finally, Bazaarvoice has monitoring in place that lets us detect unusual behavior on the Bastion hosts and, more importantly, quickly cut off connections that appear malicious. The system is self-updating, installing new access for users on an ongoing and frequent basis.
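In practice the connection flow looks something like this (hostnames invented); the -A flag forwards your ssh-agent, so the internal host can authenticate you without your private key ever being copied to the Bastion:

# From your workstation, inside the private network or on the VPN:
ssh -A you@bastion.internal.example.com
# Then, from the Bastion, hop to the internal host:
ssh you@some-internal-host.example.com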
We’re solving interesting problems at Bazaarvoice, and we’re on the cutting edge of cloud technology on a tremendous scale. Curious about what else we’re doing? Come find out!
Milo’s efforts have reached the open source community. Nice work, Milo!
We are pleased to announce a new open source contribution: a Java-based JSON-to-JSON transformation tool named Jolt.
Yeah, right. Like Flickr, Delicious, GeoCities, and every other site/service Yahoo has ever bought.
Time to go back to WordPress, I guess. Sigh.
For those interested, I’ll be attending DevOps Days Austin Tuesday and Wednesday. I’m really looking forward to this event, and Bazaarvoice is a gold sponsor, along with a lot of other really cool folks. See you all there!
GrayBit finds a new home with Joe Dolson of Accessible Web Design. Thanks for taking up the gauntlet, Joe!