We read about the BLACKHOLE table type a long time ago and it’s kind of been a running joke: why in the hell would somebody want to insert into a black hole? Many times it’s been suggested as the solution to all of our data problems… Just swap out all the InnoDB tables for BLACKHOLE and ta daaa!!! No more capacity issues.
So as you might guess, it was merely coincidence when we discovered its real use. Here’s the scenario:
- 1 DRBD/HA Master MySQL Server
- 1 DRBD/HA Relay Slave
- 20 Standalone Slaves
We found with one of our applications that it made a lot of sense to keep some tables local and pull the rest from the master. The tables kept locally on the 20 web serving boxes are rolled up every 10 minutes and their data is sent on for processing. The replicated tables are slaved from the Relay Slave, our middle man.
So where does black hole fit in?
We actually figured it out during the initial import of the data. The starting database size was around 2GB. We did the full import on the master, and at the time the relay slave was configured with InnoDB tables. On top of that, we were doing the import with LOAD DATA INFILE (there’s an example after the list below). Using LOAD DATA INFILE in a replication scenario works as follows:
- The entire file is imported on the master.
- Once the file has finished on the master, the relay slave begins to import the data.
- Once the relay slave has finished the file, the slaves begin to load the data.
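For reference, each import was a statement along these lines (the file path, table name, and delimiters here are placeholders, not our actual schema):

    -- run on the master; the statement then replicates down the chain
    LOAD DATA INFILE '/tmp/stats_dump.txt'
    INTO TABLE stats
    FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n';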
In our case these files were quite large. What we discovered was that, for each file, we were going to have a hella long wait before it ever got to the slaves (which is where we actually needed the data). The solution? Don’t write the data on the middle man. Since binary logging is enabled on the relay slave, all of the associated SQL statements get passed on from the Master through the Relay Slave to the slaves regardless. And since we never planned on using the Relay Slave in any capacity other than passing data along, there’s no need to actually write the rows into real tables there. Hence, BLACKHOLE. All of the replicated SQL statements are executed on the Relay Slave and logged for the slaves downstream, but the data itself basically goes to /dev/null, so you save the write time.
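If you want to set up the same kind of middle man, the relay slave ends up looking roughly like this. The table names are made up for illustration, and your MySQL build needs the BLACKHOLE engine available (check SHOW ENGINES):

    # my.cnf on the relay slave: it has to log everything it replicates
    # so the 20 downstream slaves can pick it up from its binlog
    [mysqld]
    log-bin           = mysql-bin
    log-slave-updates

    -- on the relay slave only: convert the replicated tables, and keep
    -- these ALTERs out of the binlog so they don't replicate downstream
    -- and turn the real slaves' tables into black holes too
    SET sql_log_bin = 0;
    ALTER TABLE stats ENGINE = BLACKHOLE;
    ALTER TABLE hits  ENGINE = BLACKHOLE;
    SET sql_log_bin = 1;

The master and the 20 slaves keep their real InnoDB tables; only the middle man gets the BLACKHOLE treatment, since its only job is to pass the binlog along.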
The utility of this setup really stood out with LOAD DATA INFILE: because of the size of the files, it was really easy to see the lag in the middle. It probably won’t matter as much with standard inserts and updates to the master, since those happen rather quickly. However, if there’s ever a problem and we have to restore a table or whatever, we know moving forward that we’ll save that write time in the middle and possibly spare ourselves some downtime.