DNS TTLs and Best Settings
This is one of the most common questions we get asked, and the answer isn't always straightforward, so we wanted to provide some best practices to keep in mind when setting your TTLs. TTL, or Time To Live, is the amount of time, measured in seconds, that a resolver caches a record after querying it. It is set individually on every record in your DNS configuration.
High TTLs
For example, let's say you have the A record www.example.com pointing to 1.1.1.1 with a TTL of 86400 (24 hours). When Client A queries www.example.com, the IP 1.1.1.1 will be stored in their resolver's cache for a full day, and Client A will not make another query for www.example.com because the resolver already knows which IP to go to and for how long. If, an hour later, you realize you need to change the IP for that A record to 2.2.2.2, Client A will keep going to 1.1.1.1 for the next 23 hours, until the TTL counts down to 0, the record expires, and a new query can be made against that FQDN. For the sake of flexibility and site availability, this is obviously not an ideal scenario.
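The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the countdown described above; the function name is our own and is not part of any DNS library.

```python
def seconds_of_stale_answer(ttl: int, seconds_since_cached: int) -> int:
    """Return how much longer a resolver keeps serving the cached (old) answer.

    ttl: the TTL the record had when the resolver cached it, in seconds.
    seconds_since_cached: how long ago the resolver cached the record.
    """
    remaining = ttl - seconds_since_cached
    # Once the countdown reaches 0 the record has expired and a new query is made.
    return max(remaining, 0)


ttl = 86400        # 24 hours, as in the example above
cached_for = 3600  # the IP is changed one hour after Client A's query
stale = seconds_of_stale_answer(ttl, cached_for)
print(stale / 3600)  # 23.0 hours of Client A still going to 1.1.1.1
```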
Low TTLs
At the other end of the spectrum, let's say you set your TTL to 30 so that frequent DNS changes reach end users with minimal impact and you keep the most flexibility. This sounds perfect, but if a user is perusing your site all day, their resolver is querying the www.example.com record every 30 seconds or so. Multiply that by however many users you have and your query count goes up, and at the end of the month it costs you more.
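To get a feel for how quickly this adds up, here is a rough back-of-the-envelope estimate. The user count, session length, and the one-cache-per-user model are all hypothetical assumptions chosen for illustration; real traffic is usually lower because many users share a resolver cache.

```python
import math


def queries_per_session(session_seconds: int, ttl: int) -> int:
    """Upper-bound estimate of lookups for one continuously active user.

    Assumes (for illustration only) that each user sits behind their own
    resolver cache and triggers a new query every time the record expires.
    """
    return math.ceil(session_seconds / ttl)


def monthly_queries(users: int, session_seconds: int, ttl: int, days: int = 30) -> int:
    """Scale the per-session estimate to a whole month of daily sessions."""
    return users * queries_per_session(session_seconds, ttl) * days


users = 1000         # hypothetical number of daily users
session = 8 * 3600   # hypothetical: each user browses for 8 hours a day

low_ttl = monthly_queries(users, session, ttl=30)
high_ttl = monthly_queries(users, session, ttl=86400)
print(low_ttl, high_ttl)
```

Under these assumptions the 30-second TTL produces roughly a thousand times more queries per month than the 24-hour TTL, which is the cost side of the tradeoff.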
The purpose of these examples is to show that there is no universal golden TTL; it really comes down to your use case. We will go over the two most common and most polarized use cases.
Most Common DNS TTL Use Cases
"I'm Using Failover and Need the Lowest Downtime Possible"
If your FQDN's availability is absolutely imperative, failover is a must, but it requires a low TTL to work the way you want it to. Failover cannot bypass TTLs in any way: if you have a TTL of 1800 (seconds) on your failover record and it fails over to the second IP, users will not be directed to the updated IP for up to 30 minutes (until the cached record in the resolver expires). You are going to want to set a low TTL, and the lower the better. Having said that, many resolvers won't honor TTLs lower than 30 seconds, but you can always create a test record to find out whether your resolver allows TTLs below 30.
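As a minimal sketch of why the TTL caps how fast failover can take effect, the function below returns the longest a resolver might keep sending users to the failed IP. The resolver_min_ttl parameter is an assumption that models the 30-second floor mentioned above; actual resolver behavior varies, so treat the default as illustrative rather than guaranteed.

```python
def worst_case_stale_seconds(record_ttl: int, resolver_min_ttl: int = 30) -> int:
    """Longest a resolver may keep serving the failed IP after failover.

    record_ttl: the TTL configured on the failover record, in seconds.
    resolver_min_ttl: assumed floor below which the resolver ignores the
        configured TTL (hypothetical default of 30 seconds).
    """
    # A resolver that enforces a floor caches for at least that long,
    # no matter how low the record's own TTL is set.
    return max(record_ttl, resolver_min_ttl)


for ttl in (1800, 300, 30, 10):
    delay = worst_case_stale_seconds(ttl)
    print(f"TTL {ttl:>4}s: up to {delay}s ({delay / 60:.1f} min) on the old IP")
```

Note that with this assumed floor, a TTL of 10 gains nothing over a TTL of 30, which is why testing against your own resolver is worthwhile.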
"This Record Does Not Need Failover but We Do Make Changes"
If the record in question is not mission-critical but gets updated from time to time, a higher TTL is the way to go. The most common follow-up question is "what about when I need to make a change?" and the answer is to plan for the update. Suppose you have the TTL set to 7200 (2 hours). Once you know you want to make the update, lower the TTL to however long you are comfortable having downtime (30 is the lowest we recommend) and then wait out the old TTL, which in this case means waiting 2 hours. Once two hours have passed, every resolver is caching the record with the new, lower TTL, so you can change the record and raise the TTL back up.
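The planned-change procedure above can be laid out as a simple timeline. The sketch below only does the date arithmetic; the function name, dictionary keys, and the example date are our own and purely illustrative.

```python
from datetime import datetime, timedelta


def plan_record_change(lowered_at: datetime, old_ttl: int, temporary_ttl: int = 30) -> dict:
    """Compute the key moments of a planned DNS record change.

    lowered_at: when the TTL was lowered from old_ttl to temporary_ttl.
    old_ttl: the TTL the record had before it was lowered, in seconds.
    temporary_ttl: the low TTL used during the change window, in seconds.
    """
    # Resolvers that cached the record just before the TTL was lowered
    # keep it for the full old TTL, so wait that long before changing it.
    change_record_at = lowered_at + timedelta(seconds=old_ttl)
    # After the change, the old answer lingers for at most the temporary TTL.
    old_answer_gone_by = change_record_at + timedelta(seconds=temporary_ttl)
    return {
        "lower_ttl_at": lowered_at,
        "change_record_at": change_record_at,
        "old_answer_gone_by": old_answer_gone_by,
    }


# Hypothetical example: TTL of 7200 lowered to 30 at 09:00.
plan = plan_record_change(datetime(2024, 1, 1, 9, 0, 0), old_ttl=7200)
for step, when in plan.items():
    print(f"{step}: {when:%H:%M:%S}")
```

With these inputs the record can be changed at 11:00:00, and the old answer should be gone from caches that honor the TTL by 11:00:30.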