Thursday 13th July 2017

Network Upstream DNS authority outage

At approximately 1 AM EDT (GMT-0400), monitoring picked up incorrect reverse DNS results upstream from our delegating authority, ns1/2/3/4.networktransit.net. The issue has been traced back to inconsistent DNS records for networktransit.net as well as incorrect PTR delegation. This has not had an immediate impact on email, because PTR records are cached for up to 24 hours. As of 1 PM EDT the problem persists, pending a fix upstream. At this time there is limited impact on email deliverability, which arises because a receiving mail server checks that the sending IP address has a hostname and that the hostname maps back to the same IP.
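The two-way check a receiving mail server performs (the IP has a hostname, and that hostname maps back to the IP) is commonly called forward-confirmed reverse DNS. A minimal Python sketch follows. It assumes the system resolver is good enough for illustration, and the function names `fcrdns_ok`, `system_reverse`, and `system_forward` are hypothetical, not taken from any mail server.

```python
import socket


def system_reverse(ip):
    """Return the PTR hostname for ip via the system resolver, or None."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def system_forward(hostname):
    """Return the set of addresses hostname resolves to (empty on failure)."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    except OSError:
        return set()


def fcrdns_ok(ip, reverse_lookup=system_reverse, forward_lookup=system_forward):
    """True when ip has a PTR hostname and that hostname resolves back to ip."""
    hostname = reverse_lookup(ip)
    if not hostname:
        return False  # no PTR record at all: the failure this outage risks
    return ip in forward_lookup(hostname)
```

The lookups are passed in as parameters so the logic can be exercised without touching the network, for example with a dictionary standing in for the PTR zone.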

3:18 PM EDT: further digging shows that the DNS delegation for 64.22.68.0/24 is incorrect in networktransit.net's zone definition. Waiting to hear back from upstream.
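For reference, the reverse zone and PTR owner names for this block can be derived mechanically. The sketch below uses Python's standard `ipaddress` module; the helper names `reverse_zone_for_slash24` and `ptr_name` are made up for illustration, and the first helper deliberately handles only IPv4 /24 blocks.

```python
import ipaddress


def reverse_zone_for_slash24(cidr):
    """Return the in-addr.arpa zone name covering an IPv4 /24 block."""
    # strict=False tolerates a host-style spelling such as 64.22.68.1/24
    network = ipaddress.ip_network(cidr, strict=False)
    if network.version != 4 or network.prefixlen != 24:
        raise ValueError("this sketch only handles IPv4 /24 blocks")
    first_three = str(network.network_address).split(".")[:3]
    return ".".join(reversed(first_three)) + ".in-addr.arpa."


def ptr_name(ip):
    """Return the fully qualified PTR owner name for a single address."""
    return ipaddress.ip_address(ip).reverse_pointer + "."
```

Calling `reverse_zone_for_slash24("64.22.68.0/24")` gives `68.22.64.in-addr.arpa.`, the zone that networktransit.net's nameservers are authoritative for, and `ptr_name("64.22.68.2")` gives `2.68.22.64.in-addr.arpa.`.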

68.22.64.in-addr.arpa.  86400   IN      NS      ns1.networktransit.net.
68.22.64.in-addr.arpa.  86400   IN      NS      ns2.networktransit.net.
68.22.64.in-addr.arpa.  86400   IN      NS      ns3.networktransit.net.
68.22.64.in-addr.arpa.  86400   IN      NS      ns4.networktransit.net.
;; Received 131 bytes from 199.180.180.63#53(199.180.180.63) in 227 ms

1.68.22.64.in-addr.arpa.68.22.64.in-addr.arpa. 86400 IN NS ns1.apisnetworks.com.
1.68.22.64.in-addr.arpa.68.22.64.in-addr.arpa. 86400 IN NS ns2.apisnetworks.com.
;; Received 117 bytes from 205.251.138.2#53(205.251.138.2) in 129 ms
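The doubled owner name in the second response (1.68.22.64.in-addr.arpa.68.22.64.in-addr.arpa.) is what zone-file name expansion produces when an owner name is written without its trailing dot, so that the zone origin is appended to it. That this is the actual cause is an assumption; upstream has not confirmed it. The Python sketch below only illustrates the expansion rule for master files, and `expand_owner` is a hypothetical helper.

```python
def expand_owner(name, origin):
    """Expand a zone-file owner name relative to the zone origin."""
    if name == "@":
        return origin  # "@" stands for the origin itself
    if name.endswith("."):
        return name  # absolute name: used exactly as written
    return name + "." + origin  # relative name: the origin is appended


ORIGIN = "68.22.64.in-addr.arpa."

intended = expand_owner("1", ORIGIN)
also_fine = expand_owner("1.68.22.64.in-addr.arpa.", ORIGIN)
missing_dot = expand_owner("1.68.22.64.in-addr.arpa", ORIGIN)
```

Here `intended` and `also_fine` both come out as `1.68.22.64.in-addr.arpa.`, while `missing_dot` becomes `1.68.22.64.in-addr.arpa.68.22.64.in-addr.arpa.`, matching the name seen in the delegation above.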

7:02 PM EDT: still working with the upstream vendor and waiting on a response. Confirmed with the data center that the vendor reported an "outage" last night. What that entailed is unknown; however, it is likely related to the zone misconfiguration, which also affects other class B IP addresses in the neighborhood.

July 15, 12:22 AM EDT: no response from the upstream NOC (NetDepot) that manages DNS. Spoke with the data center; the NS records for the mail servers were converted to PTR records around 10:30 PM. No change reported when querying ns1/2/3/4.networktransit.net. If this problem persists into tomorrow, servers may begin relaying mail through an external network, which will cause issues with SPF records. Hope for the best.
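To illustrate why relaying through an external network would conflict with SPF: an SPF policy lists the networks allowed to send mail for a domain, and a relay outside those networks is not covered. The Python sketch below evaluates only plain `ip4:` mechanisms, which is a small subset of SPF. The policy string and the relay address 203.0.113.10 are hypothetical examples, not records belonging to any domain in this report.

```python
import ipaddress


def ip4_authorized(spf_record, sender_ip):
    """True if sender_ip falls inside any plain ip4: mechanism of spf_record."""
    sender = ipaddress.ip_address(sender_ip)
    for term in spf_record.split():
        if term.lower().startswith("ip4:"):
            network = ipaddress.ip_network(term[4:], strict=False)
            if sender in network:
                return True
    return False


# Hypothetical policy used purely for illustration
EXAMPLE_SPF = "v=spf1 ip4:64.22.68.0/24 -all"

direct = ip4_authorized(EXAMPLE_SPF, "64.22.68.2")     # sent from our own block
relayed = ip4_authorized(EXAMPLE_SPF, "203.0.113.10")  # sent via an outside relay
```

With this example policy `direct` is true and `relayed` is false, so mail sent through the outside relay would fail the check unless every affected domain's SPF record were updated first.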

# dig @205.251.138.2 -x 64.22.68.2

; <<>> DiG 9.9.4-RedHat-9.9.4-38.el7_3.3 <<>> @205.251.138.2 -x 64.22.68.2
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41113
;; flags: qr aa; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 0

;; QUESTION SECTION:
;2.68.22.64.in-addr.arpa.       IN      PTR

;; AUTHORITY SECTION:
2.68.22.64.in-addr.arpa.68.22.64.in-addr.arpa. 86400 IN NS ns1.apisnetworks.com.
2.68.22.64.in-addr.arpa.68.22.64.in-addr.arpa. 86400 IN NS ns2.apisnetworks.com.
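Note that the response is NOERROR with zero answers, and that the authority owner name repeats the reverse suffix. A monitoring script could flag this pattern in captured dig output. The Python sketch below is one way to do that; `doubled_reverse_owners` is a hypothetical helper, and it assumes the default dig text format shown above.

```python
SUFFIX = ".in-addr.arpa."


def doubled_reverse_owners(dig_output):
    """Return owner names in which the in-addr.arpa suffix appears twice."""
    flagged = []
    for line in dig_output.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and dig comment/header lines
        owner = line.split()[0]
        if owner.count(SUFFIX) > 1 and owner not in flagged:
            flagged.append(owner)
    return flagged
```

Fed the output above, it returns the single malformed name `2.68.22.64.in-addr.arpa.68.22.64.in-addr.arpa.`; a healthy response returns an empty list.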

12:56 AM EDT: ns1.networktransit.net is now reporting PTR records, with a dangling NS authority record for ns2 still present (which usurps authority). Hopefully a step in the right direction and just a matter of cache.

# dig @ns1.networktransit.net +norec -x 64.22.68.2

;; QUESTION SECTION:
;2.68.22.64.in-addr.arpa.       IN      PTR

;; ANSWER SECTION:
2.68.22.64.in-addr.arpa. 86400  IN      PTR     image.apisnetworks.com.

;; AUTHORITY SECTION:
2.68.22.64.in-addr.arpa.68.22.64.in-addr.arpa. 86400 IN NS ns2.apisnetworks.com.

1:06 AM EDT: rDNS confirmed in place for mail servers. Mail will continue to flow over the next 2-4 hours to its intended recipients. This fix will remain in place until the NOC corrects its PTR record delegation. Updates will be posted as they come through.

July 17, 3:09 PM EDT: No update yet from NOC. Mail continues to flow as expected; however, reverse DNS assignment is still invalid, which has minimal impact.

July 19, 2:52 PM EDT: DNS still not restored. Escalated the incident to a Zayo supervisor. The supervisor confirmed with engineering that NetDepot changed DNS backends on Friday. The transfer has not gone as smoothly as expected. Furthermore, NetDepot will not address tickets from Zayo unless they come from Zayo's Accounts Payable department. The supervisor is working with the engineering team to open a ticket with NetDepot from Zayo's AP department.

Incidentally, Zayo now owns the data center that NetDepot once operated from and NetDepot still has a presence there. NetDepot owns the IP address space. What fun!

3:21 PM EDT: rDNS coming back online. Issue appears to have been resolved.

July 20, 2:20 PM EDT: Periodic timeouts reported on ns4.networktransit.net. Escalated issue with NOC, awaiting response. Remaining nameservers properly forward rDNS to the correct authoritative nameservers (ns1/2.apisnetworks.com).
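A per-nameserver check of this kind can be scripted. The Python sketch below shells out to dig with a short timeout and records which servers answer. The helper names `dig_ptr` and `survey` are hypothetical, the timeout values are arbitrary, and it assumes dig is installed and exits non-zero when no server replies.

```python
import subprocess


def dig_ptr(server, ip, timeout=5):
    """Ask one nameserver for the PTR of ip; None on timeout or error."""
    cmd = ["dig", "@" + server, "+norec", "+short", "+time=2", "+tries=1", "-x", ip]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return None
    if result.returncode != 0:
        return None
    # An empty string means the server replied without a PTR answer,
    # which is expected when it only refers to the delegated nameservers.
    return result.stdout.strip()


def survey(servers, ip, probe=dig_ptr):
    """Map each nameserver to its reply (None means no usable response)."""
    return {server: probe(server, ip) for server in servers}


SERVERS = ["ns%d.networktransit.net" % n for n in range(1, 5)]
```

Running `survey(SERVERS, "64.22.68.2")` repeatedly and counting `None` entries per server would separate an intermittently failing nameserver from the ones that respond consistently.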

3:58 PM EDT: rDNS appears to be operational once again across all 4 nameservers.