Exploring DNS issues
Ryan Nowak edited this page 2018-02-17 16:46:27 -08:00

After some discussion on the community standup, I spent some time trying to test the claims I've heard about HttpClient and DNS. I designed an experiment using my network setup at home and ran through it using my Windows desktop and OSX laptop.

TLDR

Here's a summary of my key findings.

EDIT: Originally, I ran into some problems with OSX + curl + LibreSSL, which is not a supported scenario for .NET Core. Using OpenSSL works just fine for DNS updates, but that is not reflected in the graphic/table below.

image

Guidance:

  1. If you are on full framework (net4x) you might as well use the ServicePointManager scheme and disable handler rotation. Even after doing this, updates are slow.
  2. If you are on netcoreapp2.1 and Windows you need to use handler rotation.
  3. If you are on netcoreapp2.1 and OSX/Linux, you need to use handler rotation and you need to use OpenSSL.

Limitations:

  • It's possible that other settings on ServicePointManager might improve the speed of updates for Desktop .NET on Windows.

The Experiment

My goal was to set up a controlled environment where I could cause DNS changes to occur and watch clients using HttpClient react to those changes. Conveniently, I have a router running OpenWRT, an OSS Linux-based router operating system.

First, I wrote an MVC application that just echoes the name of the server to all callers. That looks like: {"message":"Hello from, WIN"} or {"message":"Hello from, MAC"}. By running this application on the same port on each machine, I could use DNS to map a hostname to one of the computers, and then update DNS to map it to the other.

I also wrote a simple client application that just makes the same HTTP request and prints the content to the console, over and over in a loop. Well, it actually does it twice: I create a long-lived client and reuse it in the loop, but I also create another client from the factory on each iteration of the loop. This 'fresh' client makes use of the IHttpClientFactory handler lifetime feature and will cycle the inner handlers. Using this program, I can easily see when the two clients don't agree on which IP to talk to.

I had to dig into OpenWRT's docs to understand a bit about dnsmasq, the simple DNS service that it runs. Primarily what I would need to do looks like this:

vim /etc/hosts #edit hosts file
killall dnsmasq #stop dnsmasq process
/etc/init.d/dnsmasq start #restart dnsmasq process
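
The edit-and-restart steps above can be scripted so that each switch is a single command. This is only a minimal sketch: the helper name point_host_at is hypothetical, and it assumes the hosts file holds one "IP hostname" entry per line. The restart commands in the trailing comment are the ones shown above and are not run by the function itself.

```shell
#!/bin/sh
# point_host_at HOSTS_FILE HOSTNAME NEW_IP
# Hypothetical helper: rewrites (or appends) the hosts-file entry for
# HOSTNAME so that it maps to NEW_IP. Assumes "IP hostname" lines.
point_host_at() {
    hosts_file=$1
    host=$2
    new_ip=$3
    tmp="${hosts_file}.tmp.$$"

    # Keep every line whose second field is not the hostname.
    awk -v h="$host" '$2 != h' "$hosts_file" > "$tmp" || return 1

    # Append the new mapping.
    printf '%s %s\n' "$new_ip" "$host" >> "$tmp"

    # Copy back over the original so its permissions are preserved.
    cat "$tmp" > "$hosts_file" && rm -f "$tmp"
}

# Usage on the router (the IP is a placeholder for one of the two servers):
#   point_host_at /etc/hosts test.example.lan 192.168.1.104
#   killall dnsmasq
#   /etc/init.d/dnsmasq start
```

Keeping the file edit separate from the restart makes it easy to try the function against a copy of the hosts file first.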

The out-of-the-box configuration that I have for OpenWRT is that dnsmasq will suffix all hostnames on the local network with .lan. So if my Windows box is Ryan-PC, then I can use DNS on any machine on my LAN to resolve Ryan-PC.lan. Additionally, dnsmasq will respond to DNS queries for any entry in its own hosts file for *.lan with a TTL of 0sec, which (in theory) means that it should not be cached.

So I defined a domain name test.example.lan that I could map back and forth between my two computers acting as servers.

Then it's just a matter of repeating the process:

  1. Start client (look at the output to tell which computer we're talking to)
  2. Update DNS (described above)
  3. Wait

Results

I wanted to try a few specific cases. I'm most interested in netcoreapp2.X on OSX and Windows, and .NET 4.X on Windows only. Windows .NET 4.X has some different behavior and some additional API support for solving these kinds of problems. For each case I want to test HttpClient with and without IHttpClientFactory's handler rotation feature, which is designed to solve these problems at the cost of some complexity and overhead.

netcoreapp2.X on Windows

I started here because this is what I use most often.

It was easy to confirm that DNS doesn't update for an existing handler on Windows. I gave the client about 5 minutes to see if it would update and no joy. I tried ipconfig /flushdns to see if that would help.

The handler rotation does the job in this case. Here's a sample output from a few seconds after I changed the DNS entry. My handler lifetime in this case was ten seconds. I have the factory's logging for just the handler rotation feature enabled as well.

Cached: 1/23/2018 8:29:53 PM: {"message":"Hello from, WIN"}
Fresh: 1/23/2018 8:29:53 PM: {"message":"Hello from, WIN"}
Cached: 1/23/2018 8:29:55 PM: {"message":"Hello from, WIN"}
Fresh: 1/23/2018 8:29:55 PM: {"message":"Hello from, WIN"}
dbug: Microsoft.Extensions.Http.DefaultHttpClientFactory[100]
      Starting HttpMessageHandler cleanup cycle with 2 items
dbug: Microsoft.Extensions.Http.DefaultHttpClientFactory[101]
      Ending HttpMessageHandler cleanup cycle after 0.0026ms - processed: 0 items - remaining: 2 items
dbug: Microsoft.Extensions.Http.DefaultHttpClientFactory[103]
      HttpMessageHandler expired after 10000ms for client 'dns'
Cached: 1/23/2018 8:29:57 PM: {"message":"Hello from, WIN"}
Fresh: 1/23/2018 8:29:59 PM: {"message":"Hello from, MAC"}
DNS CHANGED
Cached: 1/23/2018 8:30:01 PM: {"message":"Hello from, WIN"}
Fresh: 1/23/2018 8:30:01 PM: {"message":"Hello from, MAC"}
DNS CHANGED
Cached: 1/23/2018 8:30:03 PM: {"message":"Hello from, WIN"}
Fresh: 1/23/2018 8:30:03 PM: {"message":"Hello from, MAC"}
DNS CHANGED

You can see from this example output that just after the DNS entry was updated, the next handler we created picked up the update. The existing long-lived HttpClient didn't see the update.
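
When the client runs for a long time, it can be handy to pull the switchover moment out of a captured log instead of scrolling for it. The following is a rough sketch that assumes the output format shown above ("Cached:" and "Fresh:" lines carrying a timestamp and the JSON message). The function name first_divergence is hypothetical and is not part of the client.

```shell
#!/bin/sh
# first_divergence: reads client output on stdin and prints the timestamp
# of the first "Fresh:" line whose server differs from the most recent
# "Cached:" line. Hypothetical helper; assumes the log format shown above.
first_divergence() {
    awk '
        /^(Cached|Fresh): / {
            # Extract the server name that follows "Hello from, ".
            server = $0
            sub(/.*Hello from, /, "", server)
            sub(/".*/, "", server)

            if ($1 == "Cached:") { cached = server; next }

            if (cached != "" && server != cached) {
                # Fields 2-4 hold the date, time and AM/PM marker.
                ts = $2 " " $3 " " $4
                sub(/:$/, "", ts)
                print ts " cached=" cached " fresh=" server
                exit
            }
        }
    '
}

# Usage, with client.log being a saved copy of the client output:
#   first_divergence < client.log
```

Logger lines such as the dbug entries are ignored because they do not start with either prefix.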

.NET 4.X on Windows

Up next, Desktop .NET.

Desktop .NET also has the ServicePointManager class. Well, thanks to netstandard, both Desktop .NET and .NET Core can call these APIs, but they don't seem to do anything on .NET Core, at least not for HttpClient.

There's a particular piece of code in the client that attempts to set the connection lifetime to a low value. In theory this is pretty similar to what handler rotation does.

ServicePointManager.FindServicePoint(new Uri("http://test.example.lan:5000")).ConnectionLeaseTimeout = 10000;

The result on Desktop .NET is that both handler rotation and using ServicePointManager will cause you to see DNS updates... eventually. The DNS updates show up after about 2 minutes or so; I imagine this is due to Windows DNS Client caching. Using ipconfig /flushdns made the updates trigger pretty quickly, so I imagine if you're in this environment you may want to configure the client cache time to a lower value.

Using a long-lived HttpClient/handler without using ServicePointManager will result in stale DNS.

netcoreapp2.X on OSX

EDIT: On OSX the 'curl' based handler with OpenSSL works just fine with handler rotation.

On OSX I cannot get the 'curl' based handler with LibreSSL to support DNS updates. This is also not a supported scenario.

Using the new managed handler makes handler rotation work to apply DNS changes. You can currently activate it by setting an environment variable, or through a few other mechanisms: https://github.com/dotnet/corefx/blob/master/src/System.Net.Http/src/System/Net/Http/HttpClientHandler.cs#L20

Appendix A: examining dnsmasq

I also used the nslookup tool (Windows) to try and inspect what's coming from my DNS. It's been a long time since I've looked at DNS output, but for better or worse, I wanted to verify that it was reasonable. I mostly wanted to see the TTL.

nslookup -type=A -debug test.example.lan 192.168.1.1
------------
Got answer:
    HEADER:
        opcode = QUERY, id = 1, rcode = NOERROR
        header flags:  response, auth. answer, want recursion, recursion avail.
        questions = 1,  answers = 1,  authority records = 0,  additional = 0

    QUESTIONS:
        1.1.168.192.in-addr.arpa, type = PTR, class = IN
    ANSWERS:
    ->  1.1.168.192.in-addr.arpa
        name = Nimitz.lan
        ttl = 0 (0 secs)

------------
Server:  Nimitz.lan
Address:  192.168.1.1

------------
Got answer:
    HEADER:
        opcode = QUERY, id = 2, rcode = NXDOMAIN
        header flags:  response, want recursion, recursion avail.
        questions = 1,  answers = 0,  authority records = 0,  additional = 0

    QUESTIONS:
        test.example.lan.lan, type = A, class = IN

------------
------------
Got answer:
    HEADER:
        opcode = QUERY, id = 3, rcode = NOERROR
        header flags:  response, auth. answer, want recursion, recursion avail.
        questions = 1,  answers = 1,  authority records = 0,  additional = 0

    QUESTIONS:
        test.example.lan, type = A, class = IN
    ANSWERS:
    ->  test.example.lan
        internet address = 192.168.1.104
        ttl = 0 (0 secs)

------------
Name:    test.example.lan
Address:  192.168.1.104

I am also aware that some systems don't behave correctly with a DNS TTL of 0sec. Fortunately dnsmasq allows you to configure this. I changed the TTL to 2sec for most of my testing just to be sure.
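
dnsmasq exposes this as the local-ttl option, which sets the TTL on answers served from its hosts file. Below is a minimal sketch of setting it idempotently. It assumes the option goes in /etc/dnsmasq.conf, which may not be where the original setup configured it (OpenWRT also has its own configuration layer), and set_local_ttl is a hypothetical helper name.

```shell
#!/bin/sh
# set_local_ttl CONF_FILE SECONDS
# Hypothetical helper: makes sure CONF_FILE contains exactly one
# "local-ttl=SECONDS" line, replacing any earlier value.
set_local_ttl() {
    conf=$1
    ttl=$2
    tmp="${conf}.tmp.$$"

    # Create the file if it does not exist yet.
    [ -f "$conf" ] || : > "$conf"

    # Drop any previous local-ttl setting, then append the new one.
    awk '!/^local-ttl=/' "$conf" > "$tmp" || return 1
    printf 'local-ttl=%s\n' "$ttl" >> "$tmp"

    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Usage on the router (assumed config path), followed by a restart:
#   set_local_ttl /etc/dnsmasq.conf 2
#   killall dnsmasq
#   /etc/init.d/dnsmasq start
```

After the restart, rerunning the nslookup query above should show the new TTL in the answer section.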