
CoreDNS at Home: Reclaiming Local DNS from pfSense

Local DNS starts out harmless: a few host overrides here, one internal service there. But once VLANs, multiple interfaces, and a growing homelab enter the picture, those convenient little firewall UI entries turn into hidden infrastructure state. In this post, I walk through how I moved my LAN DNS from pfSense host overrides to a clean CoreDNS-based control plane — fast, deterministic, reproducible, and finally aligned with how the network actually works.

There are infrastructure problems that look trivial from the outside.

“Just add a DNS entry.”

“Just create a firewall rule.”

“Just use the router’s DNS resolver.”

And technically, yes, that works — until the network grows a little, until VLANs appear, until services live on multiple interfaces, until you want reproducibility, and until every “just one more override” becomes one more small piece of hidden, non-idempotent state in a firewall web UI.

This is the story of one of those deceptively small problems: local DNS in a segmented LAN.

More precisely:

How do you run clean, fast, deterministic DNS for internal services across multiple VLANs, without manually maintaining pfSense host overrides forever?

The final solution was not exotic. No Kubernetes, no Consul, no enterprise appliance, no service mesh. Just a Debian server, CoreDNS, pfSense/Unbound, Ansible, and some careful thinking about how DNS resolution actually behaves.

And as usual with infrastructure, the hard part was not installing the software. The hard part was understanding the control flow.


The starting point: pfSense DNS worked, but not well enough

The network already used pfSense as the central firewall and DNS resolver. That is a fairly common and reasonable setup.

pfSense runs Unbound as its DNS resolver, and for many environments that is good enough:

  • DHCP integration
  • local host overrides
  • domain overrides
  • forwarding or recursive resolution
  • DNS over TLS support
  • a UI that is approachable enough

For a small LAN, this is perfectly fine.

But over time, the local DNS setup had grown beyond “a few host overrides.”

There were many internal services:

prometheus.myhost.lan
grafana.myhost.lan
git.myhost.lan
dev.myhost.lan
vpn.myhost.lan

Some entries were internal only. Some belonged to a public domain used internally. Some needed different answers depending on the client VLAN. And all of it was managed through the pfSense UI.

That created a few problems:

  • manual changes
  • no proper version control
  • no easy rollback
  • no idempotency
  • no reusable deployment logic
  • no real infrastructure-as-code workflow

This was not really a DNS performance problem. It was a DNS control plane problem.

pfSense remained excellent as a firewall and as a central resolver/cache, but it was not the right place to maintain a growing, structured, local DNS authority.


The architectural decision: separate resolver and authority

The key mental shift was this:

The firewall should not necessarily be the source of truth for all internal DNS records.

A cleaner architecture is to separate responsibilities:

Clients
   ↓
CoreDNS / pfSense DNS resolver
   ↓
Local authoritative records
   ↓
pfSense / Unbound
   ↓
Public upstream DNS

Or more specifically:

CoreDNS = local DNS authority / overlay / VLAN-aware logic
pfSense Unbound = recursive resolver, cache, upstream forwarding, firewall integration
Public DNS = global resolution

This gives us a layered model:

Layer               Responsibility
CoreDNS             Internal records, local overlays, VLAN-aware answers
pfSense / Unbound   General DNS resolution, cache, upstream forwarding
Public resolvers    External DNS resolution

The important part: CoreDNS does not need to replace pfSense entirely.

It can simply become the part of the system that handles local DNS logic in a declarative way.


Why CoreDNS?

CoreDNS is small, fast, plugin-based, and configured through a simple text file called a Corefile.

That makes it ideal for this kind of task:

  • easy to deploy on Debian
  • easy to template with Ansible
  • easy to version in Git
  • supports static host records
  • supports forwarding
  • supports Prometheus metrics
  • supports client-aware views
  • simple enough to reason about

The goal was not to build an overcomplicated internal DNS platform. The goal was to replace hidden UI state with a clean, reproducible system.


The basic CoreDNS model

A minimal CoreDNS setup for local records looks like this:

myhost.lan:53 {
    errors
    cache 300

    hosts {
        192.168.10.20 prometheus.myhost.lan
        192.168.10.21 grafana.myhost.lan
        fallthrough
    }

    forward . 192.168.10.1
}

The logic is:

  1. If CoreDNS knows the hostname, return the local IP.
  2. If not, fall through to the next plugin.
  3. Forward the unresolved query to pfSense/Unbound.
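
A quick sanity check of this chain, assuming the CoreDNS host answers on 192.168.10.53:

dig @192.168.10.53 prometheus.myhost.lan +short
dig @192.168.10.53 unknown.myhost.lan +short

The first query is answered from the hosts plugin (192.168.10.20); the second is not in hosts, so it falls through and is forwarded to pfSense at 192.168.10.1.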

For a public domain used internally, the same idea works:

example.org:53 {
    errors
    cache 300

    hosts {
        192.168.10.30 internal-api.example.org
        192.168.10.31 dev-only.example.org
        fallthrough
    }

    forward . 192.168.10.1
}

This is useful for split-horizon style DNS:

internal-api.example.org → local IP inside LAN
www.example.org          → public DNS via upstream
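
The split is easy to verify with two queries against CoreDNS (again assuming it answers on 192.168.10.53):

dig @192.168.10.53 internal-api.example.org +short
dig @192.168.10.53 www.example.org +short

The first returns the local record 192.168.10.30; the second is not in the hosts block, so it falls through to pfSense and resolves to the public address.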

So far, so simple.

But the real problem was more interesting.


The multi-VLAN problem

One internal service, git.myhost.lan, was reachable from four VLANs.

The server itself had interfaces in all four VLANs:

VLAN 10 → 192.168.10.50
VLAN 20 → 192.168.20.50
VLAN 30 → 192.168.30.50
VLAN 40 → 192.168.40.50

The goal was simple:

Client in VLAN 10 → git.myhost.lan → 192.168.10.50
Client in VLAN 20 → git.myhost.lan → 192.168.20.50
Client in VLAN 30 → git.myhost.lan → 192.168.30.50
Client in VLAN 40 → git.myhost.lan → 192.168.40.50

Why?

Because then traffic stays local to the VLAN.

Without that, a client in VLAN 20 might resolve git.myhost.lan to the VLAN 10 IP, causing traffic to go through the firewall. Since the network did not yet have switches with proper inter-VLAN routing capabilities, that meant unnecessary firewall traversal and an ugly topology leak.

The workaround had been:

git1.myhost.lan → VLAN 10 IP
git2.myhost.lan → VLAN 20 IP
git3.myhost.lan → VLAN 30 IP
git4.myhost.lan → VLAN 40 IP

Functional, but not elegant.

The hostname should describe the service, not the network topology.

So the real requirement was:

Same hostname, different answer depending on the querying client subnet.

That is split-horizon DNS, or more precisely, client-subnet-aware DNS.


CoreDNS views: the promising mechanism

CoreDNS has a view plugin that can match queries based on client information.

Conceptually, this allows something like:

myhost.lan:53 {
    view vlan10 {
        expr incidr(client_ip(), '192.168.10.0/24')
    }

    hosts {
        192.168.10.50 git.myhost.lan
        fallthrough
    }

    forward . 192.168.10.1
}

And for VLAN 20:

myhost.lan:53 {
    view vlan20 {
        expr incidr(client_ip(), '192.168.20.0/24')
    }

    hosts {
        192.168.20.50 git.myhost.lan
        fallthrough
    }

    forward . 192.168.10.1
}

This was the right concept.

But getting the actual CoreDNS control flow right required several iterations.


Important lesson 1: CoreDNS must see the real client IP

For VLAN-aware DNS to work, CoreDNS needs to see the original client IP.

That sounds obvious, but it has an important implication.

If clients query pfSense, and pfSense forwards to CoreDNS, then CoreDNS may only see pfSense as the source:

Client → pfSense/Unbound → CoreDNS

From CoreDNS’ perspective, the client is then not 192.168.20.123, but the firewall.

That breaks view-based matching.

So for this setup, clients that need VLAN-aware DNS should query CoreDNS directly, usually via DHCP DNS server settings per VLAN.

Alternatively, the split-horizon logic would need to live in Unbound/pfSense itself.

For this setup, the clean path was:

Clients → CoreDNS → pfSense/Unbound → upstream

CoreDNS becomes the first resolver for the LAN clients. It handles local logic, then forwards everything else to pfSense.
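
A simple way to confirm that CoreDNS really sees client IPs, assuming the log plugin is enabled and a test client sits in VLAN 20:

# on the VLAN 20 client
dig @192.168.20.53 git.myhost.lan +short

# on the CoreDNS host
journalctl -u coredns -f | grep '192.168.20.'

If the query log shows the client address instead of the firewall address, the view matching has the information it needs.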


Important lesson 2: binding CoreDNS correctly

The Debian server already had other DNS-related services, including dnsmasq on a Docker/libvirt-style subnet.

The initial CoreDNS attempt failed with:

listen tcp :53: bind: address already in use

At first, it looked like CoreDNS was ignoring explicit IPs. The real issue was a CoreDNS grammar detail.

In a Corefile, the server block key is a zone plus port, not a listen address, so this does not mean “bind to 127.0.0.1” in the intuitive way:

127.0.0.1:53 {
    ...
}

The correct way to restrict listener interfaces is to use the bind plugin inside the server block:

.:53 {
    bind 192.168.10.53 192.168.20.53 192.168.30.53 192.168.40.53

    errors
    cache 300
    forward . 192.168.10.1
}

And crucially: every relevant server block must contain the same bind restriction.

Any server block without it still listens on the wildcard address, which recreates exactly the port 53 conflict from above.

The resulting pattern:

myhost.lan:53 {
    bind 192.168.10.53 192.168.20.53 192.168.30.53 192.168.40.53

    ...
}

This avoids conflicts with dnsmasq or other services bound to different local interfaces.
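
When in doubt about which service owns which listener, checking the sockets directly on the Debian host is faster than guessing:

sudo ss -lntup | grep ':53 '

CoreDNS should only appear on the 192.168.x.53 addresses, and dnsmasq only on its own Docker/libvirt subnet address.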


Important lesson 3: CoreDNS does not cascade between server blocks

This was the most important logic trap.

The first structure looked roughly like this:

VLAN-specific block for myhost.lan
Generic/global block for myhost.lan

The expectation was:

1. Try VLAN-specific hosts
2. If not found, fall through to global hosts
3. If not found, forward upstream

But CoreDNS does not work like that.

Once a request lands in a matching server block/view, it runs that plugin chain. It does not then continue into a later generic server block.

So this failed:

myhost.lan:53 {
    view vlan10 {
        expr incidr(client_ip(), '192.168.10.0/24')
    }

    hosts {
        192.168.10.50 git.myhost.lan
        fallthrough
    }

    forward . 192.168.10.1
}

myhost.lan:53 {
    hosts {
        192.168.10.20 prometheus.myhost.lan
        192.168.10.21 grafana.myhost.lan
        fallthrough
    }

    forward . 192.168.10.1
}

A VLAN 10 client asking for prometheus.myhost.lan would enter the VLAN 10 block, not find the record, then forward upstream. It would not check the later global hosts block.

That explained why the “global overlay” seemed to be ignored.


Important lesson 4: hosts can only be used once per zone block

The next idea was to place multiple hosts sections inside one block:

myhost.lan:53 {
    view vlan10 { ... }

    hosts {
        192.168.10.50 git.myhost.lan
        fallthrough
    }

    hosts {
        192.168.10.20 prometheus.myhost.lan
        fallthrough
    }

    forward . 192.168.10.1
}

But that also failed: like most CoreDNS plugins, hosts may only appear once per server block.

The eventual solution was much simpler and more robust:

Materialize the global zone records into every matching VLAN-specific block.

In other words, each VLAN-specific block contains:

VLAN-specific records
+ global records for the same zone
+ fallthrough
+ forward

That keeps the plugin chain simple and deterministic.


The working CoreDNS pattern

A simplified final CoreDNS structure looks like this:

.:53 {
    bind 192.168.10.53 192.168.20.53 192.168.30.53 192.168.40.53

    errors
    cache 300
    prometheus 192.168.10.53:9153

    forward . 192.168.10.1
}

myhost.lan:53 {
    bind 192.168.10.53 192.168.20.53 192.168.30.53 192.168.40.53

    view vlan10 {
        expr incidr(client_ip(), '192.168.10.0/24')
    }

    errors
    cache 300

    hosts {
        192.168.10.50 git.myhost.lan

        # Global records injected into the VLAN view
        192.168.10.20 prometheus.myhost.lan
        192.168.10.21 grafana.myhost.lan
        192.168.10.22 alertmanager.myhost.lan

        fallthrough
    }

    forward . 192.168.10.1
}

myhost.lan:53 {
    bind 192.168.10.53 192.168.20.53 192.168.30.53 192.168.40.53

    view vlan20 {
        expr incidr(client_ip(), '192.168.20.0/24')
    }

    errors
    cache 300

    hosts {
        192.168.20.50 git.myhost.lan

        # Same global records
        192.168.10.20 prometheus.myhost.lan
        192.168.10.21 grafana.myhost.lan
        192.168.10.22 alertmanager.myhost.lan

        fallthrough
    }

    forward . 192.168.10.1
}

Now the behavior is exactly what we want:

VLAN 10 client:
git.myhost.lan        → 192.168.10.50
prometheus.myhost.lan → 192.168.10.20
google.com            → forwarded to pfSense

VLAN 20 client:
git.myhost.lan        → 192.168.20.50
prometheus.myhost.lan → 192.168.10.20
google.com            → forwarded to pfSense

The service-specific topology is handled locally, while general records remain shared.


Making it idempotent with Ansible

The point of this was not just to make DNS work once. The point was to make it reproducible.

The role uses variables like this:

coredns_upstreams:
  - "192.168.10.1"

coredns_bind_ips:
  - "192.168.10.53"
  - "192.168.20.53"
  - "192.168.30.53"
  - "192.168.40.53"

coredns_prometheus_enabled: true
coredns_prometheus_listen: "192.168.10.53:9153"

coredns_global_zones:
  - name: "myhost.lan"
    records:
      - "192.168.10.20 prometheus.myhost.lan"
      - "192.168.10.21 grafana.myhost.lan"
      - "192.168.10.22 alertmanager.myhost.lan"

  - name: "example.org"
    records:
      - "192.168.10.30 internal-api.example.org"
      - "192.168.10.31 dev-only.example.org"

coredns_views:
  - name: "vlan10"
    client_cidrs:
      - "192.168.10.0/24"
    zones:
      - name: "myhost.lan"
        records:
          - "192.168.10.50 git.myhost.lan"

  - name: "vlan20"
    client_cidrs:
      - "192.168.20.0/24"
    zones:
      - name: "myhost.lan"
        records:
          - "192.168.20.50 git.myhost.lan"

The key part in the Jinja template is where the global records are injected into matching view-specific zones:

hosts {
{% for record in zone.records %}
    {{ record }}
{% endfor %}

{# Add global zone entries if they match the same domain #}
{% for gzone in coredns_global_zones %}
{% if gzone.name == zone.name %}
    # Global records for {{ gzone.name }}
{% for grecord in gzone.records %}
    {{ grecord }}
{% endfor %}
{% endif %}
{% endfor %}

    fallthrough
}

That gives every VLAN view a complete local view of the zone.

It is not the most theoretically elegant model, but it is explicit, robust, and easy to reason about — which is usually what I want in infrastructure.
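
The deployment step itself is not shown above, but it boils down to a template task plus a restart handler. A minimal sketch (file paths and handler names here are illustrative, not the exact role):

# roles/coredns/tasks/main.yml
- name: Render Corefile from template
  ansible.builtin.template:
    src: Corefile.j2
    dest: /etc/coredns/Corefile
    owner: root
    group: root
    mode: "0644"
  notify: Restart coredns

# roles/coredns/handlers/main.yml
- name: Restart coredns
  ansible.builtin.systemd:
    name: coredns
    state: restarted

Because the Corefile is rendered from the variables above, re-running the play converges on the same configuration every time, which is exactly the idempotency the pfSense UI could not provide.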


Docker: the loopback trap

One extra issue appeared after switching DNS on the Debian host itself.

The host's /etc/resolv.conf had been updated and now included:

nameserver 127.0.0.1
nameserver 192.168.10.1

That worked for the host.

But Docker containers have their own network namespace. Inside a container:

127.0.0.1

does not mean the Debian host. It means the container itself.

So containers could not use the host’s loopback DNS.

The fix was to configure Docker explicitly with a DNS server reachable from containers:

{
  "dns": [
    "192.168.10.53",
    "192.168.10.1"
  ]
}

in:

/etc/docker/daemon.json

Then:

sudo systemctl restart docker

And test:

docker run --rm alpine nslookup git.myhost.lan
docker run --rm alpine nslookup google.com

That restored resolution inside containers as well.


Testing before switching production systems

This is the part that matters in real infrastructure work: never switch everything blindly.

The safe test process was:

dig @192.168.10.53 git.myhost.lan +short
dig @192.168.10.53 prometheus.myhost.lan +short
dig @192.168.10.53 google.com +short

From clients in different VLANs:

dig @192.168.10.53 git.myhost.lan +short
dig @192.168.20.53 git.myhost.lan +short
dig @192.168.30.53 git.myhost.lan +short
dig @192.168.40.53 git.myhost.lan +short

Expected:

VLAN 10 → 192.168.10.50
VLAN 20 → 192.168.20.50
VLAN 30 → 192.168.30.50
VLAN 40 → 192.168.40.50

Enabling the log plugin alongside errors made CoreDNS log every query during the rollout:

log
errors

And system logs:

journalctl -u coredns -f

The critical thing to verify was that CoreDNS saw real client IPs, not only the firewall.


Observability: Prometheus metrics

CoreDNS can expose Prometheus metrics with one line:

prometheus 192.168.10.53:9153

Then Prometheus can scrape:

scrape_configs:
  - job_name: "coredns"
    static_configs:
      - targets:
          - "192.168.10.53:9153"

Useful metrics include:

coredns_dns_requests_total
coredns_dns_responses_total
coredns_cache_hits_total
coredns_cache_misses_total
coredns_dns_request_duration_seconds

This is not just nice to have. DNS is foundational enough that visibility matters.

High NXDOMAIN rates, slow upstream responses, broken clients, excessive lookups — all of that becomes visible.
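
As a rough example, assuming the standard CoreDNS metric labels, the NXDOMAIN share per server block can be expressed in PromQL like this:

sum(rate(coredns_dns_responses_total{rcode="NXDOMAIN"}[5m])) by (server)
  /
sum(rate(coredns_dns_responses_total[5m])) by (server)

A sudden jump in that ratio usually points at a misconfigured client or a missing local record.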


The final architecture

The resulting setup looks like this:

LAN clients
   ↓
CoreDNS on central Debian server
   ↓
local records / VLAN-aware views
   ↓
pfSense Unbound
   ↓
public upstream DNS

For regular internal services:

prometheus.myhost.lan → shared local IP
grafana.myhost.lan    → shared local IP

For topology-aware services:

git.myhost.lan → VLAN-local IP depending on client subnet

For everything else:

external domains → pfSense/Unbound → upstream DNS

And all of it is deployed through Ansible.

That means:

  • no manual pfSense host override editing
  • no hidden state
  • no one-off UI changes
  • no topology-leaking hostnames like git1, git2, git3
  • predictable local DNS behavior
  • observable DNS metrics
  • repeatable deployment

This is exactly the kind of infrastructure improvement that does not look glamorous, but immediately makes a network feel more coherent.


Why this matters beyond DNS

This is a local LAN topic, but the underlying pattern is much larger.

A lot of IT problems are not caused by missing tools. They are caused by fuzzy boundaries between responsibilities.

In this case:

Firewall UI
+ DNS resolver
+ local authority
+ topology routing
+ manual records

had all been collapsed into one place.

The solution was not “use a cooler DNS server.”

The solution was to separate concerns:

pfSense = firewall and resolver/cache
CoreDNS = local authority and DNS logic
Ansible = desired state
Prometheus = visibility

That is the difference between something that merely works and something that can be operated.


A small note from the consulting side

This is also the kind of work I enjoy doing professionally: taking a system that has grown organically, finding the hidden structural friction, and turning it into something cleaner, reproducible, and easier to operate.

At Neoground, we usually work on larger digital systems — AI consulting, web platforms, automation, SaaS architectures, infrastructure, and strategy — but the same principle applies everywhere:

Good technology work is not just implementation. It is structural clarification.

Sometimes that means designing a digital product. Sometimes it means untangling an internal workflow. Sometimes it means fixing DNS in a segmented LAN so the whole infrastructure finally behaves like it should.

If you have a system that works, but feels increasingly manual, fragile, or hard to reason about, that is usually a sign that it has outgrown its original control plane.

That is exactly where advisory work becomes valuable.


Final result

After the migration, DNS is now:

  • fast
  • local
  • VLAN-aware
  • reproducible
  • observable
  • no longer trapped in the pfSense UI

The satisfying part is not only that git.myhost.lan now resolves correctly from every VLAN.

The satisfying part is that the system now matches the mental model.

And in infrastructure, that is usually when things become calm again.

This blog post was written by me with the assistance of AI (GPT 5.5 Thinking) based on my long troubleshooting sessions, notes, and numerous tries.

About Sarah Robin

Sarah Robin is a founder, strategist, technologist, and writer based in Germany. She works at the intersection of AI/IT advisory, software architecture, media, public thought, and systems thinking. Through Neoground and her independent work, she helps people and organizations turn complexity into structure.
