The system stack is often called "lighter" than a userspace stack like gVisor or lwIP. That deserves scrutiny — you still read packets, NAT them, and open a socket per flow. So where's the win? Follow one packet.
"You NAT and write back to the TUN to a localhost service, open a socket per connection, and hit userspace for every read and write. You're still parsing TCP in userspace — does this actually save anything?"
That is the fair question this page answers.

A TCP connection through the system stack is handed off to the kernel and back. Userspace touches headers; the kernel runs the connection.
listener.Accept() returns a real net.Conn. Its source port is the synthetic key; LookupBack recovers the true original destination.

Condensed from stack_system.go: the listener, the NAT lookup, and the handoff to your handler.
    func (s *System) acceptLoop(listener net.Listener) {
        for {
            conn, err := listener.Accept() // kernel-terminated TCP
            if err != nil {
                return
            }
            connPort := M.SocksaddrFromNet(conn.RemoteAddr()).Port
            session := s.tcpNat.LookupBack(connPort) // recover the original 5-tuple
            // hand the real destination + ready socket to your code:
            go s.handler.NewConnectionEx(s.ctx, conn,
                M.SocksaddrFromNetIP(session.Source), session.Destination, nil)
        }
    }

    // stack_system_nat.go: the NAT table is just two maps
    type TCPNat struct {
        addrMap map[netip.AddrPort]uint16 // 5-tuple → synthetic port
        portMap map[uint16]*TCPSession    // synthetic port → original
    }
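To make the two-map idea concrete, here is a minimal, self-contained sketch of a NAT table that hands out a synthetic port per flow and maps it back on accept. It is not sing-tun's implementation: the names `flow`, `natTable`, `lookup`, `lookupBack`, and `ap` are hypothetical, the flow is keyed on source and destination for simplicity, and locking, expiry, and port wraparound are deliberately left out.

```go
package main

import (
	"fmt"
	"net/netip"
)

// flow is a hypothetical record of one TCP flow as first seen on the TUN.
type flow struct {
	src, dst netip.AddrPort
}

// natTable is a hypothetical two-map table: flow to port, and port back to flow.
type natTable struct {
	byFlow map[flow]uint16
	byPort map[uint16]flow
	next   uint16 // next synthetic port to hand out (no wraparound handling)
}

func newNATTable(firstPort uint16) *natTable {
	return &natTable{
		byFlow: make(map[flow]uint16),
		byPort: make(map[uint16]flow),
		next:   firstPort,
	}
}

// lookup returns the synthetic port for a flow, allocating one on first sight.
func (n *natTable) lookup(src, dst netip.AddrPort) uint16 {
	f := flow{src, dst}
	if port, ok := n.byFlow[f]; ok {
		return port
	}
	port := n.next
	n.next++
	n.byFlow[f] = port
	n.byPort[port] = f
	return port
}

// lookupBack recovers the original flow from the synthetic port that the
// accepted connection reports as its remote port.
func (n *natTable) lookupBack(port uint16) (flow, bool) {
	f, ok := n.byPort[port]
	return f, ok
}

// ap is a small parsing helper for the example addresses below.
func ap(s string) netip.AddrPort { return netip.MustParseAddrPort(s) }

func main() {
	nat := newNATTable(10000)
	port := nat.lookup(ap("10.0.0.2:50000"), ap("93.184.216.34:443"))
	f, _ := nat.lookupBack(port)
	fmt.Println(port, f.src, f.dst)
}
```

The point of the sketch is the cost profile: each packet costs one map lookup, and each accepted connection costs one reverse lookup. Nothing here tracks sequence numbers, windows, or timers.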
The honest comparison. The system stack doesn't do less work — it moves the expensive work out of your process and into the kernel.
| Concern | system | gvisor / lwIP |
|---|---|---|
| TUN read / write | userspace | userspace |
| IP/TCP header NAT | userspace (cheap) | — |
| TCP state machine (handshake, retransmit, reassembly, cwnd, buffers, timers) | kernel | userspace, per-conn |
| Sockets per connection | ~2 (accepted + outbound) | ~1 (outbound) |
| Packet ↔ kernel crossings | more (write-back + re-ingest) | fewer |
| Per-connection memory | kernel | your process heap |
So the critique is half right: you do still parse packets and pay TUN I/O in userspace. But the system stack never runs the TCP state machine in userspace — it only rewrites headers. gVisor/lwIP run the whole stack in-process, per connection. That's the asymmetry the "lighter" claim is really about.
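As a rough illustration of how small "only rewrites headers" is, the sketch below rewrites the destination address, destination port, and source port of an IPv4/TCP packet in place and then fixes both checksums. Everything in it is an assumption for illustration: the function names, the listener address 172.19.0.1:34567, and the synthetic port 40001 are made up, IPv6 is ignored, and the checksums are recomputed from scratch for clarity where a real implementation would more likely update them incrementally.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// sum16 adds data to a running sum as big-endian 16-bit words.
func sum16(data []byte, sum uint32) uint32 {
	for i := 0; i+1 < len(data); i += 2 {
		sum += uint32(binary.BigEndian.Uint16(data[i:]))
	}
	if len(data)%2 == 1 {
		sum += uint32(data[len(data)-1]) << 8 // pad the odd trailing byte
	}
	return sum
}

// fold reduces the sum to 16 bits and returns its one's complement.
func fold(sum uint32) uint16 {
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16)
	}
	return ^uint16(sum)
}

// split validates an IPv4/TCP packet and returns its IP header and TCP segment.
func split(pkt []byte) (ip, tcp []byte, err error) {
	if len(pkt) < 40 || pkt[0]>>4 != 4 || pkt[9] != 6 {
		return nil, nil, errors.New("not an IPv4 TCP packet")
	}
	ihl := int(pkt[0]&0x0f) * 4
	total := int(binary.BigEndian.Uint16(pkt[2:4]))
	if ihl < 20 || total < ihl+20 || total > len(pkt) {
		return nil, nil, errors.New("bad header lengths")
	}
	return pkt[:ihl], pkt[ihl:total], nil
}

// tcpSum returns the folded TCP checksum over the pseudo-header and segment.
func tcpSum(ip, tcp []byte) uint16 {
	pseudo := sum16(ip[12:20], 0) + 6 + uint32(len(tcp)) // addresses, protocol, length
	return fold(sum16(tcp, pseudo))
}

// rewriteTCP4 points the packet at a local listener and stamps the synthetic
// source port, then recomputes the IPv4 and TCP checksums.
func rewriteTCP4(pkt []byte, dstAddr [4]byte, dstPort, srcPort uint16) error {
	ip, tcp, err := split(pkt)
	if err != nil {
		return err
	}
	copy(ip[16:20], dstAddr[:])
	binary.BigEndian.PutUint16(tcp[0:2], srcPort)
	binary.BigEndian.PutUint16(tcp[2:4], dstPort)

	binary.BigEndian.PutUint16(ip[10:12], 0) // zero before summing
	binary.BigEndian.PutUint16(ip[10:12], fold(sum16(ip, 0)))
	binary.BigEndian.PutUint16(tcp[16:18], 0)
	binary.BigEndian.PutUint16(tcp[16:18], tcpSum(ip, tcp))
	return nil
}

// checksumsOK reports whether both checksums verify (a valid sum folds to 0).
func checksumsOK(pkt []byte) bool {
	ip, tcp, err := split(pkt)
	return err == nil && fold(sum16(ip, 0)) == 0 && tcpSum(ip, tcp) == 0
}

func main() {
	// A bare 40-byte SYN: 20-byte IPv4 header plus 20-byte TCP header.
	pkt := make([]byte, 40)
	pkt[0], pkt[3], pkt[8], pkt[9] = 0x45, 40, 64, 6
	copy(pkt[12:16], []byte{10, 0, 0, 2})
	copy(pkt[16:20], []byte{93, 184, 216, 34})
	pkt[32], pkt[33] = 0x50, 0x02 // data offset 5, SYN flag
	err := rewriteTCP4(pkt, [4]byte{172, 19, 0, 1}, 34567, 40001)
	fmt.Println(err, checksumsOK(pkt))
}
```

Note what is absent: no sequence tracking, no retransmission queue, no receive buffer. The per-packet work is a few byte copies and two sums, which is why the table calls the userspace NAT step cheap.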
The write-back trick depends on the kernel routing that re-injected packet to sing-tun's local listener. Two environments break that assumption.
includeAllNetworks. Under Apple's full-tunnel mode, the extension's own sockets, including the local listener, get pulled into the tunnel, so the re-injected packet loops instead of being delivered. sing-tun refuses system and mixed in this mode (sing-tun#25) and forces gVisor. With includeAllNetworks off, the system stack runs on iOS exactly like macOS: the limitation comes from includeAllNetworks, not from iOS.
Older Android kernels don't reliably support the NAT/redirect path the system stack needs. sing-box-derived clients read the kernel version at startup and fall back to gVisor below 5.10.
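A startup check of this kind could be sketched as follows. This is an assumption about the shape of such a check, not code taken from any particular client: `parseKernelRelease` and `pickStack` are hypothetical names, and the release string is taken as an input because how a client obtains it (for example via uname) varies by platform.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseKernelRelease extracts the major and minor version from a kernel
// release string such as "5.10.43-android12-9".
func parseKernelRelease(release string) (major, minor int, err error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, errors.New("unrecognized kernel release: " + release)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	// The minor field may carry a suffix when there is no patch level, e.g. "10-rc1".
	minorStr := parts[1]
	if i := strings.IndexFunc(minorStr, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		minorStr = minorStr[:i]
	}
	if minor, err = strconv.Atoi(minorStr); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// pickStack returns "system" for kernels at or above 5.10 and falls back to
// "gvisor" for older or unparseable releases.
func pickStack(release string) string {
	major, minor, err := parseKernelRelease(release)
	if err != nil || major < 5 || (major == 5 && minor < 10) {
		return "gvisor"
	}
	return "system"
}

func main() {
	for _, r := range []string{"4.19.113", "5.4.0", "5.10.43-android12-9", "6.1.25"} {
		fmt.Println(r, pickStack(r))
	}
}
```

Treating an unparseable release as "too old" is a deliberate choice in this sketch: falling back to gVisor costs memory, while wrongly choosing the system stack would break connectivity.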
The verdict. The system stack doesn't save resources by doing less — you still pay TUN I/O, header NAT, and an extra socket per flow. It wins by doing different work: the costly part of TCP runs in the mature kernel stack, not your process. You trade file descriptors and packet-boundary crossings for kernel-grade TCP and a smaller heap. A throughput win on a server; a footprint win in a memory-capped mobile extension — which, ironically, is often exactly where includeAllNetworks won't let you use it.