NetworkManager or networkd

Posted Sep 13, 2024 22:42 UTC (Fri) by NYKevin (subscriber, #129325)
In reply to: NetworkManager or networkd by mathstuf
Parent article: Debating ifupdown replacements for Debian trixie

> FWIW, I dropped NetworkManager years ago for `wpa_supplicant`-based management because I had flaky wireless situations (thick concrete walls in the dorms, roaming across campus, etc.) and any whiff of packet loss would announce to the whole machine "no network" and apps would start to freak out and react. However, it was likely to be back Real Soon™ and normal TCP recovery would make it "transparent" (if with a spike in latency).

Somebody ought to write one of those "falsehoods programmers believe" articles for TCP, because this is just reflective of a broader trend of software that thinks it knows better than TCP, and usually does not. Here, I'll even get the ball rolling (remember, all of the following statements are *false* at least some of the time, but for some of these, perhaps not very often):

1. TCP is reliable, so everything I send will be received by the other end.
2. OK, mostly reliable.
3. OK, fine, it's not reliable (in the above sense of the word), but the sender and recipient will always eventually agree on exactly which bytes made it over the transport.
4. It is possible to create a guarantee analogous to (3) by building some message-oriented application-level protocol on top of TCP, such as HTTP or SMTP.
5. There is such a thing as a TCP packet.
6. There is no such thing as a TCP packet.
7. If we fail to connect to a well-known remote host, then we must be offline.
8. Nagle's algorithm is good.
9. Nagle's algorithm is bad.
10. I don't have to care about Nagle's algorithm.
11. This is all low-level pedantry. I can think of TCP like a two-way Unix pipe that goes over the network, and completely ignore how it is implemented.
12. If the network is transparent to TCP, then it must be transparent to IP.
13. If the network is transparent to HTTP/1.1, then it must be transparent to TCP.
14. Weird networks that are not transparent to standard protocols are an aberration. I can safely ignore them.
15. TCP is implemented in terms of IP.

Explainer for 1-4: https://en.wikipedia.org/wiki/Two_Generals%27_Problem. TL;DR: If the connection breaks while an ACK is outstanding, the sender will have no way of knowing whether the segment was received, and this turns out to be an insoluble problem no matter how much complexity you pile on top of it. You need something resembling Paxos or Raft to get a guarantee like that, and that always requires a minimum of three nodes, so it can't be built on top of a single two-party TCP stream. See RFC 1047 for an SMTP-specific discussion of this problem (which still applies to modern SMTP, since RFC 2821 says that implementations MUST follow 1047's core advice), but note that some variation of this problem applies to literally every two-party TCP service (and for that matter, every UDP or IP service as well), regardless of how it works or what abstractions it introduces. SMTP is only special in that both sides are explicitly required to care about whether the message was received or not, which is marginally unusual for TCP services (compare and contrast: FTP file uploads, HTTP POST and PUT, etc., most of which omit significant discussion of client retry logic in favor of leaving it up to the application or end user).
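
To make the "ACK outstanding" case concrete, here is a minimal toy sketch (not real TCP; `Receiver` and `send_once` are hypothetical names invented for illustration). It models one segment and one ACK, either of which may be lost, and shows that the sender observes the same thing in both failure cases even though the receiver's state differs.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Toy receiver that records every segment it actually gets."""
    received: list = field(default_factory=list)

    def deliver(self, segment: bytes) -> str:
        self.received.append(segment)
        return "ACK"


def send_once(receiver, segment, drop_segment, drop_ack):
    """Return what the sender observes: "ACK", or None if nothing came back."""
    if drop_segment:
        return None  # the segment never arrived
    ack = receiver.deliver(segment)
    if drop_ack:
        return None  # the segment arrived, but the ACK was lost
    return ack


# Case A: the segment is lost. Case B: only the ACK is lost.
lost_segment = Receiver()
lost_ack = Receiver()
obs_a = send_once(lost_segment, b"hello", drop_segment=True, drop_ack=False)
obs_b = send_once(lost_ack, b"hello", drop_segment=False, drop_ack=True)

# The sender sees the same thing (nothing) in both cases...
print(obs_a, obs_b)
# ...but the receivers ended up in different states.
print(lost_segment.received, lost_ack.received)
```

Adding more acknowledgement rounds only moves the problem to whichever message happens to be last.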

15 is left as an exercise for the reader (hint: it is primarily of historical interest, but I'm not sure it's possible to entirely rule out modern counterexamples, since we don't know what weird stuff is going on in [any large organization]'s private network).

NetworkManager or networkd

Posted Sep 14, 2024 12:24 UTC (Sat) by paulj (subscriber, #341)

Nice post.

One thing, unless I've misunderstood, I think you're being a bit /too/ strict and pessimistic about the consensus that is possible for 2 parties to reach over an unreliable (but non-malicious) link. It certainly is possible for 2 parties to reach a shared consensus that some /subset/ of the bytes have been received. I.e., I think your wording of 3 is too strict. There is a set of sent bytes (>= 0) where both sides can be sure that /both know/ those bytes were received, and a set of bytes following that set where they can not know. Your claim applies to the second set, but not the first set (with the caveat the first set may be 0 and remain at 0, but the size of the first set can only increase).

I.e., there /is/ a hard consensus set, and an ambiguous set. Your 3 is more that it is impossible to guarantee that the consensus set will be > 0, and that the ambiguous set will eventually reach 0. However, both sides can know what the consensus set is (including if it is 0).

It's a subtle difference, but you don't need Paxos for 2 parties to come to an agreement on a set of bytes having been transferred. You can't guarantee they reach a consensus on the success or failure of an attempted transfer (distinct from the ambiguity), but then I'm not sure Paxos/Raft can guarantee it either, at least in the sense of establishing that knowledge in the 2 parties (e.g., the sender may lose messages from the Paxos/Raft consensus protocol too).

NetworkManager or networkd

Posted Sep 15, 2024 18:37 UTC (Sun) by NYKevin (subscriber, #129325)

My point is not that there is no set of bytes the parties agree on. My point is that it is not possible for either party to know exactly which bytes are in the consensus set.

NetworkManager or networkd

Posted Sep 15, 2024 18:37 UTC (Sun) by NYKevin (subscriber, #129325)

(To clarify: It is possible for a party to know the consensus set contains *at least* the first N bytes. It is not possible for either party to know that the consensus set contains *exactly* the first N bytes.)
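
A rough sketch of the "at least" versus "exactly" distinction, as a toy model of cumulative ACKs over a link that can drop them (the `Link` class and its fields are hypothetical, and this deliberately ignores retransmission and everything else real TCP does):

```python
class Link:
    """Toy model of cumulative ACKs over a lossy link. Not real TCP."""

    def __init__(self):
        self.receiver_has = 0  # bytes the receiver has actually accepted
        self.sender_knows = 0  # highest cumulative ACK the sender has seen

    def transfer(self, nbytes, data_arrives, ack_arrives):
        if not data_arrives:
            return  # nothing changes on either side
        self.receiver_has += nbytes
        if ack_arrives:
            # A cumulative ACK covers everything received so far.
            self.sender_knows = self.receiver_has


link = Link()
link.transfer(100, data_arrives=True, ack_arrives=True)
link.transfer(50, data_arrives=True, ack_arrives=False)    # ACK lost
link.transfer(25, data_arrives=False, ack_arrives=False)   # data lost

# The sender's figure is only ever a lower bound on the receiver's.
print(link.sender_knows, link.receiver_has)
```

In this run the sender can say "at least 100 bytes arrived", while the receiver actually holds 150; nothing the sender has seen lets it tell those two situations apart.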

NetworkManager or networkd

Posted Sep 16, 2024 9:44 UTC (Mon) by paulj (subscriber, #341)

Right. This is the clarification I was trying to establish.

That two speakers /can/ establish consensus on the set of "definitely received and we both know it" bytes is quite useful and makes a lot of things possible. That there is a set of bytes thereafter in a grey zone, where the sender can't tell if they've been received and the recipient can't know if the sender knows they've been received, is an issue, but it's not catastrophic. That the first set can be built and (as the network permits) extended allows for useful and solid primitives to be created on top.

It's perhaps not quite as pessimistic as the point in your post above could have been read. Distributed systems are still difficult, and it's very easy and very common for authors of them to fail to pay attention to the distinction above (leading to weird fail...), of course. ;)

NetworkManager or networkd

Posted Sep 21, 2024 22:19 UTC (Sat) by NYKevin (subscriber, #129325)

> That two speakers /can/ establish consensus on the set of "definitely received and we both know it" bytes is quite useful and makes a lot of things possible.

This is true, but I think it is also misleading. Let's use SMTP as an example:

1. You can get to a point where both sides agree that the email has been transmitted from the client to the server.
2. You can also get to a point where both sides agree that the server has assumed responsibility for delivering the email. This is the primary state transition that SMTP is designed to accomplish, so it is important that we can get to this point.
3. You cannot get (2) without first passing through a state where it is ambiguous who is currently responsible for delivering the email (and it is mathematically provable that no extension or modification of SMTP can fix this shortcoming - it is inherent in TCP, and for that matter in IP). If the connection breaks while we are in this state, we either duplicate the email or drop it. SMTP as written specifies that the email is duplicated, but I do not know the extent to which this has been tested on real implementations.

The purpose of Paxos and Raft is not to accomplish (2). It is to prevent (3).
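
A small sketch of why state (3) forces a choice between duplicating and dropping. This is a toy enumeration, not SMTP itself: `copies_delivered` is a hypothetical helper, and it assumes that any retry the client makes succeeds. The scenario is that the connection broke before the client saw the server's final reply, so the client does not know whether the server took the message.

```python
def copies_delivered(server_got_message: bool, client_retries: bool) -> int:
    """How many copies end up queued for delivery after the broken connection?"""
    copies = 1 if server_got_message else 0
    if client_retries:
        copies += 1  # assumption: the retry itself goes through cleanly
    return copies


# The client cannot observe server_got_message, so it must pick one policy
# and accept the outcome in both possible worlds.
for client_retries in (True, False):
    outcomes = [copies_delivered(got, client_retries) for got in (True, False)]
    print("retry" if client_retries else "give up", outcomes)
```

Retrying yields either two copies or one; giving up yields either one copy or none. Neither policy gives exactly one copy in both worlds, and retrying is the one that never loses the message.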

NetworkManager or networkd

Posted Sep 14, 2024 13:59 UTC (Sat) by Wol (subscriber, #4433)

I think 14 has bitten lots of people lots of times. Usually down to router manufacturers believing a similar set of falsehoods?

Buffer bloat, of course, being a perfect example of routers breaking the congestion mechanism.

Cheers,
Wol

NetworkManager or networkd

Posted Sep 14, 2024 21:06 UTC (Sat) by josh (subscriber, #17465)

Or networks that block ICMP, or networks that drop anything they don't understand...

NetworkManager or networkd

Posted Sep 14, 2024 21:11 UTC (Sat) by Sesse (subscriber, #53779)

16. I don't need to know anything about congestion control (a sub-category of this one is “If I don't get the speed I want, I should open multiple TCP connections”)