Fred Goldstein k1io goldstein@carafe.enet.dec.com Version 2.1 2-oct-1989 The Radio Shortest Path First (RSPF) Routing Protocol For DDN Internet Protocol over Amateur Packet Radio ** DRAFT ARCHITECTURE -- FOR COMMENT ** ** changes in V2.1 are noted by "**" and should be edited out before final release** CONTENTS I. Introduction and Version Notes II. Acquisition of router-router adjacencies III. Acquisition of end node adjacencies IV. Link state propagation V. The Shortest Path First Spanning Tree Appendix: Router Parameters I. Introduction Amateur packet radio use of the Internet Protocol does not yet provide all of the capabilities of other IP networks. One particular example of this is IP packet routing. Most existing versions of the AMPR IP code make use of a static routing table. This requires human intervention every time a new backbone path is added, and adds geographic constraints to address assignment which do not exist on the ARPA Internet. Some implementations make use of automatic routing protocols (interior gateway protocols, or IGPs) using distance vector routing. These IGPs were originally written for wireline networks and tend to scale poorly to the amateur packet radio environment. Many IP and other networks have implemented link state routing based upon Dijkstra's "SPF" (shortest path first) spanning tree algorithm. A wireline implementation of SPF for IP is being standardized as the Open SPF Interior Gateway Protocol (OSPF), and an SPF procedure is being considered by ISO as the standard "IS-IS" routing protocol for OSI connectionless networks. A similar (and derivative) procedure can be applied to AMPRnet (Net 44). It is called Radio Shortest Packet First (RSPF); this document outlines the RSPF protocol. RSPF occupies the role traditionally referred to in TCP/IP networks as an "Interior Gateway Protocol" (IGP), where "Gateway" means "router". It makes use of the services of the Internet Protocol. It is not inconceivable that a router could use both RSPF and another IGP, or communicate with another network using the Exterior Gateway Protocol (EGP). However these are not described in this document. RSPF is intended to be implemented on routers, and need not be implemented on end nodes for the end nodes to take advantage of routing services. Any IP station may be an end node giving no further consideration to routing. I.1. Elements of RSPF The RSPF protocol is designed for use by internet-layer routing nodes (IP Gateways) in a packet radio network using the DDN Internet Protocol. It is comprised of four major functions: 1) Acquisition of router-router adjacencies 2) Acquisition of end node adjacencies 3) Link state propogation 4) Spanning tree route decision making. Its net result is the automatic maintenance of a least-cost routing table for use by IP Routing. RSPF is optimized for the geographically heirarchical addressing used in AMPRnet, but does not depend upon it. RSPF is simpler than OSPF and IS-IS, as it is designed for PC implementation within the Amateur Radio Service. It also adds procedures to take advantage of packet radio's "semi-broadcast" nature. I.2. Addressing convention When an RSPF router communicates with an end node, it will typically deal with a 32 bit IP address. Routers themselves, however, also support node group addressing (fewer than 32 bits need match). This follows the convention in the KA9Q IP routing program, which permits a crude form of heirarchical addressing as well as allowing portable operations to override the defaults. RSPF looks for the match (node or node group) with the greatest number of matching bits. Only if the number of bits matched is equal, then the lower cost path will be used. Thus a match to a full node address will override a "cheaper" path that matches its "class C net" of 24 bits, which overrides a "class B net", etc., noting that AMPRnet does not follow strict 8-bit address classification, and is a single Class A net. In every case, a greater number of bits matched is considered a superior path to a destination than one that matches fewer bits, regardless of the value of the routing metric ("cost"). I.3. Connection-oriented vs. connectionless lower layers IP is a datagram network protocol, and supports both connection- oriented and connectionless lower layers (subnets). In a network with a high packet loss rate, the local retransmission provided by a connection-oriented datalink will substantially improve overall throughput. However, in a high-speed dedicated backbone, particularly one implemented using full-duplex radio or wireline links, connectionless links may provide better overall performance. The choice of which to use is a local matter; RSPF will work with both. The reliability of the routing information, however, may be somewhat greater with connection-oriented datalink procedures, since these will give more rapid indication of a physical link failure. I.4. Relationship to other protocols The reliability of the network depends upon reasonably reliable transmission of the routing update; hence, for non-broadcast procedures, connected-mode AX.25, or another reliable data link layer. (In any case connected-AX.25 may be more useful than connectionless for backbone links due to its local error correction ability.) **CHECK THIS OUT FOR VALIDITY WITH ANDERS** All packets specific to RSPF shall be exchanged inside IP packets using a protocol identifier which, pending formal assignment of one, shall be 73. (How is this formally assigned?) Such IP packets shall be sent with a time to live (TTL) value of 1. If broadcast procedures are used, connectionless AX.25 UI frames shall be sent, with a multicast address "QST-0" in the AX.25 address and an IP address of 44.255.255.255. (A router can also "read the mail" on passing radio packets not addressed to it; such procedures are for further study.) Note that in this document, "subnetwork" and "data link" are synonymous, and refer to the layer over which IP packets are exchanged. I.5. Version 2.1 changes. RSPF draft 2.0 was released in June, 1988, as the first nearly- implementable version. It was first implemented in September, 1989 by Anders Klemets. This version 2.1 reflects changes whose need was discovered during this implementation. These changes are both editorial and, in a few cases, substantive. The format of the Routing Update packet has been slightly modified. In order to prevent fragments of two or more different routing update messages from being erroneously merged, an Envelope ID is added to each such packet, with the same Envelope ID on all fragments of a multi-packet message. The term for such a message is now "envelope"; it contains one or more "bulletins", each of which originated from a single router. There are no longer separate packet types for Full Routing Update and Partial Routing Update. Instead, they are distinguished by the value of the subsequence number, which is always 0 for Full Routing Updates and is never 0 for Partial Routing Updates. A given envelope may contain both types of bulletin. Cost is now set on the basis of receiver instead of transmitter. This permits the automatic link quality adjustment to operate on the basis of locally-received traffic. The remaining horizon is stored in the links table. This is needed for consistency within the specification and was erroneously left out of 2.0. It is now explicitly stated that upon creation of a new router-router adjacency, the routers exchage full routing information. This allows routers to initialize themselves with a reasonably complete map of the network. II. Acquisition of router-router adjacencies For RSPF to operate correctly, all routers must remain reasonably current as to the state of the network at large. This is handled by the propagation of "bulletin" messages among routers. End nodes need not concern themselves with this; they will normally communicate through one "designated" router at any given time, for all (non-adjacent) destinations (not seen by ARP or other lower-layer procedures). End nodes can also, of course, connect to each other directly, bypassing RSPF. Each router maintains an adjacencies table. Each router's adjacency table lists the following information for all other nodes, both routers and end nodes, from which it directly receives packets over a subnetwork: Adjacent node IP address (i.e., 44.56.0.44) Adjacent node datalink (AX.25) address (i.e., K1IO-0) Datalink used (i.e., AX0) Datalink cost (i.e., 4) Number of packets heard since last RRH update (i.e., 50) Packet sequence number in last RRH update (i.e., 12593) Time of last RRH update (i.e., 2130). II.1. Router-router hello For the backbone to create its topology automatically, there must be a way for routers to discover each other with minimal overhead. For this purpose, a router-router hello (RRH) message is provided. Periodically (as an initial suggestion, shortly before beginning to propogate the periodic links state bulletin to known adjacencies), the router sends out the RRH message to the layer 2 multicast address and IP multicast address. RSPF makes no assumption of reciprocity (that links are bidirectional), so receipt of an RRH packet provides the receiver with information about a one-way (received) adjacency. II.2. Connection-oriented procedure If a router uses connection-oriented subnet procedures to its own adjacencies, then when a router receives this RRH packet, it checks to see if it already has a link to that packet's originator in its own links table. If not, it waits a random period of time (initial suggestion: within the range of 0 to 10 times the link's value of T1, DWAIT or SLOTTIME, and in any case much longer than the timers used within a CSMA or Aloha subnet such as AX.25) and then attempts to establish an AX.25 connection with the usual SABM; the router responds with the usual UA (link established) or DM (link rejected). If a two-way connection has been established, then both routers add the link to their adjacency tables. While the existence of this route is set reciprocally, the cost of each side of the route is set separately for each side of the connection. Cost is propagated in the routing update (link state) packet. Thus the adjacency between two routers is not actually used for real traffic until the first routing update packet exchange has taken place. Loss of an adjacency may then be noted by the loss of the subnet connection. When this occurs, the router removes the adjacency from its adjacency table and follows the "bad news" procedures outlined below for link state propagation. II.3. Connectionless procedure If a router uses connectionless datalink procedures to its own adjacencies (suitable to low-loss links), then when a router receives an RRH packet, it checks to see if this adjacency is already in its adjacency table. If not, then it is added. It also sends RRH packets with the same frequency as with connection-oriented subnets. Loss of an adjacency may be noted by timeout; if no RRH message is received, and no frames have been received from the adjacent router for a period of time (initial suggestion: slightly over twice the maximum interval between RRH messages), then the adjacency becomes suspect. The router should attempt (**a settable number of times**) to exchange a packet (ICMP echo) with the suspect adjacency; if unsuccessful (after the usual number of retries), the route is marked lost. It may also be marked lost if other attempts to send data through that router fail, such that the implementation determines that there is a high probability that the link is lost. Table II-1. Coding of the RRH PDU. 1 2 |0 |8 |6 |4 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | RSPF Version #| Type (RRH) | Checksum | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Full IP Address of sending router | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | last packet sent seq. # | flags | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | plaintext |... +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Parameters-- An RSPF Router-Router Hello packet is sent within IP with a type of . Each RRH packet contains the following fields: RSPF Version Number: Version number of this protocol (initially 1). Type: Value of 3 for RRH. Checksum: IP-style checksum Address: Full IP address of sending router Last packet sent sequence number: An integer (mod 65535) incremented by 1 for every frame sent by the datalink layer across this interface. This value allows receiving entities using connectionless procedures to use the automatic link quality measurement technique described in II.4. Flags: The low-order bit is 1 if connectionless datalink is preferred; 0 if connection-oriented is preferred. (Set by system management based upon anticipated link quality.) Other bits are reserved (sent 0). Plaintext: An optional text message (length may be up to maximum size remaining in datalink PDU). This might serve, for example, to "broadcast" the router's existence to persons who might be "reading the mail" (monitoring a radio channel promiscuously). II.4. Automatic link quality measurement A connectionless link or subnetwork may have very reliable, or very sporadic, performance. Since there is no procedure for ensuring the reliability of packets sent over a connectionless link, a high rate of packet loss may occur without being detected by the routers. If this loss is high enough, another route may become a better choice; a high enough packet loss rate may be enough to mark a link as "down". The automatic link quality measurement procedure allows links which are not yet "down", but whose performance is substandard, to be noted. Every router shall maintain, for each link, a count of all packets sent over each link. Every time an RRH message is sent, it includes the current value of this counter (modulo 65536). Every router also maintains, in its adjacency table, a count of the total number of packets received from said adjacency since the last RRH message, and the value of that counter as received in the last RRH message. Upon receipt of an RRH message, the recipient router compares the value of the received packet counter with the last received value in the adjacency table. The difference (taking into account wrap-around at the modulus) is compared with the number of packets received since the last RRH message. (This works even if an RRH message is lost.) This packet loss ratio is then maintained as a guideline to determine link quality. If link quality falls below a settable threshhold, the link is suspect. Timestamp can also be used to compute packet arrival rate. Connection-oriented data links presumably deliver 100% of attempted packets. A high-quality connectionless link, such as Ethernet/LLC1, will come close to this. However, single-frequency packet radio links are prone to packet loss for several reasons, including hidden transmitters, lack of collision detection, and rf interference. The packet loss ratio is useful in setting link cost, and may also be used to determine whether a link should use connectionless or connection-oriented procedures. If a router reports, in its link update packets, that a given link has a cost of _n_, then its adjacencies (and only its adjacencies) may apply the packet loss ratio to adjust the cost which they maintain in their link state tables. These adjusted costs, rather than the received costs, may then be propagated to other routers. Such procedures should be applied sparingly, as each change must be propagated and could, if used too frequently, result in network-wide instability. A suggested (experimental) algorithm is as follows: A percentage threshhold x is set, as is a cost-adjustment factor y. If fewer than x% of packets are received during a measurement interval, the cost of that adjacency is multipled by y. For example, if x is 80 and y is 1.5, then an adjacency with a nominal assigned cost of 4 will be up-costed to 6 if only 70% of packets are received, but will be restored to 4 if 80% or more are received during the next measurement interval. The measurement interval is the time between RRH messages, which precede routing update messages. III. Acquisition of end node adjacencies Three possible means of determining adjacencies to end nodes are the use of connected-mode AX.25 links, the use of ARP, and the use of a "wiretap" algorithm (see RFC981). Unless a connection mode Data Link layer (with keepalive timers) is used, adjacent nodes may need to send each other messages at regular intervals (ping) to ensure that the link is still usable. A procedure is outlined below for routers and end nodes to acquire knowledge of each other. It is assumed that RSPF will not be activated in end nodes; this permits them to use simple version of the IP software. A node that has RSPF support in its software but operates as an end node can also use the router-router connection procedures and simply broadcast its adjacency to the router in a one-entry bulletin with a horizon of one. Such a node may also be simultaneously homed on two or more other routers, unlike true end nodes whose traffic either bypasses RSPF (using ARP) or arrives by way of its associated router. There is no "redirect" function provided in RSPF. Since radio does not provide a true "broadcast" topology subnetwork, a router cannot presume that if both end nodes can hear it, that both end nodes can hear each other. If an end node knows the IP address of the router which will connect it to the network at large, it may establish a connected-mode AX.25 (or other subnet) connection to the router; the presence of this connection indicates that the node is reachable from that router, which then adds it to its links table and subsequent bulletins. This may, of course, require an ARP exchange in order to acquire the AX.25 (or other data link layer or subnet) address. Alternately, the end node can simply use ARP and use connectionless link procedures. In this case the router should assume that the end node is available until either a rather lengthy timer expires, or the router is unable to make an ARP contact after the ARP timer expires. (A loss of reachability should not be inferred from the ARP timeout.) Routers should periodically broadcast their availability (suggested interval: every 15 minutes) with an AX.25 UI frame sent to QST-0 (the AX.25 broadcast address). A human-readable "unproto" message may go here, allowing individual operators to recognize routers and connect as appropriate. (No specific PDU coding is provided, as the end nodes do not use RSPF, and thus this is not really an RSPF packet.) A router may also choose to use "Promiscuous ARP" to provide service to an end node which is attempting to connect with an IP address reachable by the router. In such a case the router should wait an extra interval after receiving the ARP request because the desired destination may actually be directly reachable; ARP procedures may need to be modified to provide this. Another potential approach is for routers to simply listen to AX.25 traffic on the link and determine who is adjacent to whom. This is the gist of the "wiretap" algorithm in RFC981, which also finds non-adjacent nodes by taking advantage of the source routing found in AX.25 frames. Integration of wiretap into RSPF is for further study. IV. Link state propagation Link state information is propagated between routers within bulletin envelopes, which are sequences of packets containing partial or full copies of the sending node's link state table. Both point-to-point and broadcast procedures are provided. IV.1. Optional multicast/broadcast Packet radio is inherently a broadcast medium. Packet radio networks, however, may be viewed as a collection of individual links which happen to use a broadcast physical medium. It is also possible to exploit the broadcast nature of the medium. RSPF link state propagation procedures allow but do not require such multicasting. If the link uses connectionless procedures for user data packet exchange, then broadcast procedures should be used for link state packet exchange. The converse may not necessarily be true: The threshhold of loss that militates against connectionless transmission of user data may be more stringent than that which call for non-broadcast transmission of link state packets. (Optimal parameters are for further study.) IV.2. Routing update bulletins Routing updates are passed along from router to router via routing update bulletins, which are broadcast within routing update envelopes. Bulletin propagation is designed to make it extremely likely that every node within a given "horizon" receives every routing update message sent out by a given node. Every router originates information about changes in its own adjacencies, as well as periodic retranmissions of its entire list of adjacencies. These bulletins are then propagated among other routers. The router whose adjacency information is being broadcast is called the _reporting router_; this may be some hops away from the routers which forward it to their own adjacencies. Each reporting router's bulletins (adjacency updates) contain a sequence number (and in some cases, a subsequence number). These sequence and subsequence numbers are generated by the reporting router and propagated; they are not changed when a bulletin is relayed. They are incrememted by 1 (modulo 65536) every time a new one is generated. Bulletins may also carry change information incremental to previous bulletins. These carry the same sequence number as the full routing update bulletin to which they are reporting incremental changes; each such partial routing update bulletin has a subsequence number. The subsequence number is zero for a full routing update bulletin. Every bulletin also has a horizon value, set by the reporting router, associated with each of its constituent links. (A given reporting router may have more than one constituent link, if it is a multi-port router.) Every time a bulletin is propagated, each horizon value is decremented by 1. When it hits zero, the bulletin is not propagated further. Note that for horizon purposes, a bulletin's individual constituent links may have separate horizons; when a link's horizon hits zero, other links' adjacency information from the same reporting router may continue to be propogated. It should also be noted that if a given link has adjacencies with different horizons, these must be treated as separate links, since horizon is reported as a characteristic of link. Such a circumstance may occur, for example, when a router serves a node group. Adjacencies within the node group are typically propagated with short horizons, since they are only of local interest (i.e., to other nodes in or near that node group). Similarly, the node group itself is propagated with a long horizon, across a backbone. However, adjacencies outside the node group may be propagated with long horizons, as they would not otherwise be reached across a backbone dependent upon node groups for long-haul routing. Every router maintains within memory a routers table containing one tuple for every other router on the network. This tuple contains the following elements: IP Address Last received bulletin sequence number Last received bulletin subsequence number Timestamp for when the data was received. Horizon remaining in bulletin when received **NEW IN 2.1** This table is used to coordinate the receipt and transmission of bulletins, using either broadcast or non-broadcast procedures. IV.3. Flooding without congestion Upon initialization of adjacencies Bulletins are forwarded in a routing update envelope which may contain one or more reporting routers' bulletins (adjacency lists), which are flooded to the network. A bulletin envelope may actually concatenate multiple reporting routers' bulletins; this is in fact the typical case, when routing update packets are exchange on schedule, **and when a given router acquires a new adjacency**. However, partial routing updates (good news and bad news) may be sent at any time. The first time an adjacency is acquired, the two routers perform a full routing update with each other, exchanging their complete link lists. Once this is complete, the routers perform the SPF algorithm and compute new routing tables. Preventing loops and retransmissions A bulletin from reporting router X (which lists all of X's adjacencies) is sent, initially by X, to every adjacent (to X) router A. This router A passes it along to all of its own adjacencies B, etc.. This flooding is controlled such that no reporting router's adjacency update is sent more than once between the same pair of routers. After router A sends its bulletin envelope to B, the recipient router B then looks at the sequence number of each reporting router X's adjacency bulletin within that envelope, and for each reporting router's bulletin, compares the sequence number of the just-received adjacency bulletin with the highest sequence number previously originated from X. If the just-received bulletin is newer (higher number), then it sends a positive acknowledgement to A, and joins in the flooding to its own adjacencies. The horizon is, of course, decremented. ** HOW IS IT POSITIVELY ACKED? NEED WE? ** If the bulletin from X has the same number as was stored in B, B checks the horizon left in the received bulletin against the horizon left when it previously received the bulletin. In the event that the latest bulletin received had a shorter (lower number) horizon left than the earlier one, it simply acknowledges the (redundant) message. But if the reporting router X's horizon left is greater for the new copy of the bulletin, router B propagates it as if it were a new bulletin. This way, if the bulletin happened to first arrive via a roundabout path, it won't accidentally fail to reach the intended end of its range. (Summary: Newer or longer-horizon bulletins are both passed along. Same age or older (by sequence number) or same or lower horizon will not be propagated.) If any router B receives from router A a bulletin (from any reporting router X) that contains a lower sequence number than one that had previously arrived at B from said X, then it can be presumed that A has "obsolete" information. B replies by sending a bulletin to A with the latest link state information for that node X. As a corrolary, a router may poll for information about a given reporting router by sending a routing update bulletin for that reporting router with a sequence number that is lower than the latest one it actually has received. Said bulletin may contain zero links' information. (Note that since the sequence number is modular, a value of 0 is not correct for this function as 0 is higher than 65535; the "poll" number should be only slightly lower.) IV.4. Non-broadcast bulletin envelope exchange A router may acquire routing information from adjacent routers via point-to-point procedures which treat every adjacency as a separate logical data link. (Such a procedure also works, of course, over point-to-point links such as wirelines.) This tends to have the highest reliability, since connection-oriented data link control procedures can be used to ensure the accuracy and completeness of the data passed over the link. It has the disadvantage of requiring separate transmission of the same data to different adjacent nodes on the same radio channel. Bulletin envelopes are thus exchanged (when connection-oriented is selected) periodically (based upon timers) and when new information is received (over a new adjacency, and when good and bad news arrive). Delivery of these bulletins is considered to be gnerally reliable. IV.5. Broadcast bulletin propagation When a router is adjacent to several other routers via the same broadcast (i.e., packet radio) channel, retransmission of routing bulletins to each adjacency makes additional use of bandwidth, which may be a scarce resource. A broadcast procedure may be used to pass the bulletin along that link. Note that broadcast propagation of bulletins may be combined with non-broadcast propagation, on a link by link basis. Although packet radio is a broadcast medium, it is not a perfect one: The reliability of multicast packets is not as high as for wireline LANs. This leads to the need for a unique broadcast "flooding" technique. A routing update bulletin is broadcast to the Layer 2 multicast AX.25 address using the IP multicast address. Individual recipient routers may or may not receive the entire bulletin, since there is no acknowledgement provided. In the previously described case of a non-broadcast message sent via a connection-oriented datalink, the entire routing update bulletin can be assumed to have been received intact. Thus, if a given reporting router has originated a new complete list of its adjacencies (new sequence numbers, subsequence numbers are 0), then any adjacency not included is presumed to have ceased to exist. Such a presumption is not always possible when a bulletin was received via unacknowledged broadcast, as the message might have been received in part. This should not make the partially received bulletin unusable. A bulletin envelope is transmitted in one or more packets, each of which constitutes a numbered fragment of the whole bulletin envelope. Within the transmitted bulletin envelope, each individual reporting router's bulletin begins with a node-header which contains the number of links being reported on. Within each bulletin, each link is identified by a link header, each of which specifies the number of adjacencies being reported on (for that link). Since each IP packet that makes up a bulletin contains a fragment number, it is also possible to determine whether or not a fragment was lost. Each packet also contains an envelope-ID, which prevents accidental mis-assembly of different bulletins that may arrive close in time. **NEW IN 2.1** In connection-oriented non-broadcast procedures, a routing update bulletin (not a partial one with a subsequence numbers >0) provides a complete list of adjacencies known to the sending node, except where the horizon is exceeded. Absence of a previouly-known adjacency then implies that the adjacency has been lost. However, that assumption does not apply to fragmented bulletins received in part, which can occur with broadcast procedures: If a fragment was lost before reaching the end of a given reporting router's bulletin within the bulletin envelope, then the absence of a previously-known adjacency in the received bulletin does not mean that the adjacency was lost. RSPF procedures dictate that routing update bulletins with a subsequence number of zero are presumed to be complete lists of adjacencies from their originators; higher subsequence numbers represent incremental changes only. In the broadcast procedures, a routing update bulletin with a subsequence number of zero, if received only in part, is treated as a partial routing update bulletin. If it received in full, it is treated as a full routing update bulletin. Thus, the recipient compares the sequence number with the previously received sequence number from that reporting router. If it is higher than the previously received one, then its adjacencies are used. If it was received in full, and had a subsequence number of 0, then its previous adjacencies are replaced (removed if not contained in the latest full routing update bulletin). If it was not received in full, or if its subsequence number was not zero, then its previous adjacencies are left intact unless explicitly changed by the received bulletin. If a bulletin is received in full, then the routers table is updated with the latest sequence and/or subsequence number, horizon **NEW 2.1** and timestamp. If it is received in part, then these entries are not updated. After the bulletin has been completely transmitted, a recipient node which has received a partial update from a reporting node may poll for that node's latest information, by using the (now known to be obsolete) sequence number for that router in its router table in a bulletin, with zero links for that reporting router. (This is the same polling procedure used for non-broadcast links.) Note that if a fragment is lost, a reporting router whose node-header is in the lost fragment will of course remain unchanged in the recipient's data base. Furthermore, any data received in subsequent fragments, prior to a node-header, will also be meaningless. The last adjacency of the last link in a reporting router's bulletin will have the Last flag set to 1, signaling that following the address, a node header follows. Note also that the first node-header within an envelope is pointed to by the sync byte in the envelope header. The last flag and sync byte should match or an error should be noted. IV.6. Routing update bulletin packets A routing update bulletin envelope (Table IV.1) may contain several different reporting routers' updated link state information, concatenated into one message, with a different sequence number for each source (reporting router). One of the sources, of course, may be the local router. Each router's link state information is further broken down by link, which allows its backbone routing information to be propogated separately from its local end node adjacency information. Incremental changes (good news and bad news) Bulletins that only report changes in state come in two flavors. Good News bulletins inform other routers that an adjacency has been noted. Bad News bulletins inform them that an adjacency has been lost. If an end node establishes a connection with a router whose node group default addresses (based on the significant bit count) already include that end station, however, no bulletin need be sent out, as packets to that end station will already be routed correctly. Theoretically, a router could send out a new full routing update bulletin every time it gained or lost an adjacency. However, the use of shorter Good News and Bad News packets, which do not contain a full routing update, allow the network to remain relatively current with less transmitted traffic. Good news and bad news packets are propogated like other packets, but are not originated by the same rules. While full routing bulletins are originated based upon a timer, good news packets are transmitted immediately upon receipt and initiated immediately after an adjacency is initialized. This enables new links to be useful quickly. Bad news, however, should not travel as fast: A node should cache any bad news message for a time (initial recommendation for this timer: 60 seconds) while waiting for the link to come back up. This helps keep the network stable; if the node receives a packet destined for the lost destination, it may send an ICMP "host unreachable" message to the originator of the packet, unless it can reroute the packet itself (as may happen with the loss of a backbone link when others still exist). Because good news and bad news messages represent changes to the last full link state bulletin propogated, but do not purport to completely represent a node's link states, they carry bulletin subsequence numbers. These use the last bulletin sequence number originated by the reporting router, but the sub-sequence value increments from 1. (A full link state packet has a sub-sequence value of 0, and resets the subsequence counter.) Routes to nearby destinations Sometimes more than one router will serve the same area (determined by the significant bits' matching), and they will need to know which one has the better path to a given station. These adjacency messages may only require a short horizon, as will Good News and Bad News packets which refer to end nodes going on and off the air. Higher horizons are needed for backbone routers. Table IV.1. Routing update (link state packet) bulletin envelope: 1 2 |0 |8 |6 |4 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ---- | RSPF Version #| Type | fragment # | fragment total| packet +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -hdr | Checksum | sync byte | # nodes below | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Envelope-ID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ---- | Reporting node #1 full IP Router-Address | node +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -hdr | Node 1 bulletin sequence # | sub-sequence #| # links | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ---- | horizon left | ERP factor | link cost | #adjacencies | link +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ _-hdr_ |significant bits| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Adjacent node(s) 1,1,1 IP address | adj. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ --- |significant bits| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Adjacent node(s) 1,1,2 IP address | adj. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ --- ... +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |significant bits| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Adjacent node(s) 1,1,n IP address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ --- | horizon left | ERP factor | link cost | #adjacencies | link +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ --- |significant bits| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Adjacent node(s) 1,2,1 IP address | adj. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ --- ... +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |significant bits| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Adjacent node(s) 1,2,n IP address | adj. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ --- | Reporting node #2 full IP Address | node +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Node 2 bulletin sequence # | sub-sequence #| # links | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ --- | horizon left | ERP factor | link cost | #adjacencies | link +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ --- |significant bits| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ adj. | Adjacent node(s) 2,1,1 IP address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ --- |significant bits| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Adjacent node(s) 2,1,2 IP address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ... +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | horizon left | ERP factor | link cost | #adjacencies | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |significant bits| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Adjacent node(s) 2,2,1 IP address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ... +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |significant bits| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Adjacent node(s) 2,2,n IP address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ... +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Reporting node #n full IP address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Node n bulletin sequence # | sub-sequence #| # links | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ etc. Parameters-- An RSPF bulletin packet is sent within IP with a type of . Each routing update envelope contains an envelope packet header that contains: RSPF Version Number: Version number of the protocol (initially 1). Type: (Value 1 for Routing Update Bulletin Envelope) Fragment Number: States which fragment, in a segmented message, this is, beginning at 1. Non-fragmented messages use 1. Fragment total: Total number of fragments in message; 1 if not fragmented. Checksum: IP-style checksum. Sync byte: Which octet in this packet (counting from this byte as byte 0) is the beginning of the first node-header. If 0, this fragment has no node-header. Non-fragmented messages use a value of 4 (because 3 bytes follow in packet header). **NOTE CHANGE IN 2.1** Number of nodes reporting: The number of reporting routers in the messages that follows (this packet or a sequence of packets forming the envelope). The node-header (for each reporting router) contains 8 octets: Reporting router #n full IP router address: The IP address of the router whose adjacencies are being reported below. (Note that if a router uses separate IP addresses on its links, it should still adopt a single one as its router address.) Bulletin sequence number: When a bulletin is passed along, this number is NOT changed; every new bulletin from a given Reporting router has a value 1 higher than the previous bulletin from that reporting router. (Note: This is modulo 65536, so implementations must cope with the "wrap around" and consider values below, say, 100, to be "higher" than values above, say, 65400. Between 100 and 65400, modular arithmetic is NOT used.) Sub-sequence number: Good news and bad news packets representing incremental changes from the last full report increment this value by 1; it is 0 for full bulletins. # links: The number of different cost-horizon values (typically, but not necessarily, representing different types of link in a mulitiport gateway) being reported below; the following four octets are the header for each link. [For each reporting router, adjacencies are reported separately by link cost. This is the received cost value for that data link, after any adjustment that may be based upon packet loss ratio. Adjacencies are also reported separately by horizon, even if the cost is the same. It does not matter at this point if there are multiple RF or other links if their cost and horizon are the same. Likewise, separate received costs or horizons on one link will be treated separately and such adjacencies will be grouped under separate link headers:] Horizon left: This number is decremented every time a routing update bulletin is passed along; when it reaches 0, it is not passed further. Link cost: A "figure of merit" for each link, rising with slower/poorer links. This is the number whose total is minimized by SPF. The range is 1-127. As a starting point, a 56000 bps fdx backbone link might have a value of 2, a 4800bps hdx link a value of 5, a 1200bps hdx link a value of 10 and a 300 bps hf "wormhole" a value of 20. An Ethernet or high-speed (1 Mbps+) link might then have a value of 1. A value of 255 denotes the loss of a link; this is found in Bad News packets. These values should be coordinated network-wide; adjusting them will change the way packets are routed by RSPF. Number of adjacencies: The number of adjacencies to be listed for that reporting node. ERP Factor: Used for "approximating" a route; contains the number of significant bits for which a given node can be presumed to be a valid router, even if it doesn't report in detail. A low factor = wider coverage area; thus ERP of 16 means that if NO other match is found for a given address, this router will try to handle it if it matches 16 bits. Basically a handle for future enhancements; its use will not be required in the initial release of RSPF. For each adjacency of the given link cost, the following is provided: Significant bits: The number of bits used for node group routing purposes. Usually 32 but may be lower if a given link purports to serve all end nodes in an area defined using the most-matched-bits node group convention. Higher numbers of bits matched take a higher priority than least cost. This uses the low-order 5 bits of the octet. Last-flag: If this is the last adjacency in the list for this reporting router, this value is 1; otherwise it is 0. (This occupies the high-order bit of the significant bits octet.) Full IP address: The full IP address for this node. Note that the n,n,n vector within the bulletin has three fields in the above representation: Reporting router within bulletin envelope, link cost/horizon within reporting router's bulletin, and reporting adjacency with that link cost/horizon. IV.7. Fragmentation In a moderate to large network, a full routing update can easily exceed the maximum size of an AX.25 frame or IP packet. The RSPF Routing Update message, however, may be sent in fragments. The IP fragmentation function is not used for this; bulletins are not assumed to be terminated by a packet boundary. Each fragment is, however, numbered in the packet header, along with an indication of the number of the total number of fragments in that envelope. In order to permit parsing of the incoming fragments by routers who are using unacknowledged broadcast information (with the high likelihood of lost fragments), every bulletin's packet header contains a sync byte indicator. This indicates which byte, where the next byte in the header is byte 1, is the beginning of a node header. If a previous fragment was lost, the receiver should ignore the number of bytes indicated in the sync byte before resuming parsing of the packet. (If the fragment does not exceed 255 bytes, this will always be sufficient. It is recognized that long packets may not be able to use this mechanism reliably, and a value of "0" should be used to indicate that no sync is possible within this fragment.) Each routing update bulletin envelope takes the form of a three- dimensional list, with the dimensions being reporting router, link and adjacency. A given bulletin envelope may report on link state from one or more remote nodes, as well as from the sending node. Each node may have one or more links; each link may have one or more adjacencies. Packets may not be fragmented within adjacencies, but may be fragmented after an adjacency's address and before the next adjacency's significant bits field. (This presents a 5-octet field that should not be fragmented.) The next fragment, in a new packet, simply begins where the last one left off; the receiver knows how much more to expect based upon the node and link count in the respective node-header and link-header. A router may not start sending a new Routing Update message of any kind to any given IP address until all fragments of a previous message to that address have been transmitted. This applies both to point to point (non-multicast address) and multicast procedures. IV.8. Bulletin Timers The timers used for bulletin updates must be a compromise between maintaing the network's current state and the traffic required to do so. An initial suggestion: Good news messages should be initiated within a few seconds and bad news messages should be initiated within one minute, with relatively short horizons on the bulletins (i.e., the routers within the region). Full routing updates with normal horizons should be sent out every 30 minutes. If the network is small, more frequent updates may be possible; too frequent updates risk choking the network with update traffic. V. The Shortest Path First spanning tree algorithm As a routing node comes onto the network, it acquires over time a complete list of adjacencies between other nodes on the network except as limited by the "horizon". Each adjacency (point to point link) within the entire known network has a "cost" associated with it. It should be noted that adjacencies, for the purposes of this algorithm, are "logical" and not necessarily physical; if a subnetwork link occurs below IP (as in AX.25 digipieating or NET/ROM), the two ends of the link are still adjacent. (Adjacency at the IP internet layer is based upon subnetworks, which may contain their own links.) Cost is set for the receive (** Changed in 2.1 to facilitate automatic link quality adjustment **) side of each link. If the receiver and transmitter do not agree on cost, the network shall apply different routes for packets going in opposite directions between the same two end nodes. (This is not a problem. In a radio environment, one should NOT assume reciprocity across a link.) V.1. Tables Each router builds a _link state table_ that lists, for every known link in the network (from every reporting router), the two ends and the cost of that end of the link. The ends are listed by an IP router address and, for the destination IP router address, a number of significant bits. This is what's updated by the bulletins and by good news/bad news messages. Source IP address Dest. IP addr/bits Cost 44.56.0.44 44.56.0.128/32 5 44.56.0.44 44.56.0.12/25 10 The goal of the SPF algorithm is to build a _paths table_ which lists, for every reachable node (or node group approximation of fewer than 32 bits) on the network, that node address (or node group address and number of significant bits), the adjacent node used to get there (i.e., where you blast the packets to next), and the total cost of the path. (This feeds the Route table in the IP Route module in NET.) This is done by building a spanning tree with the router doing the computation (the home router) as the root of the tree. The paths table thus lists the best way across the tree from the home router to all known destinations. Every router contains, for the purposes of building the tree, a _trial table_ that has three entries: Destination address/bits, adjacent node, and cost of this path. The paths table additionally has one more entry, the _parent_ node, which is the last hop before the destination. Thus in a path A -> B -> C -> D -> E, home router A views E as the destination, D as the parent, and B as the adjacency. Note that in the path from A to B, A is the parent node. Begin building the paths table by using the home router as the first node on the paths table. The cost to reach yourself is always 0, so make the first entry on the paths table as follows: Source=self, destination=self, parent=self, and cost=0. From here on in, parent is always (by definition of the SPF spanning tree) the node most recently added to the paths table. Destination Adjacent Parent Cost 44.56.0.128 44.56.0.128 44.56.0.44 5 44.56.0.131 44.56.0.128 44.56.0.128 10 44.56.0.200 44.56.0.128 44.56.0.131 15 Paths Table showing relationship between destination, parent and adjacent nodes, where the home node is 44.56.0.44 and 44.56.0.200 is three hops away; all hops here have a cost of 5. V.2. Scanning the links The home router now scans its links table looking for all nodes (routers and end nodes) that are adjacent to the current parent node, except for links to nodes which are already on the paths table. (It is generally fastest to build the paths table by scanning the links table in order of increasing link cost.) Each of these new nodes is added to the trial table, except for the parent node (which is already on the paths table, so it can't possibly have a shorter path). The value of cost placed on the trial table is the cost of the link from the parent to the destination plus the cost from home to the parent node (which value is already on the paths table). A node may only appear once in the trial table at any given time. If an adjacency is found to a node that is already on the trial table, a new entry replaces the existing entry if and only if the new total cost is lower. If the cost is higher, it is ignored. (If the costs are equal, then a "tiebreaker" is determined by treating the lower-numbered IP address as the lower cost. This will simply make the spanning tree more deterministic in case of tie.) Thus the trial table always contains the lowest cost path to each destination found so far. Once all of the links from the parent node have had their chance to go onto the trial table, scan the entire trial table for the _one_ entry with the lowest total cost. This not need be a link from that parent node! In case of tie, pick the lower IP address (again just to be deterministic). Move this node to the paths table; guaranteed, there'll be no cheaper path to that node! The adjacency used for that node in the paths table is the adjacency to its parent node. Note that the parent _must_ already be on the paths table since that's the source of the parent; you're working your way outward. This new addition to the paths table becomes the new parent node. Repeat the procedure above (from the beginning of V.2) until there are no nodes left on the trial table. This means you've completed the spanning tree and have found the least-cost path to every other node. One of the router parameters is maximum_cost. If the cost to a given parent node exceeds this value, the procedure stops, since all subsequent entries in the route table will have a higher cost. This value has some semblane to the time-to-live parameter of the IP packet: It makes little sense to compute a routing table to nodes that cannot be reached within time-to-live. (Ideally, ttl will be implemented using a timer rather than a node counter, but this is difficult to do with some of the packet radio data link controller implementations; vis., TNCs.) A router should recalculate its routes periodically based upon the current links table. How frequently depends upon the CPU load involved. Initial estimates are that it should be recalculated after receipt of every routing update bulletin: Partial calculations do not save enough CPU time to make them worthwhile. V.3. Filling in the routing table The route table in NOS and NET (the KA9Q et al implementations of IP) contains, for each entry, the destination address, number of bits, interface, gateway and metric. This is essentially left intact, except that metric is filled in by cost and the routing decision looks for the least cost of all matches. The route table is filled in from the paths table. Since the routing table will be periodically recalculated from scratch, implementation may require two route tables, one "in progress" and one "in service". When the route calculation is complete, the "in progress" table becomes "in service" and the old one is cleared for reuse. This allows packet forwarding to continue while the completed paths table is being converted into the route table. A manual route table may also be maintained. This should be overriden by RSPF-generated entries. Manual routes are useful defaults before RSPF generates routing entries, for destinations not reported on by RSPF, and for interworking with other routing protocols. Appendix I. Router parameters Every router must set a number of parameters in order to properly operate. While RSPF builds its routing matrix automatically, overall network efficiency and stability may require some fine-tuning based upon experience. These include parameters set for each data link or subnetwork layer entity (i.e., each radio channel) and for the router in general. Link/subnet settings: Set Link cost: This is the cost parameter based upon the link speed and type. Since the overall cost of the end-to-end path is minimized by the RSPF spanning tree, link cost should be set to arrive at the best overall network performance. The legal range is 1-127. This is sent in routing update bulletins. Node settings: Add/Delete Node group: [IPaddr]/bits {cost}. This allows a node to announce its ability to serve a group of nodes, identified using the standard NET convention of address/significant bits. Thus a node group setting of [44.56.0.1]/25 will match all nodes from [44.56.0.1] to [44.56.0.127]. Cost is optional; if set, this cost to will be propogated to reach such nodes; otherwise, the link's default is used. Set horizon link: This sets the horizon value for the node's routing bulletins apropos 32-bit addresses of other connected routers. This should be high enough to propogate across the backbone. Set horizon group: This sets the horizon value for the node's routing bulletins apropos node group addresses (fewer than 32 bits matched). Usually matches the horizon link value. Set horizon local: This sets the horizon vaue for the node's routing bulletins apropos full link addresses (32 bits) **to non-routers** within the router's node group area. This is set to a low value so that only other routers serving the same or overlapping node group(s) will receive these messages. Set horizon portable: This sets the horizon value for the node's routing bulletins apropos full link addresses (32 bits) not within a node group. This allows portable end nodes to have their location in the network propogated farther than the local horizon; this is usually set the same as horizon group. Set broadcast timer: This sets the time, in seconds, between router-router hello (RRH) broadcasts. Initial suggestion: 900. Set suspect timer: This sets the time, in seconds, after which an adjacency is "suspect" if no packets are received from it. An ICMP echo is then issued. Initial suggestion: 900. Set suspect count: This sets the number of times an ICMP echo (ping) should be sent after a node is suspect, before it is removed from the adjacency list.