VRRP Concepts

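Some quick facts:

IP protocol number 112; multicast address 224.0.0.18; preemption enabled by default; default priority 100 (the highest primary IP address breaks ties); timers 1/3.6 (1 s advertisement interval, about 3.6 s master down interval); only the master sends hellos (advertisements).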

  • During a re-election, several routers may briefly send multicast advertisements with the same virtual source MAC, so the switch may see that MAC address flapping between ports at that moment.
  • You can't advertise a timer of less than 1 second, because the VRRPv2 Advertisement Interval is a 1-byte field counted in whole seconds. If you want a sub-second timer, it has to be set locally on each router.
  • Better set the timers identically on all routers, otherwise you might end up with a two-master scenario.
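The "3.6" in the quick facts is the Master_Down_Interval that RFC 3768 derives from the router priority. A minimal Python sketch of the two VRRPv2 timer formulas, assuming the RFC default 1 s advertisement interval rather than any vendor's sub-second timers:

```python
def skew_time(priority: int) -> float:
    """RFC 3768: Skew_Time = (256 - Priority) / 256 seconds."""
    return (256 - priority) / 256


def master_down_interval(priority: int, advertisement_interval: float = 1.0) -> float:
    """RFC 3768: Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time."""
    return 3 * advertisement_interval + skew_time(priority)


for prio in (100, 150, 254):
    # A higher priority means a shorter skew, so that backup takes over first.
    print(f"priority {prio}: skew {skew_time(prio):.3f} s, "
          f"master down after {master_down_interval(prio):.3f} s")
```

With the default priority 100 this gives 3.609 s, which is where the 3.6 comes from; a backup with priority 254 would wait only about 3.008 s.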

What problem does it solve?

-It’s designed to eliminate the single point of failure in a statically routed network.

In a nutshell, we are making one logical router out of two physical ones.

From the user guide:

VRRP specifies a MASTER router that owns the next hop IP and MAC address for end stations on a local area network (LAN). The MASTER router is chosen from the virtual routers by an election process and forwards packets sent to the next hop IP address. If the MASTER router fails, VRRP begins the election process to choose a new MASTER router and that new MASTER continues routing traffic.

VRRP uses the virtual router identifier (VRID) to identify each virtual router configured. The IP address of the MASTER router is used as the next hop address for all end stations on the LAN. The other routers the IP addresses represent are BACKUP routers.

 

RFC 3768 describes this in detail, but basically we have one virtual router with a virtual IP, and hosts use that virtual IP as their gateway. If one of the routers in the VRRP instance fails, the other one keeps routing the packets.

In VRRP we have Master and Backup routers; the election is based on the router priority, and the highest IP address breaks a tie.
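The election rule can be sketched in a few lines of Python. This is a simplified, static view assuming every router sees every other router's advertisements; the router names and addresses are hypothetical, and preemption is not modelled.

```python
import ipaddress
from typing import NamedTuple


class VrrpRouter(NamedTuple):
    name: str        # hypothetical label, for display only
    priority: int    # 1-254 for backups; 255 is reserved for the IP address owner
    primary_ip: str  # the router's own interface address, not the virtual IP


def elect_master(routers: list[VrrpRouter]) -> VrrpRouter:
    """Highest priority wins; on a tie, the highest primary IP address wins."""
    return max(routers, key=lambda r: (r.priority, ipaddress.ip_address(r.primary_ip)))


routers = [
    VrrpRouter("R1", 100, "10.0.50.2"),
    VrrpRouter("R2", 100, "10.0.50.3"),
]
print(elect_master(routers).name)   # R2: same priority, higher IP
routers.append(VrrpRouter("R3", 110, "10.0.50.1"))
print(elect_master(routers).name)   # R3: higher priority beats higher IP
```

With preemption disabled, a newly added higher-priority router would not take over from a running master; this sketch only shows who wins a fresh election.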

The Master router actively routes the packets, while the Backup router should “keep silent” and monitor the availability of the Master router (by listening for its advertisements, which act as keepalives).

What the Backup router does while in the Backup state (taken from RFC 3768):

While in this state, a VRRP router MUST do the following:

– MUST NOT respond to ARP requests for the IP address(s) associated
with the virtual router.

– MUST discard packets with a destination link layer MAC address
equal to the virtual router MAC address.

– MUST NOT accept packets addressed to the IP address(es) associated
with the virtual router.

How it reacts to different events:

– If a Shutdown event is received, then:

o Cancel the Master_Down_Timer
o Transition to the {Initialize} state

– If the Master_Down_Timer fires, then:

o Send an ADVERTISEMENT
o Broadcast a gratuitous ARP request containing the virtual
router MAC address for each IP address associated with the
virtual router
o Set the Adver_Timer to Advertisement_Interval
o Transition to the {Master} state

– If an ADVERTISEMENT is received, then:

If the Priority in the ADVERTISEMENT is Zero, then:

o Set the Master_Down_Timer to Skew_Time

else:

If Preempt_Mode is False, or If the Priority in the
ADVERTISEMENT is greater than or equal to the local
Priority, then:

o Reset the Master_Down_Timer to Master_Down_Interval

else:

o Discard the ADVERTISEMENT
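The Backup-state rules from RFC 3768 map naturally onto a small event-driven class. A minimal Python sketch, assuming VRRPv2 timers in seconds; the class and method names are hypothetical, and no packets are actually sent (the Master actions are returned as labels).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VrrpBackup:
    """Simplified model of the RFC 3768 Backup-state event handling."""
    priority: int                              # local priority, 1-254
    preempt_mode: bool = True                  # RFC 3768 default
    advertisement_interval: float = 1.0        # seconds
    master_down_timer: Optional[float] = None  # None = timer cancelled
    state: str = "Backup"

    @property
    def skew_time(self) -> float:
        return (256 - self.priority) / 256

    @property
    def master_down_interval(self) -> float:
        return 3 * self.advertisement_interval + self.skew_time

    def on_shutdown(self) -> None:
        self.master_down_timer = None          # cancel the Master_Down_Timer
        self.state = "Initialize"

    def on_master_down_timer_fired(self) -> list[str]:
        self.state = "Master"
        return ["send ADVERTISEMENT", "broadcast gratuitous ARP",
                "set Adver_Timer to Advertisement_Interval"]

    def on_advertisement(self, adv_priority: int) -> str:
        if adv_priority == 0:
            # The master is resigning: take over after only Skew_Time.
            self.master_down_timer = self.skew_time
            return "set to Skew_Time"
        if not self.preempt_mode or adv_priority >= self.priority:
            # An acceptable master is alive: restart the countdown.
            self.master_down_timer = self.master_down_interval
            return "reset"
        # Lower-priority master with preemption on: ignore it, so the timer
        # eventually fires and this router becomes Master.
        return "discarded"


backup = VrrpBackup(priority=120)
print(backup.on_advertisement(150))   # reset
print(backup.on_advertisement(100))   # discarded (preemption)
print(backup.on_advertisement(0))     # set to Skew_Time
```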

What the Master router does while in the Master state (also from RFC 3768):

While in the {Master} state the router functions as the forwarding
router for the IP address(es) associated with the virtual router.

While in this state, a VRRP router MUST do the following:

– MUST respond to ARP requests for the IP address(es) associated
with the virtual router.

– MUST forward packets with a destination link layer MAC address
equal to the virtual router MAC address.

– MUST NOT accept packets addressed to the IP address(es) associated
with the virtual router if it is not the IP address owner.

– MUST accept packets addressed to the IP address(es) associated
with the virtual router if it is the IP address owner.
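Side by side, the Backup and Master rules quoted from RFC 3768 boil down to three yes/no decisions. A hedged Python summary; the function and key names are made up for illustration, and it only covers the rules listed above.

```python
def vrrp_rules(state: str, is_ip_owner: bool) -> dict[str, bool]:
    """What a VRRP router does with traffic for the virtual router."""
    if state == "Backup":
        return {
            "answer_arp_for_virtual_ip": False,      # MUST NOT respond
            "forward_frames_to_virtual_mac": False,  # MUST discard
            "accept_packets_to_virtual_ip": False,   # MUST NOT accept
        }
    if state == "Master":
        return {
            "answer_arp_for_virtual_ip": True,
            "forward_frames_to_virtual_mac": True,
            # Only the IP address owner accepts packets addressed to the virtual IP.
            "accept_packets_to_virtual_ip": is_ip_owner,
        }
    raise ValueError(f"no forwarding rules for state {state!r}")


for state, owner in [("Backup", False), ("Master", False), ("Master", True)]:
    print(state, "owner" if owner else "non-owner", vrrp_rules(state, owner))
```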

Here is an example of VRRP config on Dell N-Series switches.

The example configures two VRRP instances for different subnets in VLAN 50.

[Image: VRRP configuration example on Dell N-Series]

On Cisco:

[Image: VRRP configuration example on Cisco]

If you are using Dell Force10 switches, you can run VRRP on top of VLT. This populates the VRRP MAC addresses in the LOCAL_DA CAM table of both switches and allows active-active routing instead of the active-passive behavior described in the RFC. To check that the MACs are populated on both VLT peers, you can use the command: show cam mac stack-unit 0 port-set 0 | grep vrrp_virtual_mac

A nice article about this can be found under this link.
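The vrrp_virtual_mac in the CAM-table check for VRRP over VLT is a placeholder for the virtual router MAC address. For IPv4, RFC 3768 derives it from the VRID as 00-00-5E-00-01-{VRID}, so a small helper can produce the string to grep for. A sketch; the lowercase, colon-separated output is an assumption, so adjust it to however your switch prints MAC addresses.

```python
def vrrp_virtual_mac(vrid: int) -> str:
    """IPv4 VRRP virtual MAC from RFC 3768: 00-00-5E-00-01-{VRID}."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be between 1 and 255")
    return f"00:00:5e:00:01:{vrid:02x}"


for vrid in (1, 2, 50):
    print(vrid, vrrp_virtual_mac(vrid))
```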

Dell Networking VLT concepts

So what is VLT and what does it do?

Virtual link trunking (VLT) allows physical links between two Dell switches to appear as a single virtual link to the network core or other switches such as Edge, Access, or top-of-rack (ToR). As a result, the two physical switches appear as a single switch to the connected devices.

Basically we are creating one logical switch out of two physical switches.

On the left we see how the switches are physically interconnected; on the right, how an end device sees them.

[Image: VLT concept]

Configuration steps:

1. Enable spanning tree (RSTP and PVST are supported). This step is optional, but nevertheless recommended.

configure
protocol spanning-tree rstp
bridge-priority 4096 (primary VLT switch)
bridge-priority 8192 (secondary VLT switch)
no disable

It is recommended to make the primary VLT switch the root bridge and to give the secondary VLT switch the next-lowest STP priority, so that if the primary fails, the secondary becomes the root instead of some unknown third device, which would cause an unexpected topology change.
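The root bridge recommendation follows from how STP elects the root: the lowest bridge ID wins, comparing the priority first and the bridge MAC address second. A minimal Python sketch; the switch names and MAC addresses are hypothetical, and 32768 is the standard default priority assumed for the unknown third device.

```python
from typing import NamedTuple


class Bridge(NamedTuple):
    name: str
    priority: int   # STP bridge priority (multiples of 4096)
    mac: str        # bridge MAC address, the tie-breaker


def bridge_id(bridge: Bridge) -> tuple[int, int]:
    # Lower is better: priority first, then the MAC address as a number.
    return (bridge.priority, int(bridge.mac.replace(":", ""), 16))


def elect_root(bridges: list[Bridge]) -> Bridge:
    return min(bridges, key=bridge_id)


primary = Bridge("VLT-primary", 4096, "00:11:22:00:00:0a")
secondary = Bridge("VLT-secondary", 8192, "00:11:22:00:00:0b")
third = Bridge("unknown-switch", 32768, "00:11:22:00:00:01")

print(elect_root([primary, secondary, third]).name)  # VLT-primary
print(elect_root([secondary, third]).name)           # VLT-secondary after primary fails
```

If the secondary kept the default priority, the tie at 32768 would be broken by MAC address, and the third switch, with its lower hypothetical MAC, would become root.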

2. Configure the ports for the VLTi link:

configure
interface range fortyGigE 0/56 , fortyGigE 0/60
no shutdown
interface port-channel 100
channel-member fortyGigE 0/56,60
no shutdown

3. Create the VLT domain on both switches, and don't forget to configure the backup link:

configure
vlt domain 1
primary-priority 10 (primary VLT switch)
primary-priority 20 (secondary VLT switch)
back-up destination 192.168.0.2 (primary VLT switch, management interface)
back-up destination 192.168.0.3 (secondary VLT switch, management interface)
peer-link port-channel 100

The backup link carries the heartbeat messages between the two switches; the back-up destination is the management IP address of the peer switch.

[Image: VLT heartbeat]

VLT also works without the heartbeat, but then a VLTi link failure can lead to a split-brain scenario.
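The role of the heartbeat can be summarized as a small decision table. A simplified Python model built only from the behavior described in this post (the secondary shuts its VLT ports when the VLTi is down but the peer is still alive); the function name and return strings are hypothetical, and this is not Dell's actual implementation.

```python
def vlt_port_action(role: str, vlti_up: bool, heartbeat_from_peer: bool) -> str:
    """What a VLT peer does with its VLT ports, in this simplified model."""
    if vlti_up:
        return "forward"
    if heartbeat_from_peer:
        # VLTi lost, but the backup link proves the peer is alive:
        # the secondary steps aside so only one switch keeps forwarding.
        return "shut down VLT ports" if role == "secondary" else "forward"
    # VLTi lost and no heartbeat (or no backup link configured):
    # each switch assumes the peer is dead and keeps forwarding. If the peer
    # is actually alive, both forward independently: split brain.
    return "forward"


for heartbeat in (True, False):
    actions = {role: vlt_port_action(role, False, heartbeat)
               for role in ("primary", "secondary")}
    print(f"VLTi down, heartbeat={heartbeat}: {actions}")
```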

After configuring VLT, we should get the following picture:

[Image: show vlt brief output]

Now let’s attach a device to our VLT switches.

On both VLT members, pick a port for the redundant connection:

interface port-channel xx
no ip address
switchport
channel-member tex/x/x
no shut
vlt-peer-lag port-channel 110 (ID of the matching port-channel on the peer switch)

And you are ready to go.

You can tweak settings such as dampening to give routing and other protocols some time to come online after a switch reboot: the ports come up faster than the protocols, and devices that don't know the routing protocol isn't ready yet may black-hole the traffic.

You can also tune the spanning-tree parameters to keep the interruption after a reboot as short as possible.

VLT behavior:

[Image: VLT behavior]

You can check that the MAC addresses are being synced using the command:

show mac-address-table count

Some interesting points to remember (you can find more by downloading the user guide):

  • When you enable Layer 3 routing protocols on VLT peers, make sure the delay-restore timer is set to a value that allows sufficient time for all routes to establish adjacency and exchange all the L3 routes between the VLT peers before you enable the VLT ports.

  • Only RSTP and PVST are supported; other spanning-tree modes will not work properly in a VLT configuration.

  • Stacking is not allowed when configuring the VLT.

  • If the source is connected to an orphan (non-spanned, non-VLT) port in a VLT peer, the receiver is connected to a VLT (spanned) port-channel, and the VLT port-channel link between the VLT peer connected to the source and ToR is down, traffic is duplicated due to route inconsistency between peers. To avoid this scenario, Dell Networking recommends configuring both the source and the receiver on a spanned VLT VLAN.

  • In a scenario where one hundred hosts are connected to Peer1 on a non-VLT domain and traffic flows through Peer1 to Peer2, when you move these hosts from a non-VLT domain to a VLT domain and send ARP requests to Peer1, only half of these ARP requests reach Peer1, while the remaining half reach Peer2 (because of LAG hashing). The reason for this behavior is that Peer1 ignores the ARP requests that it receives on the VLTi (ICL) and updates only the ARP requests that it receives on the local VLT. As a result, the remaining ARP entries still point to the non-VLT links and traffic does not reach half of the hosts. To mitigate this issue, ensure that you configure the following settings on both peers (Peer1 and Peer2):
    arp learn-enable and mac-address-table stationmove refresh-arp

  • Don't configure any VLANs on the VLTi; the switch matches the VLANs automatically.

  • Don't use a dynamic LAG on the VLTi; a static one is recommended.

  • In a VLT domain, the following software features are supported on VLTi: link layer discovery protocol (LLDP), flow control, port monitoring, jumbo frames, and data center bridging (DCB).

  • If the link between the VLT peer switches is established, changing the VLT system MAC address or the VLT unit-id causes the link between the VLT peer switches to become disabled. However, removing the VLT system MAC address or the VLT unit-id may disable the VLT ports if you happen to configure the unit ID or system MAC address on only one VLT peer at any time.

  • If the link between VLT peer switches is established, any change to the VLT system MAC address or unit-id fails if the changes made create a mismatch by causing the VLT unit-ID to be the same on both peers and/or the VLT system MAC address does not match on both peers.

  • If VLTi connectivity with a peer is lost but the VLT backup connectivity indicates that the peer is still alive, the VLT ports on the Secondary peer are orphaned and are shut down.

    The L3 VLANs on the Secondary peer are shut down as well.
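The "half of the ARP requests" effect in the ARP bullet comes from LAG hashing: each frame is pinned to one member link, and therefore to one VLT peer, by a hash of its header fields. A rough Python illustration; SHA-256 is only a stand-in for the switch's hardware hash, and the host MAC addresses are hypothetical.

```python
import hashlib


def lag_member(src_mac: str, dst_mac: str, members: int = 2) -> int:
    """Hypothetical LAG hash: real switches use their own hardware hash
    over fields like MAC/IP addresses; SHA-256 is only a stand-in."""
    digest = hashlib.sha256(f"{src_mac}|{dst_mac}".encode()).digest()
    return digest[0] % members


# 100 hypothetical hosts sending broadcast ARP requests
hosts = [f"02:00:00:00:00:{i:02x}" for i in range(100)]
broadcast = "ff:ff:ff:ff:ff:ff"
counts = [0, 0]
for mac in hosts:
    counts[lag_member(mac, broadcast)] += 1
print(f"hashed to Peer1: {counts[0]}, hashed to Peer2: {counts[1]}")
```

The split is only roughly even, and a given host always hashes to the same peer, which matches the bullet's description of a fixed part of the hosts becoming unreachable.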

Some failure scenarios:

[Image: VLT failure scenarios]

Overall, VLT is great for load balancing, redundancy and availability: you can upgrade the switches one by one without any downtime, which wouldn't be possible in a stack.

All info and images were taken from the Dell user guide for the S4048-ON switch; you can download it from this link: http://downloads.dell.com/manuals/all-products/esuprt_ser_stor_net/esuprt_networking/esuprt_net_fxd_prt_swtchs/force10-s4048-on_administrator%20guide15_en-us.pdf

In the user guide you can find a lot of detailed info about all the switch OS functions and how to use, implement and troubleshoot them.

Dell VLT Peer-Routing

Some important points about VLT peer-routing.

Peer routing enables one VLT node to act as a proxy gateway for the other peer in a VLT domain. When you enable routing on VLT peers, you can also enable the peer routing feature.

In a nutshell, when peer-routing is enabled on both VLT switches, you can load-balance L3 packets through both switches, because it allows a switch in the VLT domain to forward traffic on behalf of its peer switch.

Example of how VLT forwards traffic without peer-routing enabled:

[Image: VLT forwarding without peer-routing]

 

When you enable peer-routing:

[Image: VLT forwarding with peer-routing]

Images taken from the Configuration Guide.

Peer-routing helps to avoid sub-optimal routing, reduces latency by avoiding an extra hop in the traffic path, and removes the need for VRRP.
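The difference between forwarding with and without peer-routing can be modelled as the path a routed packet takes when the LAG hashes it to the peer that does not own the gateway MAC. A hypothetical Python model; the switch names are placeholders and the hop list is simplified.

```python
def routed_path(receiving_switch: str, gateway_owner: str,
                peer_routing: bool) -> list[str]:
    """Steps a packet sent to gateway_owner's MAC takes after the LAG
    hashed it to receiving_switch (simplified model)."""
    if receiving_switch == gateway_owner:
        return [f"{receiving_switch} routes the packet"]
    if peer_routing:
        # Proxy gateway: route locally on behalf of the peer.
        return [f"{receiving_switch} routes the packet on behalf of {gateway_owner}"]
    # Without peer-routing the packet is first switched across the VLTi.
    return [f"{receiving_switch} switches the packet over the VLTi",
            f"{gateway_owner} routes the packet"]


for enabled in (False, True):
    print(f"peer-routing={enabled}:", routed_path("Peer-1", "Peer-2", enabled))
```

The extra list entry in the first case is the additional hop across the VLTi that peer-routing removes.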

Keep in mind that if switch Peer-1 fails with peer-routing enabled, your traffic will still be forwarded without any interruption, but since there is no virtual IP address, control- or management-plane requests addressed to Peer-1 won't be answered by its peer.

So basically, by enabling peer-routing we have one goal: redundancy and traffic sharing for L3 traffic.

During the bootup of VLT peer switches, a forwarding loop may occur until the VLT configurations are applied on each switch and the primary/secondary roles are determined.


To prevent the interfaces in the VLT interconnect trunk and RSTP-enabled VLT ports from entering a Forwarding state and creating a traffic loop in a VLT domain, take the following steps.


1. Configure RSTP in the core network and on each peer switch as described in Rapid Spanning Tree Protocol (RSTP).
Disabling RSTP on one VLT peer may result in a VLT domain failure.


2. Enable RSTP on each peer switch.
PROTOCOL SPANNING TREE RSTP mode
no disable

3. Configure each peer switch with a unique bridge priority.
PROTOCOL SPANNING TREE RSTP mode
bridge-priority

More info about the advantages of peer-routing compared to VRRP:

https://hasanmansur.com/2016/06/09/vlt-peer-routing-and-routed-vlt/

Routed VLT v1.2: this document covers peer-routing in great detail.

 

Error Detection – Checksum

So how does a checksum work?

  • Both devices need to agree on a checksum number (the divisor used in the calculation); with a divisor of 2, the check reduces to whether the sum is odd or even
  • The higher the number, the more precise the check

Let's take an example from a YouTube video with a great explanation of how a checksum works.

Take the numbers we want to transmit:

25 11 12 7 13 4

Both devices agree on the checksum number; let it be 16.

  1. Sum up the numbers: 25+11+12+7+13+4=72
  2. Divide the sum by the checksum number: 72/16=4.5; ignore what is after the decimal point, so we have 4
  3. 4*16=64, and 72-64=8
  4. Now 8 is the checksum
  5. The first device passes the numbers to the TCP/IP stack and also puts 8 into the checksum field, so it is transmitted together with the actual message; this lets the receiver check whether the rest of the message is correct
  6. The second device reads the numbers from the TCP/IP stack and performs the same calculation; if it gets the same value (8), the data wasn't corrupted and we can trust it

This is a very fast check to compute, but unfortunately a checksum is not robust and only helps against simple errors; errors that cancel each other out in the sum are not detected.

For example, if we send 25 11 12 7 13 4 but the message arrives as 24 12 12 7 13 4, the sum is still 72, so the checksum is still 8 and the error goes undetected.
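The arithmetic from the video, including the case it misses, fits in a few lines of Python. This is the toy modulo checksum described here, not the 16-bit ones' complement checksum that IP, TCP and UDP actually use.

```python
def mod_checksum(values: list[int], modulus: int = 16) -> int:
    """Toy checksum: the remainder of the sum divided by the agreed number."""
    total = sum(values)
    quotient = total // modulus          # 72 // 16 == 4, the fraction is ignored
    return total - quotient * modulus    # same result as total % modulus


sent = [25, 11, 12, 7, 13, 4]
received = [24, 12, 12, 7, 13, 4]        # two values corrupted
print(mod_checksum(sent))                # 8
print(mod_checksum(received))            # also 8, so the error is missed
```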

[Image: IP checksum]

IP Checksum picture taken from Stanford Networking Course

A bit of info about ICMP, ping and traceroute

I've gone through RFC 792 and would like to share some basic (high-level) info about ICMP and how we use it.
ICMP is the Internet Control Message Protocol:
  • It runs on top of the network layer, so its messages are encapsulated in IP datagrams
  • Unreliable: it is a simple datagram service, and there are no retries to retransmit a message if it fails to reach the destination
  • An ICMP error message is built from the header of the IP datagram that caused it (source address, destination address, etc.) plus the first 8 bytes of the original datagram's payload; the message is then marked with a type and a code. Some of the types: [Image: ICMP types]
  • Host unreachable (Type 3, Code 1): the IP datagram gets to the last router, but that router doesn't know where the host is
  • Port unreachable (Type 3, Code 3): the port contained in the packet is not recognized at the receiver's end

How does ping use ICMP?

The ping application uses ICMP directly: it sends an ICMP echo request (message type 8, code 0) to the receiver.

The request is encapsulated in an IP datagram and flows through the network; when the receiver gets it, it sends back an echo reply (type 0, code 0).
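To make the echo request concrete, here is a minimal Python sketch that builds (but does not send) the ICMP part of a ping, assuming the echo message layout from RFC 792: type, code, checksum, identifier, sequence number, data. The checksum is the 16-bit ones' complement checksum from RFC 1071, the same algorithm used for the IPv4 header, TCP and UDP checksums. Actually sending it would need a raw socket and elevated privileges, which is left out; the identifier and payload are arbitrary examples.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071: ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                        # pad odd-length data with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                         # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """ICMP echo request: type 8, code 0, checksum, identifier, sequence, data."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)  # checksum = 0 first
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


request = build_echo_request(identifier=0x1234, sequence=1, payload=b"hello!")
print(request.hex())
# The receiver verifies by summing the whole message: a valid one gives 0.
print(internet_checksum(request))   # 0
```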

How traceroute uses ICMP and UDP (the *nix version; the Windows version uses pure ICMP and sends echo requests until it gets an echo reply from the target):

  • The goal of traceroute is to discover all routers in the path, show the path and report the round-trip delay.

  • When you execute traceroute, it generates a UDP message encapsulated in an IP datagram, with the TTL set to 1 for the first message.

  • When the datagram reaches the first router, the router decrements the TTL to 0, which forces it to drop the packet and send an ICMP message back to the sender with Type 11 (Time Exceeded), which means the TTL expired.

  • To build that TTL-expired message, the router takes the IP header and the first 8 bytes of the IP payload of the dropped datagram.

  • When the TTL-expired message reaches the source, traceroute knows that the message came from the first-hop router, and it measures the round-trip time (how long it took from sending the UDP message to receiving the TTL-expired message back).

  • Then it generates a second UDP message with only one change: the TTL is increased to 2, then 3, and so on. It stops only when a Destination Port Unreachable message comes back.

  • When generating the probes, traceroute uses a UDP destination port number that is unlikely to be in use. When the UDP datagram gets through all the routers to the destination, the receiver doesn't recognize the port and sends back an ICMP Type 3 Code 3 message (Destination Port Unreachable); after receiving it, traceroute ends the trace.
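The TTL loop described above can be simulated without sending any packets. A minimal Python sketch of the *nix behavior over a hypothetical path; the router names are placeholders, and round-trip times are not modelled.

```python
from typing import NamedTuple


class Reply(NamedTuple):
    ttl: int
    sender: str
    icmp_type: int
    icmp_code: int


def simulate_traceroute(path: list[str], max_hops: int = 30) -> list[Reply]:
    """path lists the routers in order; the last element is the destination."""
    replies = []
    for ttl in range(1, max_hops + 1):
        if ttl >= len(path):
            # The probe survives all routers and reaches the destination,
            # where the unused UDP port triggers Type 3 Code 3.
            replies.append(Reply(ttl, path[-1], 3, 3))
            break
        # Router number `ttl` decrements the TTL to 0: Type 11 (Time Exceeded).
        replies.append(Reply(ttl, path[ttl - 1], 11, 0))
    return replies


for reply in simulate_traceroute(["R1", "R2", "R3", "server"]):
    print(reply)
```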

Picture of ICMP types taken from Stanford Networking course http://online.stanford.edu/course/introduction-computer-networking

 

The TCP and UDP segment format

Putting this here just for reference while going through the Stanford Networking course.

Some of the important fields in the TCP segment:

Destination port: tells the TCP layer which application should get the bytes on the other end.

Source port: says where the data should be sent back. When an application starts sending data, it gets a unique source port number so it can receive the data back (this differentiates the connections between host A and host B).

Sequence number: indicates the position in the byte stream of the first byte in the TCP data field.

Acknowledgment number: tells the other end which byte we expect next, which also says that all data up to that point has been received correctly.

16-bit checksum: detects corrupted data, for example bit errors on the wire.

Header length: tells how long the header is (in 32-bit words), which also shows how much room the options take.

Flags: ACK, URG, PSH, RST, SYN, FIN.

Window size: how many bytes the sender may transmit without waiting for an acknowledgment, for example 1500. A window of just one segment means stop-and-wait; a window of 0 means the receiver's buffer is full and the sender has to pause, but the connection is not closed.
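To see where these fields sit, here is a minimal Python sketch that decodes the fixed 20-byte part of a TCP header with the standard struct module, following the RFC 793 layout. It ignores options and the newer ECE/CWR flag bits, and the example segment is made up.

```python
import struct

# Flag bits from least significant upward: FIN=0x01 ... URG=0x20
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]


def parse_tcp_header(segment: bytes) -> dict:
    (src_port, dst_port, seq, ack, offset_byte, flag_byte,
     window, checksum, urgent_ptr) = struct.unpack("!HHIIBBHHH", segment[:20])
    header_len = (offset_byte >> 4) * 4   # header length is counted in 32-bit words
    flags = [name for bit, name in enumerate(FLAG_NAMES) if flag_byte & (1 << bit)]
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len, "options_len": header_len - 20,
        "flags": flags, "window": window,
        "checksum": checksum, "urgent_ptr": urgent_ptr,
    }


# A made-up SYN segment: 5 words (20 bytes) of header, no options
syn = struct.pack("!HHIIBBHHH", 50000, 80, 1000, 0, 5 << 4, 0x02, 65535, 0, 0)
print(parse_tcp_header(syn))
```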

The unique ID of a TCP connection:

In the IPv4 header we have the destination IP address, the source IP address and the protocol ID (TCP); together with the source and destination ports from the TCP header, they form a 104-bit globally unique ID.

As a first step, host A increments the source port for every new connection.

Then TCP picks an ISN (initial sequence number) to avoid overlapping with a previous connection that had the same ID.
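The 104 bits add up as 32 + 32 (IPv4 addresses) + 8 (protocol) + 16 + 16 (ports). A small Python sketch that packs those fields, with made-up documentation addresses and ports, showing that incrementing the source port is enough to get a different ID:

```python
import ipaddress
import struct


def connection_id(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                  protocol: int = 6) -> bytes:
    """Pack the 5-tuple that identifies a connection (protocol 6 = TCP)."""
    return (ipaddress.IPv4Address(src_ip).packed      # 32 bits
            + ipaddress.IPv4Address(dst_ip).packed    # 32 bits
            + struct.pack("!BHH", protocol, src_port, dst_port))  # 8 + 16 + 16 bits


first = connection_id("192.0.2.10", "198.51.100.20", 50000, 80)
second = connection_id("192.0.2.10", "198.51.100.20", 50001, 80)
print(len(first) * 8, "bits")   # 104 bits
print(first != second)          # True: a new source port gives a new ID
```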

UDP 

UDP has only 4 header fields, unlike TCP, which has 10.

Fields:

Source port, destination port, length and checksum. The checksum is optional (in IPv4): if it's used, it is calculated over the UDP header and data (plus an IP pseudo-header); otherwise it is filled with zeros.
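The four UDP fields fit in 8 bytes. A minimal Python sketch with the standard struct module; the ports and payload are made-up examples, and the checksum is left at 0 (not used), which IPv4 allows.

```python
import struct


def build_udp_header(src_port: int, dst_port: int, payload: bytes,
                     checksum: int = 0) -> bytes:
    """Source port, destination port, length (header + data), checksum."""
    length = 8 + len(payload)
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)


def parse_udp_header(datagram: bytes) -> tuple[int, int, int, int]:
    return struct.unpack("!HHHH", datagram[:8])


payload = b"query"
datagram = build_udp_header(5353, 53, payload) + payload
print(parse_udp_header(datagram))   # (5353, 53, 13, 0)
```

Compared with the 20-byte minimum TCP header decoded earlier in the TCP fields discussion, this is the size difference the last paragraph refers to.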

In a nutshell, UDP provides unreliable delivery: no acks, no way to detect missing datagrams, no flow control, and packets may show up in any order. TCP has all those functions, but its header is much bigger than UDP's and it carries a lot of features that might not be needed for apps like video streaming or DNS, or that (like flow control) might already be implemented in the app itself. For example, we are now seeing more and more use of QUIC, which runs over UDP.