Solaris Link Aggregation

Link Aggregation is the process of combining multiple physical Ethernet links into a single logical one. Formally, this is IEEE 802.3ad Link Aggregation, the successor to what was previously called "IEEE 802.3 Trunking"; the name changed in 2000 when 802.3ad was accepted.

Aggregations are also known as "Trunks", "Port Trunks", "Teaming", "Port Teaming", and "Bonding". Cisco's variant of 802.3ad is branded "EtherChannel". It's important to note that in some cases "Teaming" and "Bonding" also refer to link multipathing, which is handled on Solaris by means of IP Multipathing (IPMP) and is separate from link aggregation.

It is extremely important to understand that Link Aggregation does *not* work by passing packets across all the links in an aggregate group in round-robin fashion. When a packet is to be sent, a simple calculation is made: the source and destination addresses (which can be L2, L3, or L4) are XOR'ed and the result is taken modulo the number of links in the aggregate. The result is that any given source-destination pair is "pinned" to one of the links. Hence a single TCP connection can never exceed the throughput of a single link: you might combine four 1 Gbps links into a single aggregate, but you'll never get more than 1 Gbps in any single data transfer. To test an aggregate, run multiple tests in parallel.
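The pinning behaviour can be illustrated with a minimal sketch. It assumes a simplified hash (XOR the two addresses, fold the result to one byte, take it modulo the number of links); the real driver's hash may differ in detail, and the function names and port numbers below are hypothetical.

```python
def pick_link(src: int, dst: int, n_links: int) -> int:
    """Return the index of the link a (src, dst) pair is pinned to."""
    x = src ^ dst
    folded = 0
    while x:  # fold the XOR down to one byte so every octet contributes
        folded ^= x & 0xFF
        x >>= 8
    return folded % n_links


def usable_gbps(flows, n_links: int, link_gbps: float = 1.0) -> float:
    """Upper bound on combined throughput: one link's worth per distinct link hit."""
    links_used = {pick_link(src, dst, n_links) for src, dst in flows}
    return len(links_used) * link_gbps


# One TCP connection (hypothetical source port 50000 to port 2049)
# on an aggregate of four 1 Gbps links.
single = [(50000, 2049)]
# Four parallel connections that differ only in source port.
parallel = [(port, 2049) for port in range(50000, 50004)]

print(usable_gbps(single, 4))    # 1.0
print(usable_gbps(parallel, 4))  # 4.0 for these particular ports
```

A single flow is capped at one link no matter how many links are aggregated, which is why a benchmark with one stream under-reports what the aggregate can carry. Parallel flows are not guaranteed to spread evenly; two flows can hash to the same link.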

Switch Config: It's worth noting that the balancing algorithm you choose on the host should match the one used by the switch. If it doesn't, you'll end up with asymmetric data flows and, in some cases, unpredictable behaviour. L4 balancing is usually only available on fairly "high end" switches. Most L3 switches default to using IP source and destination addresses. Pure L2 switches only support XOR'ing on the MAC addresses, if they support link aggregation at all.

The Emeryville 172 storage network is based on aggregating at least 2 ports on each system to provide an aggregate throughput of 2 Gbps.

[private:/tmp] root# /sbin/ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000 
aggr1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 172.16.165.6 netmask ffff0000 broadcast 172.16.255.255
        ether 0:14:4f:20:dc:1 
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 10.71.165.6 netmask ffffff00 broadcast 10.71.165.255
        ether 0:14:4f:20:dc:0 

[private:/tmp] root# dladm show-aggr   
key: 1 (0x0001) policy: L4      address: 0:14:4f:20:dc:1 (auto)
           device       address                 speed           duplex  link    state
           e1000g1      0:14:4f:20:dc:1   1000  Mbps    full    up      attached
           e1000g2      0:14:4f:20:dc:2   1000  Mbps    full    up      attached
           e1000g3      0:14:4f:20:dc:3   1000  Mbps    full    up      attached

Modifying Aggregates

Interfaces can be added to or removed from an aggregate dynamically.

[atlantis:/] root# dladm show-aggr
key: 1 (0x0001) policy: L4      address: 0:14:4f:3f:b7:42 (auto)
           device       address                 speed           duplex  link    state
           e1000g2      0:14:4f:3f:b7:42          1000  Mbps    full    up      attached
           e1000g3      0:14:4f:3f:b7:43          1000  Mbps    full    up      attached
[atlantis:/] root# dladm remove-aggr -d e1000g3 1
[atlantis:/] root# dladm show-aggr
key: 1 (0x0001) policy: L4      address: 0:14:4f:3f:b7:42 (auto)
           device       address                 speed           duplex  link    state
           e1000g2      0:14:4f:3f:b7:42          1000  Mbps    full    up      attached
[atlantis:/] root# dladm add-aggr -d e1000g3 1
[atlantis:/] root# dladm show-aggr
key: 1 (0x0001) policy: L4      address: 0:14:4f:3f:b7:42 (auto)
           device       address                 speed           duplex  link    state
           e1000g2      0:14:4f:3f:b7:42          1000  Mbps    full    up      attached
           e1000g3      0:14:4f:3f:b7:43          0     Mbps    half    down    standby

Statistics

Statistics per device, link, or aggregation can be found with the appropriate subcommand and the -s argument.

[atlantis:/] root# dladm show-aggr -s
key: 1  ipackets  rbytes      opackets   obytes          %ipkts %opkts
           Total        2357125   273500473   2524810   3729981936  
           e1000g2      880647    111667510   2514674   3728911734      37.4    99.6  
           e1000g3      1476478   161832963   10136     1070202         62.6    0.4  
 
[atlantis:/] root# dladm show-link -s
                ipackets  rbytes         ierrors opackets        obytes      oerrors
e1000g0         16035714  4689453445  0       4283198   4691755344  0       
e1000g1         10175974  973179872   0       15377977  9061905989  0       
e1000g2         0         0           0       0         0           0       
e1000g3         0         0           0       0         0           0       
aggr1           2357147   273504377   0       2524831   3729984088  0   

[atlantis:/] root# dladm show-dev -s
                ipackets  rbytes         ierrors opackets        obytes      oerrors
e1000g0         16036762  4689521110  0       4283231   4691758802  0       
e1000g1         10176143  973196714   0       15378199  9061924565  0       
e1000g2         880710    111675568   0       2514705   3728914930  0       
e1000g3         1476508   161836413   0       10146     1071246     0  
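The %ipkts and %opkts columns printed by show-aggr -s are each port's share of the aggregate's packet totals. As a rough sketch of that arithmetic, the following recomputes the shares from the per-port counters shown above; the function name and the dictionary layout are illustrative only and are not part of dladm.

```python
def port_shares(counters):
    """Map {device: (ipackets, opackets)} to {device: (%ipkts, %opkts)}."""
    total_in = sum(i for i, _ in counters.values())
    total_out = sum(o for _, o in counters.values())

    def pct(part, whole):
        # Avoid dividing by zero on an idle aggregate.
        return round(100.0 * part / whole, 1) if whole else 0.0

    return {dev: (pct(i, total_in), pct(o, total_out))
            for dev, (i, o) in counters.items()}


# Per-port ipackets/opackets copied from the show-aggr -s output above.
counters = {
    "e1000g2": (880647, 2514674),
    "e1000g3": (1476478, 10136),
}

for dev, (in_pct, out_pct) in port_shares(counters).items():
    print(f"{dev}: {in_pct}% in, {out_pct}% out")
```

In this sample almost all outbound packets left through e1000g2, which is consistent with a small number of flows being pinned to one port rather than with a fault.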

Additionally, tools like nicstat can be used:

[private:/tmp] root# ./nicstat 10
    Time   Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs   %Util     Sat
00:06:01 aggr0/1  1314.9  1570.7  6524.8  6022.9   206.4   267.1    2.36    0.00
00:06:01 e1000g1/0   665.4   244.8  3432.7   673.9   198.5   372.0    0.75    0.00
00:06:01 aggr1  1314.9  1570.7  6524.8  6022.9   206.4   267.1    2.36    0.00
00:06:01 e1000g3/0   453.4   100.5  2396.0   259.3   193.8   396.8    0.45    0.00
00:06:01 e1000g0    0.06    0.28    0.91    0.85   66.59   332.0    0.00    0.00
00:06:01 e1000g2/0   196.1  1225.4   696.1  5089.7   288.4   246.5    1.16    0.00
00:06:01 e1000g0/0    0.06    0.28    0.91    0.85   66.59   332.0    0.00    0.00
    Time   Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs   %Util     Sat
00:06:11 aggr0/1  1255.7  2369.5  5859.5  5996.3   219.4   404.6    2.97    0.00
00:06:11 e1000g1/0   778.3  1155.6  3210.3  1657.3   248.3   714.0    1.58    0.00
00:06:11 aggr1  1255.7  2369.6  5859.7  5996.7   219.4   404.6    2.97    0.00
00:06:11 e1000g3/0   351.7   61.27  2056.8   278.8   175.1   225.0    0.34    0.00
00:06:11 e1000g0    0.14    0.61    2.29    3.18   64.00   196.5    0.00    0.00
00:06:11 e1000g2/0   125.8  1153.7   593.1  4061.4   217.1   290.9    1.05    0.00
00:06:11 e1000g0/0    0.14    0.61    2.29    3.18   64.00   196.5    0.00    0.00

LACP

Several arguments to dladm are used to support LACP:

     -l mode
     --lacp-mode=mode

         Specifies whether LACP should be used and, if used,  the
         mode  in  which it should operate. Legal values are off,
         active or passive.

     -T time
     --lacp-timer=time

         Specifies the LACP timer value.  The  legal  values  are
         short or long.

     -L
     --lacp

         Specifies whether detailed LACP  information  should  be
         displayed.

Examples:

[private:/tmp] root# dladm show-aggr -L
key: 1 (0x0001) policy: L4      address: 0:14:4f:20:dc:1 (auto)
                LACP mode: off  LACP timer: short
    device    activity timeout aggregatable sync  coll dist defaulted expired
    e1000g1   passive  short   yes          no    no   no   no        no     
    e1000g2   passive  short   yes          no    no   no   no        no     
    e1000g3   passive  short   yes          no    no   no   no        no   


Enabling LACP

Example of temporarily enabling LACP in passive mode:

[atlantis:/] root# dladm show-aggr -L
key: 1 (0x0001) policy: L4      address: 0:14:4f:3f:b7:42 (auto)
                LACP mode: off  LACP timer: short
    device    activity timeout aggregatable sync  coll dist defaulted expired
    e1000g2   passive  short   yes          no    no   no   no        no     
    e1000g3   passive  short   yes          no    no   no   no        no 

[atlantis:/] root# dladm modify-aggr -t -l passive 1  

[atlantis:/] root# dladm show-aggr -L
key: 1 (0x0001) policy: L4      address: 0:14:4f:3f:b7:42 (auto)
                LACP mode: passive      LACP timer: short
    device    activity timeout aggregatable sync  coll dist defaulted expired
    e1000g2   passive  short   yes          no    no   no   yes       no     
    e1000g3   passive  short   yes          no    no   no   yes       no 

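WARNING: in this environment, enabling LACP in passive mode broke the storage network. Pings went unanswered until LACP was switched off again: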

[atlantis:/] root# ping -s 172.16.165.6
PING 172.16.165.6: 56 data bytes
^C
[atlantis:/] root# dladm modify-aggr -t -l off 1
[atlantis:/] root# dladm show-aggr -L
key: 1 (0x0001) policy: L4      address: 0:14:4f:3f:b7:42 (auto)
                LACP mode: off  LACP timer: short
    device    activity timeout aggregatable sync  coll dist defaulted expired
    e1000g2   passive  short   yes          no    no   no   yes       no     
    e1000g3   passive  short   yes          no    no   no   yes       no     
[atlantis:/] root# ping -s 172.16.165.6
PING 172.16.165.6: 56 data bytes
64 bytes from 172.16.165.6: icmp_seq=0. time=0.385 ms
64 bytes from 172.16.165.6: icmp_seq=1. time=0.253 ms
64 bytes from 172.16.165.6: icmp_seq=2. time=0.247 ms
^C
----172.16.165.6 PING Statistics----
3 packets transmitted, 3 packets received, 0% packet loss
round-trip (ms)  min/avg/max/stddev = 0.247/0.295/0.385/0.078

Policies

From the dladm(1M) man page:

     -P policy
     --policy=policy

         Specifies the port selection  policy  to  use  for  load
         spreading  of  outbound  traffic.  The  policy specifies
         which dev object is used to send packets. A policy  con-
         sists  of  a  list  of  one  or  more  layers specifiers
         separated by commas. A layer specifier  is  one  of  the
         following:

         L2       Select outbound device according to source  and
                  destination MAC addresses of the packet.

         L3       Select outbound device according to source  and
                  destination IP addresses of the packet.

         L4       Select outbound device according to  the  upper
                  layer  protocol  information  contained  in the
                  packet. For TCP and UDP, this  includes  source
                  and destination ports. For IPsec, this includes
                  the SPI (Security Parameters Index.)
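To make the difference between the layer specifiers concrete, here is a minimal sketch that applies each policy to several TCP connections between the same two hosts. It assumes a plain XOR-and-modulo hash, which is a simplification of whatever the driver actually computes; the Flow type, the function names, the peer IP address 172.16.165.7, and the port numbers are hypothetical.

```python
import ipaddress
from typing import NamedTuple


class Flow(NamedTuple):
    src_mac: int
    dst_mac: int
    src_ip: int
    dst_ip: int
    src_port: int
    dst_port: int


def select_port(flow: Flow, policy: str, n_ports: int) -> int:
    """Pick an outbound port for a flow; policy is e.g. 'L2' or 'L3,L4'."""
    key = 0
    for layer in policy.split(","):
        if layer == "L2":
            key ^= flow.src_mac ^ flow.dst_mac
        elif layer == "L3":
            key ^= flow.src_ip ^ flow.dst_ip
        elif layer == "L4":
            key ^= flow.src_port ^ flow.dst_port
        else:
            raise ValueError(f"unknown layer specifier: {layer}")
    return key % n_ports


src_ip = int(ipaddress.IPv4Address("172.16.165.6"))
dst_ip = int(ipaddress.IPv4Address("172.16.165.7"))  # hypothetical peer
# Four connections between the same two hosts, differing only in source port.
flows = [Flow(0x00144F20DC01, 0x00144F3FB742, src_ip, dst_ip, port, 2049)
         for port in range(50000, 50004)]

for policy in ("L2", "L3", "L4"):
    print(policy, sorted({select_port(f, policy, 2) for f in flows}))
```

With L2 or L3, every connection between one pair of hosts lands on the same port, because the MAC and IP addresses never change. Only L4 lets multiple connections between the same two hosts spread across the aggregate.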


LACP Mode

LACP can be put into one of three modes: off, active, or passive.

         -l mode, --lacp-mode=mode

             Specifies whether LACP should be used and, if  used,
             the  mode  in  which it should operate. Legal values
             are "off", "active" or "passive".

LACP controls how links are added to and removed from the aggregate. Future testing should answer the following:

  • How does removal of an aggregate member (simulated failure) affect the link in the different LACP modes?
  • Isn't the point of LACP to enable aggregation without hard-coding the target ports?
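As background for these questions: under 802.3ad a passive port only answers LACPDUs it receives, so negotiation can start only if at least one side is active. The following is a minimal sketch of that rule. The function name is illustrative, and it models only whether LACPDUs get exchanged, not what either side does with a port that fails to negotiate.

```python
VALID_MODES = {"off", "active", "passive"}


def lacp_negotiates(local_mode: str, peer_mode: str) -> bool:
    """Return True if the two ends will exchange LACPDUs."""
    for mode in (local_mode, peer_mode):
        if mode not in VALID_MODES:
            raise ValueError(f"unknown LACP mode: {mode}")
    if "off" in (local_mode, peer_mode):
        return False  # one side does not speak LACP at all
    # A passive port waits to be spoken to, so someone has to be active.
    return "active" in (local_mode, peer_mode)


for local in ("off", "passive", "active"):
    for peer in ("off", "passive", "active"):
        print(f"{local:8s} {peer:8s} {lacp_negotiates(local, peer)}")
```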

Benchmarks

Using the network benchmarking tools pathload and pathrate, the following results were measured on the improperly configured aggregates in Emeryville (sung-to-jennifer, e1000g2/3 aggr1 through Dell PowerConnect):

  • Capacity: 950 Mbps
  • Available Bandwidth: 923.08 - 1090.91 Mbps (with one spike to 1714.29 Mbps)

Aggregations on the Wire

LACP advertisements are sent via multicast to the destination address 01:80:C2:00:00:02. The following is an example packet as viewed with snoop:

ETHER:  ----- Ether Header -----
ETHER:
ETHER:  Packet 112 arrived at 10:57:24.26997
ETHER:  Packet size = 128 bytes
ETHER:  Destination = 1:80:c2:0:0:2, (multicast)
ETHER:  Source      = 0:1:e8:d5:b6:4c,
ETHER:  Ethertype = 8809 (Unknown)
ETHER:

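In the above packet, the destination 1:80:c2:0:0:2 is the multicast destination for LACP advertisements [1]. The source, 0:1:e8:d5:b6:4c, is the MAC address of the Force10 switch. Ethertype 8809 (hexadecimal) identifies the IEEE 802.3 Slow Protocols, the family to which LACP belongs [2]; snoop simply does not decode it, hence "Unknown".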
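The fields discussed above are enough to pick LACP frames out of a raw capture. The sketch below checks the destination address, the Ethertype, and the first payload byte. The subtype value (1 for LACP) comes from the 802.3 Slow Protocols definition rather than from the capture above, and the function name and the hand-built sample frame are illustrative only.

```python
import struct

SLOW_PROTOCOLS_MAC = bytes.fromhex("0180c2000002")
SLOW_PROTOCOLS_ETHERTYPE = 0x8809
LACP_SUBTYPE = 0x01  # assumption from the standard; 0x02 is the Marker protocol


def is_lacpdu(frame: bytes) -> bool:
    """Return True if a raw Ethernet frame looks like an LACP advertisement."""
    if len(frame) < 15:  # 14-byte Ethernet header plus the subtype byte
        return False
    dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return (dst == SLOW_PROTOCOLS_MAC
            and ethertype == SLOW_PROTOCOLS_ETHERTYPE
            and frame[14] == LACP_SUBTYPE)


# Header rebuilt from the snoop output above, followed by a subtype byte and
# zero padding up to the 128-byte packet size (the payload here is not real).
header = SLOW_PROTOCOLS_MAC + bytes.fromhex("0001e8d5b64c") + struct.pack("!H", 0x8809)
sample = header + bytes([LACP_SUBTYPE]) + bytes(128 - 15)

print(is_lacpdu(sample))
```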

Attribution

This content was donated by Joyent.
