"Easy RoCE" Configuration Guide
## Introduction

RDMA (Remote Direct Memory Access) is a network-based memory access technology that is widely adopted in supercomputing, AI training, storage, and other scenarios. Initially implemented on InfiniBand networks, RDMA later evolved into Ethernet-based protocols: iWARP and RoCE (RDMA over Converged Ethernet).

RoCEv2 operates over the connectionless UDP protocol. Compared to connection-oriented TCP, UDP offers faster speeds and lower resource consumption. However, unlike TCP, which ensures reliable transmission through mechanisms like sliding windows and acknowledgment, RoCEv2 suffers significant performance degradation upon packet loss: when loss occurs, the RDMA NIC discards the packets it receives afterwards, forcing the sender to retransmit all subsequent data. RDMA therefore requires a lossless Ethernet environment. To provide one, RoCEv2 employs PFC (Priority Flow Control) and ECN (Explicit Congestion Notification) to guarantee data transmission reliability.

To simplify lossless Ethernet deployment and maintenance, Asterfusion has launched "Easy RoCE" on AsterNOS. Focusing on the requirements of the RoCEv2 scenario, we have implemented a business-level command line wrapper on high-rate products, such as the CX532P-N, to achieve the best maintainability and usability in this scenario.

## Enable "Easy RoCE"

Enable "Easy RoCE" with the key parameters cable-length, incast-level, and traffic-model, and the system automatically generates a lossless configuration template. The template applies the default DSCP mapping, enables PFC and ECN for queues 3 and 4, and sets strict priority scheduling for queues 6 and 7. The key parameters of Easy RoCE are detailed below.

- cable-length: the cable length, used for the calculation of PFC and ECN parameters.
- incast-level: the incast traffic model, used for the calculation of PFC parameters. Optional levels: low, medium, high.
- traffic-model: the service type, used for the calculation of ECN parameters. Optional types: throughput-sensitive service, delay-sensitive service, balanced type.

Note: The CX308P-48Y-N-V2 and CX532P-N-V2 models do not currently support this function.

The lossless RoCE configuration template generated by "Easy RoCE" is shown as follows:

```
sonic# show running-config
buffer-profile roce_lossless_25g_profile
 mode lossless dynamic 1 size 1518 xoff 27104 xon-offset 13440
!
buffer-profile roce_lossless_50g_profile
 mode lossless dynamic 1 size 1518 xoff 28448 xon-offset 13440
!
buffer-profile roce_lossless_100g_profile
 mode lossless dynamic 1 size 1518 xoff 38816 xon-offset 13440
!
class-map roce_lossless_25g_class_map
 match cos 3 4
!
class-map roce_lossless_50g_class_map
 match cos 3 4
!
class-map roce_lossless_100g_class_map
 match cos 3 4
!
diffserv-map type ip-dscp roce_lossless_25g_diffserv_map
 default copy
!
diffserv-map type ip-dscp roce_lossless_50g_diffserv_map
 default copy
!
diffserv-map type ip-dscp roce_lossless_100g_diffserv_map
 default copy
!
wred roce_lossless_25g_ecn
 mode ecn gmin 15360 gmax 1388576 gprobability 90
!
wred roce_lossless_50g_ecn
 mode ecn gmin 15360 gmax 1388576 gprobability 90
!
wred roce_lossless_100g_ecn
 mode ecn gmin 15360 gmax 1388576 gprobability 90
!
policy-map roce_lossless_25g
 class roce_lossless_25g_class_map
 priority-group-buffer roce_lossless_25g_profile
 wred roce_lossless_25g_ecn
 queue-scheduler priority queue 6
 queue-scheduler priority queue 7
 set cos dscp diffserv roce_lossless_25g_diffserv_map
!
policy-map roce_lossless_50g
 class roce_lossless_50g_class_map
 priority-group-buffer roce_lossless_50g_profile
 wred roce_lossless_50g_ecn
 queue-scheduler priority queue 6
 queue-scheduler priority queue 7
 set cos dscp diffserv roce_lossless_50g_diffserv_map
!
policy-map roce_lossless_100g
 class roce_lossless_100g_class_map
 priority-group-buffer roce_lossless_100g_profile
 wred roce_lossless_100g_ecn
 queue-scheduler priority queue 6
 queue-scheduler priority queue 7
 set cos dscp diffserv roce_lossless_100g_diffserv_map
```

If the template above is not fully applicable to your business scenario, we recommend modifying the parameters via the command line. Refer to https://docs.asternos.com/aidc/configuration-guide/v31/11-qos-configuration//#configuration-and-parameter-tuning for details.

## RoCE default setting

The default setting of RoCE is shown in the following table.

Table 1: Default setting of RoCE

| Parameters | Default value |
| --- | --- |
| Cable length | 40m |
| Incast level | low |
| Traffic model | latency |

## Create RoCE configuration templates

Table 2: Create RoCE configuration templates

| Purpose | Command | Description |
| --- | --- | --- |
| Enter global configuration view | `configure terminal` | |
| Generate lossless RoCE configuration templates | `qos roce lossless [ cable-length length ] [ incast-level level ] [ traffic-model model ]` | `length`: specifies the cable length, optional 5m / 40m / 100m / 300m.<br>`level`: specifies the incast model, optional low / medium / high.<br>`model`: specifies the flow model, optional throughput / latency / balance. |

## Apply lossless RoCE configuration to all interfaces

Table 3: Apply lossless RoCE configuration to all interfaces

| Purpose | Command | Description |
| --- | --- | --- |
| Enter global configuration view | `configure terminal` | |
| Apply lossless RoCE configuration to all interfaces | `qos service-policy { roce_lossless \| roce-profile-name }` | `roce_lossless`: the RoCE template under default parameters.<br>`roce-profile-name`: the name of the specific RoCE template. |

## Apply lossless RoCE configuration to specified interfaces
Table 4: Apply lossless RoCE configuration to specified interfaces

| Purpose | Command | Description |
| --- | --- | --- |
| Enter global configuration view | `configure terminal` | |
| Enter the RoCE template configuration view | `qos roce roce-profile-name` | `roce-profile-name`: specifies the RoCE template name. |
| Bind the interfaces that need to be enabled for RoCE configuration | `bind interface { all \| ethernet interface-name range interface-name-list }` | |

## Configuration and parameter tuning

When the default lossless RoCE configuration above is not fully applicable to your business scenario, you can adjust configurations and parameters through the command line to optimize business performance.

### Configure DSCP mapping

Table 5: Configure DSCP mapping

| Operation | Command | Description |
| --- | --- | --- |
| Enter global configuration view | `configure terminal` | |
| Enter DSCP mapping configuration view | `diffserv-map type ip-dscp roce_lossless_diffserv_map` | |
| Modify DSCP to CoS mapping | `ip-dscp value cos cos-value` | `value`: DSCP value, range 0 to 63.<br>`cos-value`: CoS value, range 0 to 7. |
| | `default { cos-value \| copy }` | `cos-value`: all packets are mapped to the corresponding CoS value.<br>`default copy`: use the default DSCP mapping of the system. |

### Configure queue scheduling policy

If you have already bound the lossless RoCE policy to interfaces, unbind it before modifying the policy.

Table 6: Configure queue scheduling policy

| Purpose | Command | Description |
| --- | --- | --- |
| Enter global configuration view | `configure terminal` | |
| Enter lossless RoCE policy configuration view | `policy-map roce_lossless` | |
| Configure SP mode scheduling | `queue-scheduler priority queue queue-id` | `queue-id`: range from 0 to 7. |
| Configure DWRR mode scheduling | `queue-scheduler queue-limit percent queue-weight queue queue-id` | `queue-weight`: percentage that specifies the scheduling weight of DWRR, range from 0 to 100.<br>`queue-id`: range from 0 to 7. |

### Set PFC threshold

Table 7: Set PFC threshold

| Purpose | Command | Description |
| --- | --- | --- |
| Enter global configuration view | `configure terminal` | |
| Enter PFC configuration view | `buffer-profile roce_lossless_profile` | |
| Modify PFC lossless buffer | `mode lossless dynamic dynamic_th size size xoff xoff xon-offset xon-offset [ xon xon ]` | See the parameter descriptions that follow this table. |

- `dynamic_th`: the dynamic threshold coefficient, range [-4, 3]. The dynamic threshold is 2^dynamic_th × the remaining available buffer. For example, if `dynamic_th` is set to 1, the dynamic threshold is 2 times the remaining available buffer, i.e., the actual threshold is 2/3 of the total available buffer.
- `size`: the reservation size in bytes. The recommended configuration value is 1518.
- `xoff`: the buffer threshold, in bytes, that triggers PFC backpressure frames. It is recommended to configure it as an integer multiple of the cell size. `xoff` is related to the cable length, interface rate, and other parameters; refer to the recommended configuration values when setting it. `xoff` must be greater than the `xon` value.
- `xon-offset`: the buffer threshold, in bytes, at which PFC backpressure frames stop. It is recommended to configure it as an integer multiple of the cell size. The recommended configuration value is 13440.
- `xon`: an optional parameter, normally configured as 0.

### Set ECN threshold

Table 8: Set ECN threshold

| Purpose | Command | Description |
| --- | --- | --- |
| Enter global configuration view | `configure terminal` | |
| Enter ECN configuration view | `wred roce_lossless_ecn` | |
| Modify ECN parameters | `mode ecn gmin min-th gmax max-th gprobability probability [ ymin min-th ymax max-th yprobability probability \| rmin min-th rmax max-th rprobability probability ]` | See the parameter descriptions that follow this table. |

- `min-th`: the absolute lower limit of ECN, in bytes. When the length of the packets in the queue reaches this value, the interface starts to set the ECN field of packets to CE according to the probability. The configurable minimum value is 15 KB. The recommended configuration value is 15360.
- `max-th`: the absolute upper limit of ECN, in bytes. When the length of the packets in the queue reaches this value, the interface sets the ECN field of all packets to CE. The recommended configuration values for the different interface rates are as follows:

| Interface rate | Recommended max-th (bytes) |
| --- | --- |
| 100 Gbps | 768000 |
| 200 Gbps | 768000 |
| 400 Gbps | 1536000 |

- `probability`: the maximum discard probability as an integer, range [1, 100]. It is recommended to set the probability to 90 percent for latency-sensitive services and 10 percent for throughput-sensitive services.

## Display and maintenance

Table 9: "Easy RoCE" display and maintenance

| Purpose | Command | Description |
| --- | --- | --- |
| Display RoCE-related configurations | `show qos roce [ all \| summary \| roce-profile-name ]` | By default, only RoCE configuration templates in use (i.e., bound to interfaces) are displayed.<br>`all`: view all created RoCE configurations.<br>`summary`: view the summary of the RoCE configuration.<br>`roce-profile-name`: the name of the RoCE configuration template to view. |
| Display the relationship between interfaces and policy maps | `show interface policy-map` | |
| Display RoCE statistics of the interface | `show counters qos roce interface interface-name queue queue-id` | |
| Clear RoCE statistics of all interfaces | `clear counters qos roce` | |

## Configuration examples

Enable Easy RoCE and apply it to all interfaces:

```
sonic# configure terminal
sonic(config)# qos roce lossless cable-length 40m incast-level low traffic-model latency
sonic(config)# qos service-policy roce_lossless_40m_low_latency
```

Display RoCE-related configurations:

```
sonic# show qos roce
notice displaying configurations of in use roce profiles
==> roce profile roce_lossless_40m_low_latency | roce policy map roce_lossless_40m_low_latency_25g <==
+--------------------+-------------------+-----------------------------------------------------+
|                    | operational       | description                                         |
+====================+===================+=====================================================+
| mode               | lossless          | qos roce mode                                       |
+--------------------+-------------------+-----------------------------------------------------+
| status             | bind 0/104 0/107  | qos roce binding status                             |
+--------------------+-------------------+-----------------------------------------------------+
| cable length       | 40m               | cable length in meters for qos roce lossless config |
+--------------------+-------------------+-----------------------------------------------------+
| congestion control |                   |                                                     |
| congestion mode    | ecn               | congestion control mode                             |
| enabled tc         | 3,4               | congestion control config enabled traffic class     |
| max threshold      | 1388576           | congestion control config max threshold             |
| min threshold      | 15360             | congestion control config max threshold             |
+--------------------+-------------------+-----------------------------------------------------+
| pfc                |                   |                                                     |
| pfc priority       | 3,4               | pfc enabled switch priority                         |
| tx status          | enabled           | pfc rx status                                       |
| rx status          | enabled           | pfc tx status                                       |
+--------------------+-------------------+-----------------------------------------------------+
| trust              |                   |                                                     |
| trust mode         | dscp              | trust setting for packet classification             |
+--------------------+-------------------+-----------------------------------------------------+
====> roce dscp->sp mapping configurations <====
+-------------------------+-------------------+
| dscp                    | switch priority   |
+=========================+===================+
| 0,1,2,3,4,5,6,7         | 0                 |
| 8,9,10,11,12,13,14,15   | 1                 |
| 16,17,18,19,20,21,22,23 | 2                 |
| 24,25,26,27,28,29,30,31 | 3                 |
| 32,33,34,35,36,37,38,39 | 4                 |
| 40,41,42,43,44,45,46,47 | 5                 |
| 48,49,50,51,52,53,54,55 | 6                 |
| 56,57,58,59,60,61,62,63 | 7                 |
+-------------------------+-------------------+
====> roce sp->tc mapping & ets configurations <====
+-------------------+--------+----------+
| switch priority   | mode   | weight   |
+===================+========+==========+
| 6                 | sp     |          |
| 7                 | sp     |          |
+-------------------+--------+----------+
====> pfc profile configurations <====
+--------------------------------------------+-------------------+
| profile name                               | switch priority   |
+============================================+===================+
| egress_lossless_profile                    | 3,4               |
| egress_lossy_profile                       | 0,1,2,5,6,7       |
| ingress_lossy_profile                      | 0,1,2,5,6,7       |
| roce_lossless_40m_low_latency_25g_profile  | 3,4               |
| roce_lossless_40m_low_latency_50g_profile  | 3,4               |
| roce_lossless_40m_low_latency_100g_profile | 3,4               |
+--------------------------------------------+-------------------+
……
```

Display RoCE statistics of the interface:

```
sonic# show counters qos roce interface 0/32 queue 3
operational
- roce states ethernet32 3
pfc stats
- pfc rx stats 0
- pfc tx stats 402
- pg stats
- total packet 11,380,786,999
- total bytes 1,456,740,735,872
- drop packet 0
- curr occupancy 0
ecn stats
- ecn stats 0
- ecn buffer
- shared use watermark byte 0
- total use watermark byte 0
- total use count byte 0
queue stats
- counter pkts 0
- counter bytes 0
- drop pkts 0
- drop bytes 0
- counterrate pkts 0 0
- counterrate bytes 0 0
- droprate pkts 0 0
- droprate bytes 0 0
- occupancy bytes 0
- sharedoccupancy bytes 0
```
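
The threshold arithmetic behind Table 7 and Table 8 can be hard to picture from the parameter descriptions alone, so the following Python sketch works through it. It is an illustration only, not AsterNOS code: the function names are hypothetical, the 2/3 result relies on the steady-state reading of the dynamic threshold (one congested queue competing for the shared buffer), the linear ramp between the minimum and maximum ECN thresholds is an assumed WRED-style behavior that this guide does not spell out, and the DSCP rule simply restates the default mapping shown in the `show qos roce` example output.

```python
"""Illustrative arithmetic for the PFC and ECN parameters in this guide.

All function names are hypothetical; nothing here is an AsterNOS API.
"""


def dynamic_threshold_fraction(dynamic_th: int) -> float:
    """Share of the total buffer a single queue can reach at steady state.

    The guide defines the limit as 2**dynamic_th times the REMAINING buffer.
    Assumption: with one congested queue, occupancy q settles where
    q = alpha * (total - q), which gives q / total = alpha / (1 + alpha).
    """
    if not -4 <= dynamic_th <= 3:
        raise ValueError("dynamic_th must be within [-4, 3]")
    alpha = 2.0 ** dynamic_th
    return alpha / (1.0 + alpha)


def ecn_mark_probability(queue_bytes: int, min_th: int, max_th: int,
                         probability: int) -> float:
    """Probability that a packet is marked CE at a given queue depth.

    Below min_th nothing is marked; at or above max_th every packet is
    marked, as the guide states. Assumption: between the two thresholds the
    probability rises linearly from 0 up to probability percent.
    """
    if queue_bytes < min_th:
        return 0.0
    if queue_bytes >= max_th:
        return 1.0
    ramp = (queue_bytes - min_th) / (max_th - min_th)
    return ramp * probability / 100.0


def default_switch_priority(dscp: int) -> int:
    """Default DSCP to switch priority mapping: eight DSCP values per priority."""
    if not 0 <= dscp <= 63:
        raise ValueError("dscp must be within [0, 63]")
    return dscp // 8


if __name__ == "__main__":
    # dynamic_th = 1 is the value used by the generated buffer profiles.
    print(dynamic_threshold_fraction(1))
    # Template values: gmin 15360, gmax 1388576, gprobability 90.
    print(ecn_mark_probability(701968, 15360, 1388576, 90))
    # DSCP 26 falls in the 24..31 group, which carries lossless traffic.
    print(default_switch_priority(26))
```

With the template value `dynamic_th` = 1, the first function reproduces the 2/3 figure quoted in the description of Table 7. Lowering `dynamic_th` shrinks each queue's share of the shared buffer, which is the main trade-off to weigh when adjusting it.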
