Discussion:
[j-nsp] QFX5100 buffer allocation
Brian Rak
2018-05-16 16:06:55 UTC
We've been trying to track down why our 5100s are dropping traffic due
to lack of buffer space, even with very low link utilization.
show interfaces xe-0/0/49:0 extensive
....
    Carrier transitions: 1, Errors: 0, Drops: 276796488, Collisions: 0,
Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 0,
Resource errors: 0, Bucket drops: 0
  Egress queues: 12 supported, 5 in use
  Queue counters:       Queued packets    Transmitted packets    Dropped packets
    0                                0          1876090180637          276796488
    3                                0                      0                  0
    4                                0                      0                  0
    7                                0                 663877                  0
    8                                0                      0                  0
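
For scale, that works out to roughly one drop per ~6800 packets on queue 0;
a quick back-of-the-envelope check on the counters above (plain Python,
nothing Juniper-specific):

# Drop rate on queue 0, from the counters above.
transmitted = 1876090180637
dropped     = 276796488

rate = dropped / (transmitted + dropped)
print(f"{rate:.6f}")      # 0.000148  -> about 0.015% of packets dropped
print(round(1 / rate))    # 6779      -> roughly one drop per ~6800 packets
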
show class-of-service forwarding-class
Forwarding class                       ID  Queue  Policing priority  No-Loss
  best-effort                           0      0  normal             Disabled
  fcoe                                  1      3  normal             Enabled
  no-loss                               2      4  normal             Enabled
  network-control                       3      7  normal             Disabled
  mcast                                 8      8  normal             Disabled

So whatever, we've got queues configured for traffic we never see.
show class-of-service shared-buffer egress
Egress:
  Total Buffer     :  12480.00 KB
  Dedicated Buffer :  3744.00 KB
  Shared Buffer    :  8736.00 KB
    Lossless          :  4368.00 KB
    Multicast         :  1659.84 KB
    Lossy             :  2708.16 KB

To me, it appears that the 5100s by default reserve about 70% of the
available shared port buffers for lossless+multicast traffic. The
documentation seems to back me up here:

https://www.juniper.net/documentation/en_US/junos/topics/concept/cos-qfx-series-buffer-configuration-understanding.html#jd0e1441
https://www.juniper.net/documentation/en_US/junos/topics/example/cos-shared-buffer-allocation-lossy-ucast-qfx-series-configuring.html
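
A quick sanity check of that reading, plugging the KB figures from the
show output above into Python:

# Egress shared-buffer split from "show class-of-service shared-buffer egress".
shared   = 8736.00    # KB, total shared egress buffer
lossless = 4368.00    # KB
mcast    = 1659.84    # KB
lossy    = 2708.16    # KB

print(round(lossless / shared * 100, 1))            # 50.0
print(round(mcast / shared * 100, 1))               # 19.0
print(round(lossy / shared * 100, 1))               # 31.0
print(round((lossless + mcast) / shared * 100, 1))  # 69.0 -> "about 70%"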

Has anyone else encountered this?  We're going to be adjusting the
buffers per the documentation, but I'd be really interested to hear if
anyone has hit this before.

Michael Still
2018-05-16 17:29:08 UTC
If you don't need the lossless queue then you can axe most of it. This
is what I've done:
set class-of-service shared-buffer ingress percent 100
set class-of-service shared-buffer ingress buffer-partition lossless percent 5
set class-of-service shared-buffer ingress buffer-partition lossy percent 95
set class-of-service shared-buffer ingress buffer-partition lossless-headroom percent 0
set class-of-service shared-buffer egress percent 100
set class-of-service shared-buffer egress buffer-partition lossless percent 5
set class-of-service shared-buffer egress buffer-partition lossy percent 90
set class-of-service shared-buffer egress buffer-partition multicast percent 5
show class-of-service shared-buffer
Ingress:
  Total Buffer     :  12480.00 KB
  Dedicated Buffer :  2912.81 KB
  Shared Buffer    :  9567.19 KB
    Lossless          :  478.36 KB
    Lossless Headroom :  0.00 KB
    Lossy             :  9088.83 KB

  Lossless Headroom Utilization:
    Node   Device      Total        Used        Free
    0               0.00 KB     0.00 KB     0.00 KB

Egress:
  Total Buffer     :  12480.00 KB
  Dedicated Buffer :  3744.00 KB
  Shared Buffer    :  8736.00 KB
    Lossless          :  436.80 KB
    Multicast         :  436.80 KB
    Lossy             :  7862.40 KB

You may be able to tweak even further for your own needs.
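
Compared with the default egress split posted above, that gives the lossy
partition roughly three times as much shared buffer; again just arithmetic
on the two show outputs:

# Egress lossy share: default split vs. after the shared-buffer changes above.
shared = 8736.00             # KB, shared egress buffer (unchanged)

lossy_default = 2708.16      # KB, default split
lossy_tuned   = 7862.40      # KB, with the config above

print(round(lossy_default / shared * 100))     # 31 (% of shared buffer)
print(round(lossy_tuned / shared * 100))       # 90
print(round(lossy_tuned / lossy_default, 1))   # 2.9 -> nearly 3x the lossy buffer
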
--
[***@gmail.com ~]$ cat .signature
cat: .signature: No such file or directory
[***@gmail.com ~]$
Brian Rak
2018-05-17 00:41:45 UTC
Post by Thomas Bellman
Post by Brian Rak
We've been trying to track down why our 5100s are dropping traffic
due to lack of buffer space, even with very low link utilization.
There's only 12 Mbyte of buffer space on the Trident II chip. If you
get 10 Gbit/s bursts simultaneously on two ports, contending for the
same output port, it will only take 10 ms to fill 12 Mbyte. (And of
those 12 Mbyte, 3 Mbyte is used for dedicated per-port buffers, so you
really only have ~9 Mbyte, so you would actually fill your buffers in
7.5-8 ms.)
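
A sketch of that arithmetic, assuming two 10 Gbit/s inputs converging on
a single 10 Gbit/s output (so the queue grows at roughly 10 Gbit/s):

# Two 10 Gbit/s inputs contending for one 10 Gbit/s output: the queue
# grows at roughly 10 Gbit/s of excess traffic.
excess_bytes_per_s = 10e9 / 8          # 10 Gbit/s expressed in bytes/s

def fill_time_ms(mbytes):
    # Time to fill `mbytes` Mbyte (taken as MiB) of buffer, in ms.
    return mbytes * 2**20 / excess_bytes_per_s * 1000

print(round(fill_time_ms(12), 1))   # 10.1 -> ~10 ms to fill all 12 Mbyte
print(round(fill_time_ms(9), 1))    # 7.5  -> ~7.5 ms for the ~9 Mbyte shared part
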
Do you see any actual problems due to the dropped packets? Some people
would have you believe that TCP suffers horribly from a single dropped
packet, but reality is not quite that bad. So don't chase problems
that aren't there.
Our busiest ports have drop rates at about 1 in every 15'000 packets
(average over a few months), and so far we haven't noticed any TCP
performance problems related to that. (But I should note that most
of our traffic is long-distance, to and from sites at least several
milliseconds away from us, and often 10-20 ms away.)
That said, for Trident II / Tomahawk level of buffer sizes, I think
it makes sense to configure them to have it all actually used, and
not wasted on the lossless queues.
You should probably also consider enabling cut-through forwarding, if
you haven't already done so. That should decrease the amount of buffer
space used, leaving more available for when contention happens.
/Bellman
A lot of what we run are game servers, which are very heavily UDP. They
can deal with some dropped packets, but it's not ideal.

We're not even doing 10gbit of traffic, so the buffers should last at
least a little bit.

Thanks for the tip about cut-through, we didn't have that enabled. Do
you happen to know if it works from a 10g port to a broken out 4x10g port?

It's annoying to be dropping packets with a bunch of unused buffer space.
Thomas Bellman
2018-05-17 06:58:36 UTC
Post by Brian Rak
We're not even doing 10gbit of traffic, so the buffers should last at
least a little bit.
And you're not hitting 10 Gbit/s even under very short bursts of a few
milliseconds? Microbursts like that don't show up in "normal" usage
graphs where you only poll your switches/routers every minute or so.
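
To put a number on that: even a fairly aggressive burst pattern all but
disappears in a 5-minute average. A rough illustration with made-up figures
(one 10 ms line-rate burst per second):

# Hypothetical pattern: a 10 ms burst at full 10 Gbit/s line rate once per
# second, idle otherwise, averaged over a 5-minute polling interval.
burst_s    = 0.010      # 10 ms burst
line_rate  = 10e9       # bit/s
interval_s = 300        # 5-minute polling interval
bursts     = interval_s # one burst per second

avg_bps = bursts * burst_s * line_rate / interval_s
print(round(avg_bps / 1e6))                  # 100 -> graph shows ~100 Mbit/s
print(round(avg_bps / line_rate * 100, 1))   # 1.0 -> i.e. about 1% utilization
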
Post by Brian Rak
Thanks for the tip about cut-through, we didn't have that enabled.
Do you happen to know if it works from a 10g port to a broken out
4x10g port?
Should do. From the perspective of the Trident II chip, they are
not any different from normal 10G ports. Cut-through doesn't work
between ports of different speed, and the ports involved must not
have any rate limiting or shaping, but other than that I don't know
of any limitations. (And if you receive broken packets, those will
be forwarded instead of thrown away; that is the only disadvantage
of cut-through mode that I have heard of.)
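
If you end up scripting the change, something along these lines ought to
work. This is an untested PyEZ sketch: the hostname and credentials are
placeholders, and the statement name ("forwarding-options cut-through") is
from memory, so verify it against the QFX docs for your release.

# Minimal PyEZ sketch for enabling cut-through on a QFX5100.
# Hostname/credentials are placeholders; the configuration statement is
# from memory, so check it before committing.  Also note the chip-reset
# caveat below: committing this can briefly interrupt forwarding.
from jnpr.junos import Device
from jnpr.junos.utils.config import Config

with Device(host="qfx5100.example.net", user="admin", passwd="secret") as dev:
    with Config(dev, mode="exclusive") as cu:
        cu.load("set forwarding-options cut-through", format="set")
        cu.pdiff()      # show the pending diff before committing
        cu.commit()
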
Post by Brian Rak
It's annoying to be dropping packets with a bunch of unused buffer space.
Just make sure you don't fill your buffers so much that you get a long
(measured in time) standing queue, since that will just turn into a
long delay for the packets without helping anything (search for
"bufferbloat" for more information). Not a big problem on Trident II-based
hardware, but if you have equipment that boasts about gigabytes of buffer
space, you may need to watch out.
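
The delay a standing queue adds is simply queue depth divided by drain
rate, which is why the warning mostly matters on deep-buffer boxes; a
rough sketch:

# Added delay from a standing queue = queue depth / drain rate.
def queue_delay_ms(depth_bytes, drain_bps):
    return depth_bytes * 8 / drain_bps * 1000

print(round(queue_delay_ms(9e6, 10e9), 1))   # 7.2 -> ~9 MB queued on a 10G port
print(round(queue_delay_ms(1e9, 10e9)))      # 800 -> a 1 GB standing queue adds
                                             #        ~0.8 s of latency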

Oh, and I believe both changing buffer allocation and enabling/disabling
cut-through mode resets the Trident chip, causing a short period (less
than one second, I believe) where traffic is lost.


/Bellman
Brian Rak
2018-05-21 15:27:02 UTC
Post by Thomas Bellman
Post by Brian Rak
We're not even doing 10gbit of traffic, so the buffers should last at
least a little bit.
And you're not hitting 10 Gbit/s even under very short bursts of a few
milliseconds? Microbursts like that don't show up in "normal" usage
graphs where you only poll your switches/routers every minute or so.
It's possible that we have high traffic bursts; I don't really have the
data to say one way or another (we only graph traffic at 5-minute intervals).
Post by Thomas Bellman
Post by Brian Rak
Thanks for the tip about cut-through, we didn't have that enabled.
Do you happen to know if it works from a 10g port to a broken out
4x10g port?
Should do. From the perspective of the Trident II chip, they are
not any different from normal 10G ports. Cut-through doesn't work
between ports of different speed, and the ports involved must not
have any rate limiting or shaping, but other than that I don't know
of any limitations. (And if you receive broken packets, those will
be forwarded instead of thrown away; that is the only disadvantage
of cut-through mode that I have heard of.)
Post by Brian Rak
It's annoying to be dropping packets with a bunch of unused buffer space.
Just make sure you don't fill your buffers so much that you get a long
(measured in time) standing queue, since that will just turn into a
long delay for the packets without helping anything (search for
"bufferbloat" for more information). Not a big problem on Trident II-based
hardware, but if you have equipment that boasts about gigabytes of buffer
space, you may need to watch out.
Oh, and I believe both changing buffer allocation and enabling/disabling
cut-through mode resets the Trident chip, causing a short period (less
than one second, I believe) where traffic is lost.
/Bellman
We've seen significant reductions in dropped traffic after changing the
buffer allocations.  We're going to continue to investigate why it's
happening, but things seem to be much happier now.

I'm still not sure why the defaults are the way they are; I can't
imagine FCoE traffic is common enough to warrant inclusion in the
default configs.

