Discussion:
[j-nsp] OSPF neighbor Down :: without reason
Farhan Jaffer
2008-03-24 05:25:56 UTC
Hi,

I am running OSPF in my network. Yesterday one neighbor went down and came
back up after a few seconds. I couldn't find the cause in the logs.

One side:
rpd[3120]: RPD_OSPF_NBRDOWN: OSPF neighbor a.b.c.d state changed from
Full to Down due to InActiveTimer (event reason: neighbor was inactive
and declared dead)

Other side:
rpd[3055]: RPD_OSPF_NBRDOWN: OSPF neighbor e.f.g.h state changed from
Full to Init due to 1WayRcvd (event reason: neighbor is in one-way
mode)

There was no media flapping, no errors on the media, no router malfunction, etc.

Any idea?

Thanks in advance.
Stefan Fouant
2008-03-24 11:47:29 UTC
Can you 'set flag event detail' on the traceoptions within 'protocols
ospf'? You should be able to get a little more detailed information as to
what is causing this problem.

Stefan Fouant
_______________________________________________
juniper-nsp mailing list juniper-nsp at puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Raymond Cheh
2008-03-24 17:16:57 UTC
Post by Stefan Fouant
Can you 'set flag event detail' on the traceoptions within 'protocols
ospf'? You should be able to get a little more detailed information as to
what is causing this problem.
Such a traceoption is definitely useful.

From the 2 log messages in the original post, it also seems that one side
stopped seeing the OSPF packets from the other side. I would also run
'monitor traffic interface' with verbose output, filtering only the OSPF
packets, to see whether the OSPF packets get through on both sides and
whether the hello packets say the same thing about the neighbors.
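
As a rough illustration of reading the two log messages, here is a minimal Python sketch that pulls the neighbor, the old and new states, and the event out of an RPD_OSPF_NBRDOWN line. The regular expression is modelled only on the two lines quoted in this thread, and the function name and the wording of the hints are assumptions made for this example, not anything Junos provides.

```python
import re

# Pattern modelled on the two RPD_OSPF_NBRDOWN lines quoted in this thread;
# other Junos releases may word the message differently.
NBRDOWN = re.compile(
    r"RPD_OSPF_NBRDOWN: OSPF neighbor (?P<neighbor>\S+) state changed from "
    r"(?P<old>\w+) to (?P<new>\w+) due to (?P<event>\w+)"
)

# Hypothetical hints, keyed by the event names seen in the thread.
HINTS = {
    "InActiveTimer": "local router stopped receiving hellos from the neighbor",
    "1WayRcvd": "neighbor's hello no longer lists the local router",
}


def parse_nbrdown(line):
    """Return a dict describing one neighbor-down message, or None."""
    # Archived syslog lines may be wrapped, so collapse whitespace first.
    match = NBRDOWN.search(" ".join(line.split()))
    if match is None:
        return None
    info = match.groupdict()
    info["hint"] = HINTS.get(info["event"], "no hint for this event")
    return info
```

Feeding it the message from each router gives one record per side, which makes it easier to see that only one direction of hellos was lost.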

Thanks.

Raymond
Farhan Jaffer
2008-03-28 16:04:09 UTC
Thanks for all the comments.

But these logs will be valuable if it happens again. I have been running
this network for more than 2 years now, and this is the first time an OSPF
neighbor has failed without any apparent reason.

My question is: are there any reasons other than the media for a neighbor
relationship to flap?

Thanks again for all feedback.

Regards
Farhan Jaffer
Stefan Fouant
2008-03-28 16:16:49 UTC
Sure, there could be many reasons such as mismatched Hello or Dead timers,
duplicate Router IDs, OSPF HELLOs not being processed due to lack of CPU
resources, underlying L2 problems preventing the Hellos from being received,
MTU mismatch, receipt of an unexpected Database Descriptor sequence
number... the list is endless...
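
To make the first item concrete, here is a small Python sketch that compares the hello parameters of the two ends of a link. The fields follow the checks RFC 2328 applies to a received hello (area, hello interval, dead interval, network mask except on point-to-point links, and the stub-area flag); the class and function names are hypothetical, and the values would have to be collected by hand from each router. A mismatch of this kind normally stops the adjacency from forming at all, so it is more a thing to rule out than a likely cause of a one-off flap.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class HelloParams:
    """Hello parameters of one end of a link (names are illustrative)."""
    area_id: str
    hello_interval: int   # seconds
    dead_interval: int    # seconds
    network_mask: str
    stub_area: bool = False


def hello_mismatches(local, remote, point_to_point=False):
    """Return the names of the parameters that differ between the two ends."""
    differing = []
    for field in fields(HelloParams):
        # The mask is not compared on point-to-point links.
        if field.name == "network_mask" and point_to_point:
            continue
        if getattr(local, field.name) != getattr(remote, field.name):
            differing.append(field.name)
    return differing
```

An empty list means the two ends agree on everything this sketch checks.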

And even... sometimes router gremlins come out at night for no other reason
than to make our lives miserable.

Stefan
Keegan.Holley
2008-03-28 16:38:48 UTC
This looks like a physical layer problem. Maybe not a link flap per se
but some sort of media issue. I'm not familiar with the inactive timer
though, so I don't know exactly what the first message means. The one-way
message seems kind of odd as well. Your neighbor declared you dead before
you did the same. The router only noticed because the neighbor sent a
hello without its router ID in the list of known routers. What kind of
link is connecting the two routers? Is it Frame Relay or SONET? It seems
like there was a link failure that didn't lead to an interface flap. Are
there any errors in your 'sh int <> extensive' output? Also, how often does
this happen?
Andrew Mulheirn
2008-03-28 17:05:36 UTC
Hi -

The inactive timer is (I presume) the OSPF dead-timer expiring. If so (and if set to defaults) it means that e.f.g.h didn't receive an OSPF hello for 40 seconds from a.b.c.d.
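
If hello arrival times for one neighbor are available (from a trace file or a packet capture, say), the dead-timer arithmetic can be checked with a few lines of Python. This is only a sketch: the 40-second default is the figure assumed above, and the function name and the idea of feeding it timestamps in seconds are assumptions for the example.

```python
def dead_timer_expiries(hello_times, dead_interval=40.0):
    """Return the times at which the dead timer would have fired.

    hello_times holds the arrival times, in seconds, of hellos received
    from one neighbor. The timer restarts on every hello, so it fires
    only when the gap to the next hello exceeds dead_interval.
    """
    times = sorted(hello_times)
    expiries = []
    for previous, current in zip(times, times[1:]):
        if current - previous > dead_interval:
            expiries.append(previous + dead_interval)
    return expiries
```

With hellos every 10 seconds, a gap like this means several consecutive hellos in a row never arrived, not just one.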


a.b.c.d discovering that e.f.g.h is in one-way state probably happened immediately after e.f.g.h decided a.b.c.d was dead. I guess a.b.c.d thought the neighbour relationship was still full, which means that either the problem with hello reception only affected one neighbour, or that a.b.c.d's timer hadn't expired yet.

Presumably the chronology is something like this:

1. e.f.g.h doesn't get hellos from a.b.c.d

2. a.b.c.d may still be getting e.f.g.h's hellos, so hasn't timed-out the neighbour relationship

3. e.f.g.h's inactive timer expires, and decides a.b.c.d is dead

4. e.f.g.h clears the neighbour state internally

5. e.f.g.h sends a hello and goes into one-way, waiting for neighbours to let themselves be known

6. a.b.c.d receives the new hello which no longer contains his RID, and clears his neighbour state to start afresh.
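
The chronology above can be replayed with a toy Python simulation. It is a sketch under stated assumptions, not a model of rpd: A stands for a.b.c.d and B for e.f.g.h, the 10-second hello and 40-second dead intervals are assumed defaults, hellos are lost in one direction only, and re-forming the adjacency afterwards is not modelled. The event labels reuse the names from the log messages in this thread.

```python
HELLO, DEAD = 10, 40  # seconds; defaults assumed, not taken from the routers


def simulate(loss_start, loss_end, until=120):
    """Replay one-way hello loss from A (a.b.c.d) to B (e.f.g.h).

    Hellos from A to B are dropped while loss_start <= t < loss_end;
    hellos from B to A always arrive. Returns (time, router, event) tuples.
    """
    events = []
    a_state, b_state = "Full", "Full"  # each router's view of the other
    b_last_heard = 0                   # when B last received a hello from A
    for t in range(until + 1):
        # B's inactive timer fires once A has been silent for DEAD seconds.
        if b_state == "Full" and t - b_last_heard >= DEAD:
            b_state = "Down"
            events.append((t, "B", "Full -> Down (InActiveTimer)"))
        if t % HELLO == 0:
            # A's hello reaches B only outside the loss window.
            if not (loss_start <= t < loss_end):
                b_last_heard = t
                if b_state == "Down":
                    b_state = "Init"
                    events.append((t, "B", "Down -> Init (hello received)"))
            # B's hello lists A only while B holds neighbour state for A.
            if a_state == "Full" and b_state == "Down":
                a_state = "Init"
                events.append((t, "A", "Full -> Init (1WayRcvd)"))
    return events
```

Calling simulate(loss_start=10, loss_end=60) puts B's InActiveTimer event and A's 1WayRcvd event in the same second, which matches the pair of log messages, while a shorter loss window that ends before the dead interval runs out produces no events at all.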


Sounds to me like a layer-2 issue that affected OSPF hello reception on e.f.g.h somehow. Broadcast storm or ARP poisoning of a.b.c.d's address maybe?

Just a few random thoughts.....

Andrew
