[net2-wg] mirage progress

Rodrigo Fonseca rfonseca at cs.berkeley.edu
Fri Jun 16 10:22:22 PDT 2006


> Wow, this is a very powerful (and useful!) debugging infrastructure.

It's really useful for debugging multihop routing problems.


> This case in ForwardingEngineP occurs when acknowledgments are
> enabled but the next hop does not acknowledge the packet. This
> could be because the link has become bad; since the acks are at L2,
> it's not a question of the next hop queue being full (for which we'd
> need ECN). Have you taken a look at node 130 (the next hop), to see
> what's happening there? Is it still forwarding/sending packets, or
> has it gone silent?

130 does not ever report receiving the packet (which it would even if
the queue were full, since it is logged unconditionally at the start
of ForwardingEngineP$SubReceive.receive).
It is still on, and reporting its routing info correctly.

116 does not change its parent, and 11 seconds later it still reports
the same parent, with similar quality:

116 44760 00 FF FF 09 22 16 33 00 82 05 00 38 00 00 11
-> My parent is 130 (0x82), my hopcount is 5, and my ETX is 6.6 (0x38)
After that, the ETX varies between 0x34 (6.2) and 0x3E (7.2).
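
For anyone reading these logs by hand, here is a rough Python sketch of
the decoding above. The byte offsets and the raw-to-ETX scaling
(1 + raw/10) are guesses that merely fit the one line and the three
values quoted here (0x38 -> 6.6, 0x34 -> 6.2, 0x3E -> 7.2); they are not
taken from the logging code, and the function names are made up.

```python
# Sketch only: offsets and scaling are inferred from a single log line,
# not from the actual logging format definition.
LINE = "116 44760 00 FF FF 09 22 16 33 00 82 05 00 38 00 00 11"


def raw_to_etx(raw):
    """Assumed scaling: ETX = 1 + raw/10 (fits the three quoted values)."""
    return (10 + raw) / 10


def decode_route_info(line):
    fields = line.split()
    node = int(fields[0])        # reporting node id, decimal
    timestamp = int(fields[1])   # log timestamp, units not stated in the log
    payload = [int(b, 16) for b in fields[2:]]
    return {
        "node": node,
        "timestamp": timestamp,
        "parent": payload[8],                # assumed offset: 0x82 -> 130
        "hopcount": payload[9],              # assumed offset: 0x05 -> 5
        "etx": raw_to_etx(payload[11]),      # assumed offset: 0x38 -> 6.6
    }


info = decode_route_info(LINE)
print(info["node"], info["parent"], info["hopcount"], info["etx"])
```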

>
> This data suggests that the links are changing at a rate faster than
> the link updates and much slower than the data rate.

That may be the case. The link estimates do change a little but are
mostly stable, and they are updated on a timescale 10 times slower
than the data rate. Could there also be a bug in the acknowledgement
code at L2?

> If this seems to
> be a common problem, then the forwarding engine needs to be able to
> inform the routing engine that the estimate is off ("try someone
> else!"). I think that this information mismatch is because of the
> broadcast/unicast traffic distinction; the link estimator and route
> selector don't use data traffic in their estimates.
>
> Thoughts?
>

A couple of directions:
 1. Having more than one option to use as a next hop. This was the
original interface, getNextHops. A sensible policy, given the recent
measurements, would be: try twice on a link; if that fails, try another
neighbor, until some maximum is reached.
 2. Feedback from failed acks to the link estimator. Unlike having
data packets go through the link estimator, this is similar to
Phil's early email: some sort of signal from the ForwardingEngine to
the LinkEstimator, saying "I have just had a message transmission
fail on the link to A" and "It took me 5 retransmissions to get this
across". This could be incorporated into the math that computes link
quality, and it could probably lead to a good "Expected Transmissions"
measure.
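
To make the second direction a bit more concrete, here is a minimal
Python sketch of what such feedback could look like. It is not the
LinkEstimator code: the class and method names are hypothetical, and
the estimate is simply the number of data transmissions divided by the
number of acked packets per neighbor, which is one possible way to get
an "expected transmissions" figure from the two signals.

```python
import math
from collections import defaultdict


class AckFeedbackEstimator:
    """Hypothetical per-neighbor ETX estimate driven by data-packet acks."""

    def __init__(self):
        self.transmissions = defaultdict(int)  # data transmissions per neighbor
        self.acked = defaultdict(int)          # packets eventually acked

    def delivered(self, neighbor, transmissions):
        """Signal: 'it took me N transmissions to get this across'."""
        self.transmissions[neighbor] += transmissions
        self.acked[neighbor] += 1

    def failed(self, neighbor, transmissions):
        """Signal: 'a message failed on the link to this neighbor'."""
        self.transmissions[neighbor] += transmissions

    def etx(self, neighbor):
        """Transmissions per acked packet; infinite if nothing was acked yet."""
        if self.acked[neighbor] == 0:
            return math.inf
        return self.transmissions[neighbor] / self.acked[neighbor]


est = AckFeedbackEstimator()
est.delivered("A", 1)   # acked on the first try
est.delivered("A", 3)   # acked after two retransmissions
est.failed("A", 4)      # given up after four transmissions
print(est.etx("A"))     # 8 transmissions / 2 acked packets
```

A real estimator would presumably age these counts (a window or an
EWMA) and combine them with the beacon-based estimate; the sketch
leaves both out.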

What to do for the release? The code is failing because messages are
being retried forever: this causes interference at nearby nodes and
flows, and fills up the buffers. Should we add a maximum number of
retransmissions and drop the packet?
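
As a sketch of what a bounded policy could look like (combining the cap
with the "try twice, then another neighbor" idea from direction 1), in
Python rather than nesC. Everything here is hypothetical: the function
name, the send callback, the default limits, and the reading of "some
maximum" as a cap on total transmissions per packet.

```python
def forward(packet, next_hops, send, tries_per_link=2, max_transmissions=6):
    """Try each candidate next hop a few times; drop after a global cap.

    send(packet, hop) is a hypothetical callback returning True when an
    L2 ack came back. Returns (hop, transmissions) on success, or
    (None, transmissions) when the packet is dropped.
    """
    transmissions = 0
    for hop in next_hops:
        for _ in range(tries_per_link):
            if transmissions >= max_transmissions:
                return None, transmissions   # cap reached: drop the packet
            transmissions += 1
            if send(packet, hop):
                return hop, transmissions
    return None, transmissions               # ran out of neighbors: drop


# Example: the link to 130 never acks, a made-up alternative (117) does.
def send(packet, hop):
    return hop == 117


print(forward("pkt", [130, 117], send))   # acked by 117 on the third transmission
print(forward("pkt", [130], send))        # dropped after two tries on 130
```

The point is only that the packet leaves the queue after a bounded
number of transmissions, so a dead link cannot hold the buffer or keep
interfering with nearby flows indefinitely.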

Rodrigo
> Phil
>
>
>