[Tinyos-devel] retransmissions in networking protocols

Philip Levis pal at cs.stanford.edu
Fri Oct 5 17:11:45 PDT 2007


On Oct 5, 2007, at 4:32 PM, Omprakash Gnawali wrote:

>
>> Om -- we have some detailed MLQI traces, right? Did we log duplicate
>> suppression in them? Or, at the very least, we can definitely measure
>> the duplicate delivery rate at the root, right? It would be good to
>> know how bad the problem is.
>>
>> Phil
>
> This is data from Mirage. The sixth column tells you the number of
> duplicate packets received from that origin, and the seventh column
> tells you what fraction of the total packets received from the origin
> were duplicates:
>
> #node total_sent uniq_rcv success_rate total_rcv repeated_rcv (frac), minseq, maxseq
> 3 369 369 1.000 370 1 0.003, 1, 369
> 4 373 373 1.000 375 2 0.005, 1, 373
> 5 368 367 0.997 381 14 0.037, 2, 369
> 6 374 374 1.000 374 0 0.000, 1, 374
> 9 368 368 1.000 369 1 0.003, 1, 368
> 10 364 364 1.000 366 2 0.005, 1, 364
> 13 366 366 1.000 367 1 0.003, 1, 366
> 16 365 365 1.000 365 0 0.000, 2, 366
> 17 372 372 1.000 372 0 0.000, 1, 372
> 18 369 369 1.000 370 1 0.003, 2, 370
> 19 352 232 0.659 233 1 0.004, 3, 354
> 20 372 372 1.000 372 0 0.000, 2, 373
> 21 372 294 0.790 296 2 0.007, 1, 372
> 22 369 369 1.000 370 1 0.003, 1, 369
> 24 367 358 0.975 364 6 0.016, 1, 367
> 26 372 371 0.997 372 1 0.003, 1, 372
> 27 362 342 0.945 344 2 0.006, 1, 362
> 28 362 351 0.970 352 1 0.003, 1, 362
> 30 369 369 1.000 369 0 0.000, 1, 369
> 32 368 366 0.995 367 1 0.003, 1, 368
> 33 375 375 1.000 375 0 0.000, 2, 376
> 36 364 364 1.000 365 1 0.003, 2, 365
> 37 360 360 1.000 360 0 0.000, 2, 361
> 39 357 357 1.000 357 0 0.000, 2, 358
> 40 358 358 1.000 359 1 0.003, 1, 358
> 41 368 336 0.913 339 3 0.009, 2, 369
> 43 371 371 1.000 375 4 0.011, 2, 372
> 44 372 371 0.997 374 3 0.008, 1, 372
> 47 365 362 0.992 362 0 0.000, 2, 366
> 48 359 359 1.000 361 2 0.006, 2, 360
> 49 373 263 0.705 266 3 0.011, 2, 374
> 50 371 370 0.997 371 1 0.003, 2, 372
> 52 365 356 0.975 357 1 0.003, 3, 367
> 53 357 282 0.790 282 0 0.000, 2, 358
> 55 359 359 1.000 360 1 0.003, 2, 360
> 56 358 229 0.640 230 1 0.004, 3, 360
> 57 367 367 1.000 368 1 0.003, 2, 368
> 58 364 364 1.000 365 1 0.003, 2, 365
> 59 364 349 0.959 351 2 0.006, 3, 366
> 60 366 366 1.000 367 1 0.003, 2, 367
> 61 362 362 1.000 385 23 0.060, 1, 362
> 62 369 369 1.000 369 0 0.000, 1, 369
> 66 360 360 1.000 364 4 0.011, 1, 360
> 68 370 370 1.000 371 1 0.003, 1, 370
> 69 367 367 1.000 368 1 0.003, 2, 368
> 70 370 370 1.000 373 3 0.008, 2, 371
> 71 365 364 0.997 365 1 0.003, 1, 365
> 77 364 363 0.997 368 5 0.014, 1, 364
> 78 365 365 1.000 366 1 0.003, 2, 366
> 82 365 259 0.710 262 3 0.011, 3, 367
> 85 367 352 0.959 353 1 0.003, 3, 369
> 86 373 361 0.968 362 1 0.003, 2, 374
> 89 370 370 1.000 371 1 0.003, 2, 371
> 92 362 362 1.000 365 3 0.008, 3, 364
> 95 379 379 1.000 381 2 0.005, 2, 380
> 96 365 365 1.000 366 1 0.003, 3, 367
> 100 367 367 1.000 368 1 0.003, 2, 368
> 102 368 368 1.000 371 3 0.008, 1, 368
> 104 375 375 1.000 379 4 0.011, 1, 375
> 105 372 372 1.000 373 1 0.003, 2, 373
> 107 363 357 0.983 359 2 0.006, 2, 364
> 109 371 370 0.997 371 1 0.003, 2, 372
> 112 368 366 0.995 367 1 0.003, 2, 369
> 115 371 370 0.997 375 5 0.013, 2, 372
> 116 369 368 0.997 372 4 0.011, 2, 370
> 117 372 369 0.992 369 0 0.000, 2, 373
> 119 360 352 0.978 352 0 0.000, 3, 362
> 120 375 374 0.997 376 2 0.005, 2, 376
> 122 368 352 0.957 353 1 0.003, 2, 369
> 123 355 355 1.000 357 2 0.006, 2, 356
> 124 368 368 1.000 372 4 0.011, 2, 369
> 125 360 360 1.000 362 2 0.006, 2, 361
> 130 371 371 1.000 373 2 0.005, 2, 372
> 131 367 255 0.695 255 0 0.000, 2, 368
> 132 364 339 0.931 339 0 0.000, 3, 366
> 133 360 343 0.953 343 0 0.000, 3, 362
> 137 378 370 0.979 371 1 0.003, 2, 379
> 138 373 351 0.941 352 1 0.003, 2, 374
> 140 366 348 0.951 351 3 0.009, 3, 368
> 141 363 350 0.964 352 2 0.006, 2, 364
> 142 377 313 0.830 314 1 0.003, 2, 378
> 143 373 347 0.930 349 2 0.006, 2, 374
> 145 376 346 0.920 347 1 0.003, 2, 377
> 146 364 243 0.668 244 1 0.004, 2, 365
> 148 369 344 0.932 344 0 0.000, 2, 370
> #delivery 0.959233382475482
> #throughput 8.35853449121161
> #time 3600.033
>
> The root received a total of 161 duplicate packets out of a total of
> 30091 packets received.
>
> Looking at the duplicates in the network, LQI detected and suppressed
> 1943 duplicates out of a total of 76606 message receptions in the
> network (including the root). So it seems about 2.5% of packets are
> duplicates. Here are the numbers of duplicate events logged by the nodes:
>
> node id #duplicates dropped
> 68 806
> 78 314
> 6 240
> 18 185
> 26 147
> 89 45
> 112 43
> 71 23
> 102 23
> 22 21
> 37 19
> 130 10
> 50 9
> 123 8
> 120 7
> 33 6
> 125 5
> 104 5
> 96 3
> 48 3
> 133 3
> 95 2
> 5 2
> 39 2
> 115 2
> 107 2
> 70 1
> 61 1
> 57 1
> 36 1
> 3 1
> 24 1
> 16 1
> 10 1
>
>
> Then the question is whether any node missed some duplicates due to
> the cache size limitation. I wrote a script that checked whether the
> same <origin, seq> was transmitted (FW_FWD_MSG) by a node within a
> 100-pkt window, and there was not a single instance of a missed
> duplicate due to the cache size limitation.
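
A check of the kind Om describes could be sketched as below. This is not
his script: the function name missed_duplicates and the input format (a
per-node list of (origin, seq) pairs, one per FW_FWD_MSG event, in log
order) are assumptions made for illustration.

```python
from collections import deque

def missed_duplicates(forwards, window=100):
    """Return (index, origin, seq) for every forward that repeats an
    <origin, seq> pair seen among the previous `window` forwards.

    `forwards` is a hypothetical per-node log: one (origin, seq) tuple
    per forwarded packet, in the order the node transmitted them.
    """
    recent = deque()   # keys of the last `window` forwards, oldest first
    counts = {}        # key -> number of occurrences currently in `recent`
    missed = []
    for i, (origin, seq) in enumerate(forwards):
        key = (origin, seq)
        if counts.get(key, 0) > 0:
            # Same packet forwarded again inside the window: the
            # duplicate cache should have caught this one.
            missed.append((i, origin, seq))
        recent.append(key)
        counts[key] = counts.get(key, 0) + 1
        if len(recent) > window:
            old = recent.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
    return missed

if __name__ == "__main__":
    log = [(5, 1), (5, 2), (61, 7), (5, 1)]
    print(missed_duplicates(log, window=100))
```

An empty result for every node's log corresponds to the "not a single
instance" outcome reported above.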

Om: This is useful data. It looks like MLQI has a lower duplication  
rate (0.5%) than CTP (1%), which makes sense given its slower route  
adaptation. However, some nodes observe much higher duplication: up to  
6.0% (node 61), with node 5 next at 3.7%. My guess is that these are  
nodes whose next hop carries a heavy load and so has a larger chance  
of a cache flush?
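
The rates under discussion can be recomputed from the counts in the
quoted message. The short sketch below does that for three rows of the
table and for the two aggregate figures. The helper name dup_fraction is
made up for this example, and the numbers are copied from Om's data.

```python
# (node, total_rcv, repeated_rcv), copied from the quoted table.
rows = [(5, 381, 14), (43, 375, 4), (61, 385, 23)]

def dup_fraction(total_rcv, repeated_rcv):
    """Fraction of received packets that were duplicates."""
    return repeated_rcv / total_rcv

per_node = {node: round(dup_fraction(t, r), 3) for node, t, r in rows}
root_rate = dup_fraction(30091, 161)      # duplicates that reached the root
network_rate = dup_fraction(76606, 1943)  # duplicates suppressed in-network

print(per_node)
print(f"root {root_rate:.2%}, network {network_rate:.2%}")
```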

Andreas: One thing about source-based suppression is that it requires  
an additional protocol field. Specifically, you need to be able to  
distinguish packets that are duplicates caused by retransmissions and  
those that are duplicates caused by routing loops. I.e., if you can  
quickly repair routing loops, then it can be OK to let a packet loop  
around a few times. In fact, you can take advantage of that data  
packet to detect and repair the loops. Joe Polastre argued for this  
approach pretty strongly in some early net2 meetings, and our  
experimental results back up his intuition.

In the case of CTP, this field is called THL (time has lived), and it  
increments on each hop. You only suppress a packet if the source ID,  
source sequence number, *and* THL match. Otherwise, it could be a  
looping packet. I can't recall the exact numbers, but dropping looped  
packets definitely prevents you from getting 3 nines of reliability,  
and it might prevent 2 nines.
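
As a minimal sketch of that rule (not CTP's actual implementation), a
duplicate cache can be keyed on the full (origin, seqno, THL) triple.
The class name, the capacity of 4, and the least-recently-used eviction
policy are all assumptions made for illustration.

```python
from collections import OrderedDict

class DuplicateCache:
    """Small cache of recently seen (origin, seqno, thl) triples."""

    def __init__(self, capacity=4):
        self.capacity = capacity      # assumed size, not taken from CTP
        self.seen = OrderedDict()     # insertion order doubles as LRU order

    def is_duplicate(self, origin, seqno, thl):
        """Return True only if all three fields match a cached entry."""
        key = (origin, seqno, thl)
        if key in self.seen:
            self.seen.move_to_end(key)      # refresh the entry
            return True
        self.seen[key] = True
        if len(self.seen) > self.capacity:
            self.seen.popitem(last=False)   # evict the oldest entry
        return False

if __name__ == "__main__":
    cache = DuplicateCache()
    print(cache.is_duplicate(7, 42, 3))  # first reception
    print(cache.is_duplicate(7, 42, 3))  # link-layer retransmission
    print(cache.is_duplicate(7, 42, 5))  # same packet after looping
```

The second call matches on all three fields and is suppressed. The third
carries a higher THL, so it is treated as a looping packet and forwarded
rather than dropped.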

The reason why CTP is somewhat vulnerable to this is that it can  
switch routes really quickly. MLQI has a slower adaptation rate due  
to its 30s beacons. If you're seeing lots of duplicates, then I think  
it has to do with how you are emulating LQI. Don't forget that LQI on  
the CC2420, being a soft chip decision indicator, is (very roughly)  
the low-order bits of SNR when you are close to the SNR reception  
threshold, and is pegged when you have a high SNR. If you are using a  
more linear measure (e.g., direct SNR), then you are going to see a  
lot more variation than LQI for the CC2420 does, which will cause a  
much higher rate of route switchovers. One of the motivations for the  
white bit came from the observation that MLQI's scaling function  
essentially means that 95% of the links it uses are near-perfect  
ones: so why not just encode this information as a single bit?

Phil



