Wouldn't it be better to implement TOSH_uwait() as shown below?
-irfan
/*
 * "dec Register" is emulated by "sub #1,Register", which takes 1 CPU cycle.
 * "nop" also takes 1 cycle, and the jump instruction takes 2 cycles,
 * so each loop iteration takes 4 cycles.
 */
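inline void TOSH_uwait(register uint16_t u)
{
    /* Note: u == 0 wraps to 0xFFFF on the first dec, giving 65536 iterations. */
    asm volatile ( "1:\n\t"
                   "  nop\n\t"
                   "  dec %0\n\t"
                   "  jne 1b"
                   : "+r"(u)   /* u is both read and written */
                   :           /* no input-only operands */
                   : "cc" );   /* dec modifies the status flags */
}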
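
For anyone who wants to sanity-check the timing, here is a rough host-side model of the loop in plain C. It is only a sketch: the names uwait_iterations, uwait_cycles and CYCLES_PER_ITERATION are made up for illustration, it assumes the per-instruction cycle counts given above (nop 1, dec 1, jne 2), and it ignores call/return overhead.

```c
#include <stdint.h>

/* Assumed cost of one pass through the loop: nop (1) + dec (1) + jne (2). */
enum { CYCLES_PER_ITERATION = 4 };

/* Hypothetical helper: counts how many times the asm loop body would run. */
static uint32_t uwait_iterations(uint16_t u)
{
    uint32_t n = 0;
    do {
        u--;            /* models "dec %0"; wraps modulo 2^16 */
        n++;
    } while (u != 0);   /* models "jne 1b" */
    return n;
}

/* Hypothetical helper: estimated CPU cycles spent inside the loop. */
static uint32_t uwait_cycles(uint16_t u)
{
    return uwait_iterations(u) * CYCLES_PER_ITERATION;
}
```

With this model, uwait_cycles(u) is 4 * u for any nonzero u, so at a CPU clock of f MHz the loop would delay for roughly 4 * u / f microseconds. An argument of 0 is the odd case: the counter wraps and the loop runs 65536 times.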