So that it works without assuming df['t'].iloc[i0] == 0, for example in def linear_growth_init(df). Not a big deal, however; great job to all the contributors.
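Here is a minimal sketch of the idea. It uses plain Python lists instead of the DataFrame from the original and assumes t is sorted in ascending order. The function name and signature are hypothetical and are not the library's API. The slope k comes from the first and last points. The intercept then subtracts k * t[i0], so the line still passes through the first point when the time axis does not start at zero.

```python
def linear_growth_init(t, y):
    """Hypothetical standalone sketch: return (k, m) for the line y = k * t + m
    through the first and last points, without assuming t[0] == 0."""
    # Assumes t is sorted ascending, so the endpoints are the first/last items.
    i0, i1 = 0, len(t) - 1
    T = t[i1] - t[i0]
    k = (y[i1] - y[i0]) / T
    # Subtracting k * t[i0] keeps the line through (t[i0], y[i0]);
    # using m = y[i0] alone is only correct when t[i0] == 0.
    m = y[i0] - k * t[i0]
    return k, m


k, m = linear_growth_init([0.5, 1.0], [2.0, 3.0])
print(k, m)  # 2.0 1.0
```

With t starting at 0.5, the uncorrected intercept m = y[i0] = 2.0 would predict 3.0 at the first point instead of 2.0.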
I noticed that quantizer.py has an init_embed_ function that uses data to initialize the embedding weights. In a distributed training environment, each rank has its own data, which leads to ...
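The sketch below assumes the concern is that ranks end up with different codebooks. That is an assumption, because the sentence above is cut off. It simulates the effect without a GPU and shows one common remedy: copying rank 0's codebook to every rank. The helper names are hypothetical and do not reflect the actual code in quantizer.py. In real PyTorch code, the synchronization step would typically be a call such as torch.distributed.broadcast(tensor, src=0) after the local, data-dependent init.

```python
import random


def init_embed_from_data(data, num_codes, seed):
    """Hypothetical data-dependent init: pick num_codes vectors from this
    rank's local data as the initial codebook."""
    rng = random.Random(seed)
    return [list(v) for v in rng.sample(data, num_codes)]


def broadcast_from_rank0(codebooks):
    """Stand-in for torch.distributed.broadcast(tensor, src=0): every
    simulated rank receives a copy of rank 0's codebook."""
    src = codebooks[0]
    return [[list(v) for v in src] for _ in codebooks]


# Two simulated ranks, each holding a disjoint shard of the data.
rank_data = [
    [[float(i), float(i)] for i in range(0, 10)],
    [[float(i), float(i)] for i in range(100, 110)],
]

# Each rank initializes from its own shard, so the codebooks disagree.
codebooks = [init_embed_from_data(d, num_codes=3, seed=0) for d in rank_data]
print(codebooks[0] == codebooks[1])  # False

# After syncing from rank 0, every rank holds the same codebook.
synced = broadcast_from_rank0(codebooks)
print(synced[0] == synced[1])  # True
```

Another option is to average the per-rank initializations with an all-reduce instead of broadcasting. Broadcasting is simpler, but it means the codebook reflects only rank 0's data.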