Dear admins,

It seems that the validation sets (and the test set in the dry run) were sampled so that every category is guaranteed to have at least one document. As a result, validation and test are not really i.i.d. samples: small classes are over-represented and medium-to-large classes are under-represented.

The question is: can we rely on the "large" data test set having been sampled in the same way, and therefore correct for that in our predictions? We just need to make sure that whatever we do on the dry run will carry over to the large data.

Thanks in advance.
Answer
All of the sets are sampled so that every category is guaranteed to have at least one document.
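
For readers wondering what "correcting for that in the predictions" might look like, here is a minimal sketch of a standard prior-shift adjustment. It is not the organizers' method. It assumes, hypothetically, that each evaluation set contains one guaranteed document per category with the remaining documents drawn in proportion to the training class frequencies; the exact sampling procedure is not stated in this thread. The function names `coverage_adjusted_priors` and `reweight_posteriors` and the toy numbers are made up for illustration.

```python
def coverage_adjusted_priors(train_priors, n_docs):
    """Expected class frequencies in a set of n_docs documents, ASSUMING
    one guaranteed document per class and the remaining documents drawn
    in proportion to train_priors (a hypothetical sampling scheme)."""
    n_classes = len(train_priors)
    if n_docs < n_classes:
        raise ValueError("need at least one document per class")
    remaining = n_docs - n_classes
    return [(1 + remaining * p) / n_docs for p in train_priors]


def reweight_posteriors(posteriors, train_priors, target_priors):
    """Rescale class posteriors estimated under train_priors so that they
    reflect target_priors instead, then renormalize to sum to 1.
    All train_priors must be strictly positive."""
    weighted = [
        p * t / s for p, s, t in zip(posteriors, train_priors, target_priors)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


# Toy example: three classes, the last one rare in training.
train_priors = [0.70, 0.25, 0.05]
target_priors = coverage_adjusted_priors(train_priors, n_docs=10)

# Posteriors from some classifier for a single document (made-up values).
posteriors = [0.6, 0.3, 0.1]
adjusted = reweight_posteriors(posteriors, train_priors, target_priors)
print(adjusted)
```

Under this assumed scheme the rare class gains probability mass and the dominant class loses some, which matches the over- and under-representation described in the question. The size of the effect depends on the number of categories relative to the set size, so it would need to be recomputed for the large data.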