The train executable seems to be allowed to read "validation.txt", so I do not see how the rules can prevent anyone from using validation.txt directly for training. I also do not see how the rules can distinguish between examples used for model selection and examples used for actual training. This is really problematic for Task 2, where at this stage the smart approach is simply to use validation.txt to train on content only, and not bother with train.txt.

PS: This is how you describe the deliverable. Could we avoid using validation.txt?

train [filename1] [filename2] [filename3]

Read [filename1], which contains the training vectors (training.txt), and [filename2], which contains the class description vectors (classDescr.txt). If no class descriptions are available for the task (as in Task 1), this argument should have the value “-none”. Finally, read [filename3], which contains the validation vectors (validation.txt).
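To make the concern concrete, here is a minimal sketch of how a train entry point might handle the three arguments described above. It is only an illustration under stated assumptions: the function names (parse_train_args, read_vectors, load_inputs) are hypothetical, the files are assumed to hold one vector per line, and nothing here comes from the organizers. Note that the code decides what validation.txt is used for; nothing in the command line itself enforces "model selection only".

```python
def parse_train_args(argv):
    """Parse the arguments of: train [filename1] [filename2] [filename3]."""
    if len(argv) != 3:
        raise SystemExit("usage: train [filename1] [filename2] [filename3]")
    train_path, class_descr_path, validation_path = argv
    # "-none" means no class descriptions are available (as in Task 1).
    if class_descr_path == "-none":
        class_descr_path = None
    return train_path, class_descr_path, validation_path


def read_vectors(path):
    """Return the non-empty lines of a file (assumed: one vector per line)."""
    if path is None:
        return []
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def load_inputs(argv):
    """Load all three inputs and keep their roles separate."""
    train_path, class_descr_path, validation_path = parse_train_args(argv)
    return {
        # Examples the model is fitted on.
        "fit_on": read_vectors(train_path),
        # Class description vectors; empty when "-none" was passed.
        "class_descr": read_vectors(class_descr_path),
        # Examples kept aside for model selection only (a voluntary choice:
        # the interface cannot stop a program from fitting on these too).
        "select_on": read_vectors(validation_path),
    }
```

For Task 1 this would be called as load_inputs(["training.txt", "-none", "validation.txt"]), for example with the arguments taken from sys.argv[1:].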
I agree
I would like to mention that the current leading entry for task-1 (my entry) utilizes validation data for training. Validation data is also training data and I did not want to waste it :)
However, I do agree that using validation data for Task-2 would defeat the purpose of this task. Maybe the rules should be modified to prohibit using validation data in this task?
Changes to the datasets for Task 2 and Task 3
You are right that using validation data in its current form for Task-2 would defeat the purpose of this task.
So we have changed the validation data for Task 2 and Task 3 so that it is like the training files instead of the test files.
Please download the new datasets for Task-2 and Task-3. The mapping of the files for these tasks has been changed to prevent anyone from using the old files.
We apologize for the changes, but if we had not made them, the purpose of the task would have been ruined, as you mentioned.
Validation data Clarifications
Thank you very much for pointing out the problem with Task 2. I apologize for any inconvenience.