Creating datasets to train a Language Model (LM) or Large Language Model (LLM) is normally a complex process that often involves several steps and considerations ...