[vc_row][vc_column][vc_column_text] How to transform open-source data for NLP training