
Text Augmentation for Machine Learning tasks. Small data: How to grow your text dataset for classification?


Opla/SmallData-Augmentation-MachineLearning


Text augmentation for Machine Learning tasks:

How to grow your text dataset for classification?

This repository implements some of the data augmentation techniques explained in our blog article: https://medium.com/opla/text-augmentation-for-machine-learning-tasks-how-to-grow-your-text-dataset-for-classification-38a9a207f88d

Text augmentation is a technique used when there is not enough training data to reach good performance on a machine learning task. The goal is to extend the dataset by generating new sentences that are similar to the existing ones.
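As a rough illustration of growing a labeled dataset for classification, the sketch below appends augmented copies of each sentence while keeping the label of the sentence it came from. `augment_dataset` and the toy `drop_word` function are hypothetical names, not part of this repository; the augmentation is passed in as a parameter, so word shuffling or synonym replacement could be plugged in instead of the toy word-dropping used here.

```python
import random
from typing import Callable, List, Tuple


def augment_dataset(
    samples: List[Tuple[str, str]],
    augment_fn: Callable[[str, random.Random], str],
    n_copies: int = 2,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Return the original (text, label) pairs followed by n_copies
    augmented variants of each; every variant keeps its source label."""
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    augmented = list(samples)
    for text, label in samples:
        for _ in range(n_copies):
            augmented.append((augment_fn(text, rng), label))
    return augmented


if __name__ == "__main__":
    data = [("book a flight to paris", "travel"), ("play some jazz", "music")]

    # Toy augmentation for illustration only: drop one random word.
    def drop_word(text: str, rng: random.Random) -> str:
        words = text.split()
        if len(words) > 1:
            del words[rng.randrange(len(words))]
        return " ".join(words)

    for text, label in augment_dataset(data, drop_word, n_copies=2):
        print(label, "|", text)
```

Keeping the label untouched is the key design choice: augmentation should change the wording of a sentence, not its class.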

words_shuffling.py disrupts the order of the words in a sentence to create a new sentence.
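A minimal sketch of word shuffling, assuming simple whitespace tokenization (the actual words_shuffling.py may tokenize differently, for example with NLTK). `shuffle_words` is a hypothetical name used only for illustration.

```python
import random
from typing import Optional


def shuffle_words(sentence: str, rng: Optional[random.Random] = None) -> str:
    """Return a new sentence containing the same words in a different order.

    Assumption: words are separated by whitespace.
    """
    rng = rng or random.Random()
    words = sentence.split()
    if len(set(words)) < 2:
        return sentence  # no distinct words to reorder
    shuffled = words[:]
    # Reshuffle until the order actually differs from the input.
    while shuffled == words:
        rng.shuffle(shuffled)
    return " ".join(shuffled)


if __name__ == "__main__":
    print(shuffle_words("the quick brown fox jumps", random.Random(42)))
```

Shuffling keeps the bag of words intact, which suits classifiers that rely mostly on word presence, but it can produce ungrammatical sentences.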

word_replacement.py replaces some words in each sentence with a similar word, in this case a synonym.
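A minimal sketch of synonym replacement, assuming whitespace tokenization and a word-to-synonyms dictionary. In this repository the synonyms come from ppdb-xl.txt; here a small hand-written dictionary stands in for it. `replace_with_synonyms` and its parameters are hypothetical and not the repository's API.

```python
import random
from typing import Dict, List, Optional


def replace_with_synonyms(
    sentence: str,
    synonyms: Dict[str, List[str]],
    n_replacements: int = 1,
    rng: Optional[random.Random] = None,
) -> str:
    """Replace up to n_replacements words that have an entry in `synonyms`
    with a randomly chosen synonym; other words are left unchanged."""
    rng = rng or random.Random()
    words = sentence.split()
    # Positions of words for which at least one synonym is known.
    candidates = [i for i, w in enumerate(words) if synonyms.get(w.lower())]
    for i in rng.sample(candidates, min(n_replacements, len(candidates))):
        words[i] = rng.choice(synonyms[words[i].lower()])
    return " ".join(words)


if __name__ == "__main__":
    toy_synonyms = {"big": ["large", "huge"], "car": ["automobile"]}
    print(replace_with_synonyms("a big red car", toy_synonyms, n_replacements=1))
```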

Synonyms are listed in ppdb-xl.txt, extracted from the Paraphrase Database (PPDB): http://paraphrase.org/#/
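A possible way to turn PPDB data into a word-to-synonyms dictionary is sketched below. It assumes the standard PPDB line layout, with fields separated by `|||` (LHS ||| PHRASE ||| PARAPHRASE ||| FEATURES ||| ...); the exact layout of the extracted ppdb-xl.txt is not documented here, so the field indices may need adjusting. `load_ppdb_synonyms` is a hypothetical helper, and only single-word pairs are kept.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def load_ppdb_synonyms(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Build a word -> synonyms map from PPDB-style lines.

    Assumption: fields are separated by '|||', with the phrase in the
    second field and its paraphrase in the third (standard PPDB layout).
    """
    synonyms: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        phrase, paraphrase = fields[1].lower(), fields[2].lower()
        # Keep only distinct, non-empty single-word pairs.
        if not phrase or not paraphrase or phrase == paraphrase:
            continue
        if " " in phrase or " " in paraphrase:
            continue
        if paraphrase not in synonyms[phrase]:
            synonyms[phrase].append(paraphrase)
    return dict(synonyms)


if __name__ == "__main__":
    # Assumes ppdb-xl.txt sits in the working directory.
    with open("ppdb-xl.txt", encoding="utf-8") as f:
        syn = load_ppdb_synonyms(f)
    print(len(syn), "words with at least one synonym")
```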

Requirements:

  • NLTK
