ruthussanketh/natural-language-processing

Natural Language Processing

Codes, datasets, and explanations for some basic natural language tasks and models.

This repository is a set of five tutorials that provide a basic working knowledge of NLP and show how to apply it to text to get results, without requiring a deep mathematical understanding of how the models work. NLTK, Keras, and sklearn are the main libraries used in the tutorials. Each folder contains the datasets and a Jupyter Notebook for that tutorial. I've also written detailed Medium articles explaining the code in the Notebooks, which are linked below.

  1. NLP Preprocessing
    Explains the basic preprocessing tasks to be performed before training almost any model. Covers stemming and lemmatization, and the differences between them.
    Medium article
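The stemming/lemmatization distinction covered in this tutorial can be sketched with a toy rule-based stemmer (the notebook itself uses NLTK's `PorterStemmer` and `WordNetLemmatizer`; the suffix rules and example words below are illustrative only):

```python
# Toy illustration of the stemming idea: strip or rewrite common suffixes
# by rule, with no dictionary lookup. This is NOT NLTK's Porter algorithm,
# just a sketch of why stemming and lemmatization can disagree.

def toy_stem(word):
    """Crudely rewrite a few common English suffixes (rule-based)."""
    for suffix, repl in (("ies", "i"), ("ing", ""), ("ed", ""), ("es", ""), ("s", "")):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + repl
    return word

# A stemmer can produce non-words ("studi", "runn"), whereas a lemmatizer
# would map "studies" to the dictionary form "study" and "running" to "run".
words = ["studies", "running", "jumped", "cats"]
print([toy_stem(w) for w in words])  # ['studi', 'runn', 'jump', 'cat']
```

The key point the tutorial makes is visible here: stemming is a cheap string operation with no guarantee of producing a valid word, while lemmatization consults a vocabulary to return a real dictionary form.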

  2. Language Modeling
    Building and studying statistical language models from a corpus dataset. Unigram, bigram, trigram, and four-gram models are covered.
    Medium article

  3. Classifier Models
    Building and comparing the accuracy of Naive Bayes and LSTM models on a given dataset using NLTK and Keras.
    Medium article
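The arithmetic behind the Naive Bayes side of this comparison can be shown with a hand-rolled multinomial classifier using Laplace smoothing (the notebook uses NLTK/Keras implementations on a real dataset; the tiny training corpus below is made up):

```python
import math
from collections import Counter, defaultdict

# Toy multinomial Naive Bayes with Laplace (add-one) smoothing.
train = [
    ("good great fun", "pos"),
    ("great acting good plot", "pos"),
    ("boring bad plot", "neg"),
    ("bad awful boring", "neg"),
]

class_words = defaultdict(list)
for text, label in train:
    class_words[label].extend(text.split())

vocab = {w for text, _ in train for w in text.split()}
priors = {c: sum(1 for _, l in train if l == c) / len(train) for c in class_words}
word_counts = {c: Counter(ws) for c, ws in class_words.items()}

def predict(text):
    """Pick the class maximizing log P(c) + sum over words of log P(w | c)."""
    scores = {}
    for c in class_words:
        total = sum(word_counts[c].values())
        score = math.log(priors[c])
        for w in text.split():
            # Add-one smoothing gives unseen words a small nonzero probability.
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

print(predict("good fun plot"))  # pos
print(predict("awful boring"))   # neg
```

Working in log space avoids numerical underflow when multiplying many small probabilities — the same trick library implementations use internally. The LSTM in the tutorial replaces these count-based estimates with learned representations, which is the trade-off the notebook compares.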

  4. Conditional Random Fields
    Experimenting with POS tagging, a standard sequence-labeling task, using CRFs with the help of the sklearn-crfsuite wrapper.
    Medium article
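A CRF tagger in sklearn-crfsuite consumes one feature dictionary per token, so most of the work is feature extraction. A minimal sketch of that step is below — these particular features (lowercase form, suffix, capitalization, neighboring words) are common choices for POS tagging, not necessarily the notebook's exact feature set:

```python
# Per-token feature extraction in the dict-of-features format that
# sklearn-crfsuite expects (one dict per token, one list per sentence).

def token_features(sentence, i):
    word = sentence[i]
    feats = {
        "word.lower": word.lower(),
        "suffix3": word[-3:],
        "is_capitalized": word[0].isupper(),
        "is_digit": word.isdigit(),
    }
    # Context features let the CRF condition on neighboring words,
    # which is where it gains over a per-token classifier.
    feats["prev.lower"] = sentence[i - 1].lower() if i > 0 else "<BOS>"
    feats["next.lower"] = sentence[i + 1].lower() if i < len(sentence) - 1 else "<EOS>"
    return feats

sent = ["She", "sells", "seashells"]
print(token_features(sent, 1))
```

Each sentence then becomes a list of such dicts, paired with a list of gold tags, and is passed to the wrapper's `fit` method.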

  5. Word and Character Based LSTM Models
    Building and analyzing word- and character-based LSTM models. Two different character-based models are also trained and compared.
    Medium article
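Before any character-based LSTM can train, the text has to be turned into integer sequences. A sketch of that preparation step — a sliding window of characters predicting the next character — is below (the notebook does this before feeding a Keras LSTM stack; the sample text and window length here are made up):

```python
# Prepare (input window, next character) training pairs for a
# character-based language model.

text = "hello world"
seq_len = 4

# Map each distinct character to an integer index.
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}

# Slide a fixed-length window over the text; the character just past
# the window is the prediction target.
X, y = [], []
for i in range(len(text) - seq_len):
    X.append([char_to_idx[c] for c in text[i : i + seq_len]])
    y.append(char_to_idx[text[i + seq_len]])

print(len(X), len(chars))  # 7 training windows, 8 distinct characters
```

A word-based model is prepared the same way with a word-level vocabulary instead; the much smaller character vocabulary versus the much longer sequences is the central trade-off the tutorial analyzes.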