Anti Spam Filter

What is ASF?

A Spam Filter built for VIT-Pugaar.

Demo

What is Pugaar?

Pugaar is an Android and web application for lodging complaints and minor issues related to the maintenance of the VIT University, Vellore Men's Hostel. Our goal is to eliminate handwritten complaints, which take days to process, by automating the whole process. Once a complaint enters the system, an employee is notified directly by SMS about the issue and the details of the person who made the complaint.

Presenting Team members:

Android Application by:

@MINOSai

Web Application by :

@MINOSai: made it with Vue.js
@bhaveshpraveen: made the REST API using Django.
@greed2411: made the Spam filter with custom dataset and scikit-learn.

What does ASF do?

At its core, ASF removes spam: it passes only complaints meant for the VIT University Men's Hostel. It accepts complaints that fall under the categories Electrical, Toiletries, Room and Air Conditioners. Examples include "door knob broken", "smoke alarm is not working", "bathroom tap is loose" and "AC not cool enough".

What about the Dataset and the algorithm?

The dataset was typed by hand from the past few years' records of complaints under J block.

As for the algorithm, MultinomialNB is used here.

Technical stuff :

Why `Naive Bayes`?

The core feature of Naive Bayes is its independence assumption. It applies Bayes' theorem while assuming that the features are conditionally independent given the class, which here means that the presence of one word shouldn't affect the presence of another word. This is both an advantage and a disadvantage (and the reason it is called "naive"), depending on how people phrase their words. For a basic hostel complaint, an ML algorithm such as NB is enough. We could have gone with DenseNets if we had more data, but we only had 1500 records.
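To make the independence assumption concrete, here is a minimal pure-Python sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the code used in ASF (which relies on scikit-learn's MultinomialNB); the function names `train`, `log_score` and `classify`, and the six toy samples, are made up for illustration.

```python
import math
from collections import Counter


def train(samples):
    """Count words per class. `samples` is a list of (text, label) pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    doc_counts = Counter()    # label -> number of documents
    vocab = set()
    for text, label in samples:
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        doc_counts[label] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def log_score(text, label, model):
    """log P(label) + sum of log P(word | label), one independent term per word."""
    word_counts, doc_counts, vocab = model
    score = math.log(doc_counts[label] / sum(doc_counts.values()))
    total_words = sum(word_counts[label].values())
    for word in text.lower().split():
        if word not in vocab:
            continue  # words never seen in training are ignored
        # Add-one (Laplace) smoothing avoids log(0) for words unseen in this class.
        score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
    return score


def classify(text, model):
    """Return the label with the highest log score."""
    _, doc_counts, _ = model
    return max(doc_counts, key=lambda label: log_score(text, label, model))


# Hypothetical toy data, not taken from the real dataset.
samples = [
    ("fan not working", "complaint"),
    ("bathroom tap is loose", "complaint"),
    ("door knob broken", "complaint"),
    ("win free money now", "spam"),
    ("free movie tickets", "spam"),
    ("call now for free prizes", "spam"),
]
model = train(samples)
print(classify("tap broken", model))
print(classify("free money", model))
```

Each word contributes its own term to the score regardless of its neighbours, which is exactly the "naive" part: word order and word combinations are ignored.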

About the model:

Classification accuracy:

The testing accuracy of the Naive Bayes model is 99.436% on a held-out 25% of the data. This suggests that the feature engineering works well and that the model generalizes well.

Confusion matrix:

| Actual \ Predicted | Spam | Complaint |
|---|---|---|
| Spam | 190 | 0 |
| Complaint | 2 | 163 |

The Spam and Spam intersection gives the True Negatives - 190.
The Spam and Complaint intersection gives the False Positives - 0 (spam that was classified as a complaint).
The Complaint and Spam intersection gives the False Negatives - 2 (complaints that were classified as spam).
The Complaint and Complaint intersection gives the True Positives - 163.
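As a quick check, the four counts above can be turned into the usual metrics. This is a small sketch that treats "complaint" as the positive class, as the definitions above do; the variable names are just for illustration.

```python
# Counts taken from the confusion matrix above (positive class = complaint).
tn, fp, fn, tp = 190, 0, 2, 163

total = tn + fp + fn + tp                 # 355 test samples
accuracy = (tp + tn) / total              # 353 / 355
precision = tp / (tp + fp)                # share of predicted complaints that were real complaints
recall = tp / (tp + fn)                   # share of real complaints that were let through
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy:  {accuracy:.5f}")
print(f"precision: {precision:.5f}")
print(f"recall:    {recall:.5f}")
print(f"f1:        {f1:.5f}")
```

The accuracy works out to 353/355, which matches the reported 99.436% when truncated to three decimals. A precision of 1.0 means no spam reached the system in this test split, while the two false negatives are genuine complaints that were rejected.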

For prototyping

The model, built with scikit-learn's Pipeline class, was dumped with joblib into the pickle file spam_data_pugaar.pkl.
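The general pattern of fitting a Pipeline and dumping it with joblib could look like the sketch below. The README does not say which vectorizer or preprocessing steps the real pipeline uses, so `CountVectorizer`, the step names, the toy data and the file name `demo_model.pkl` are all assumptions. The sketch also uses the standalone `joblib` package that current scikit-learn releases depend on.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data; 1 = complaint, 0 = spam.
texts = [
    "fan not working",
    "bathroom tap is loose",
    "door knob broken",
    "win free money now",
    "free movie tickets",
    "call now for free prizes",
]
labels = [1, 1, 1, 0, 0, 0]

# Assumed pipeline: bag-of-words counts followed by MultinomialNB.
pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("classifier", MultinomialNB()),
])
pipeline.fit(texts, labels)

# Dump the fitted pipeline to disk and load it back.
path = os.path.join(tempfile.mkdtemp(), "demo_model.pkl")
joblib.dump(pipeline, path)
restored = joblib.load(path)

# The restored pipeline vectorizes and classifies raw text in one call.
print(restored.predict(["tap broken", "free money"]))
```

Saving the whole Pipeline, rather than only the classifier, keeps the fitted vocabulary together with the model, so raw strings can be passed straight to `predict` after loading.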

Necessary libraries :

pandas - For reading and manipulating the data.
scikit-learn - For preprocessing, feature engineering and running MultinomialNB.

You can try it by:

Clone the repository to your local machine:

git clone https://github.com/greed2411/ASF.git

Examples and Usage :

If the model classifies the statement as a complaint, it returns True; if it classifies it as spam, it returns False.

>>> from asf import check
>>> check('tubelight broken.')
True        # satisfies as a complaint
>>> check('floor mopping')
True
>>> check('send me nudes xD')
False       # doesn't satisfy as a complaint
>>> check('Bahen ke laude')
False
>>> check("रंडी")
False
>>> check(None)
True

Tested and developed on :

Ubuntu 16.10, Python 3.6.0 | Anaconda 4.3.1 (64-bit), and scikit-learn version 0.18.1
