Anti Spam Filter, a spam filter 🗃️ which uses a model made out of MultinomialNB algorithm 👈 from scikit-learn 🐍 to classify spam and complaints.

Anti Spam Filter

What is ASF?

A Spam Filter built for VIT-Pugaar.

Demo

GIF

What is Pugaar?

Pugaar is an Android and web application for lodging complaints and minor issues related to the maintenance of the VIT University, Vellore Men's Hostel. Our goal is to eliminate handwritten complaints, which take days to process, by automating the whole process. Once a complaint enters the system, an employee is informed directly by SMS about the issue and the details of the person who made the complaint.

Presenting Team members:

Android Application by:

Web Application by:

What does ASF do?

At its core, ASF removes spam: it selectively passes complaints meant for the VIT University Men's Hostel. It passes complaints that fall under the categories Electrical, Toiletries, Room and Air Conditioners. Examples include "door knob broken", "smoke alarm is not working", "bathroom tap is loose" and "AC not cool enough".

What about the Dataset and the algorithm?

The dataset was typed by hand from the past few years of complaint records for J block.

As for the algorithm, MultinomialNB is used here.

Technical stuff:

Why Naive Bayes?

The core feature of Naive Bayes is its independence assumption. The classifier applies Bayes' theorem while assuming that the features are conditionally independent given the class, which here means the presence of one word shouldn't affect the presence of another. This is both an advantage and a disadvantage (and the reason it is called "naive"), depending on how people phrase their messages. For a basic hostel complaint, an ML algorithm such as NB is enough. We could have gone with DenseNets if we had more data, but we only had 1500 records.
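To make the independence assumption concrete, here is a minimal pure-Python sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the code used in this repository (which relies on scikit-learn's MultinomialNB); the function names `train` and `predict` and the toy spam strings are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def train(samples, alpha=1.0):
    """Count words per label. `samples` is a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of documents
    vocab = set()
    for text, label in samples:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"word_counts": word_counts, "label_counts": label_counts,
            "vocab": vocab, "alpha": alpha}


def predict(model, text):
    """Return the label with the highest log posterior for `text`."""
    # Words never seen in training are ignored.
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    total_docs = sum(model["label_counts"].values())
    best_label, best_score = None, -math.inf
    for label, n_docs in model["label_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + model["alpha"] * len(model["vocab"])
        # log P(label) + sum of log P(word | label): the sum over words is
        # exactly where the "naive" independence assumption enters.
        score = math.log(n_docs / total_docs)
        for token in tokens:
            score += math.log((counts[token] + model["alpha"]) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data: complaint texts borrowed from this README, spam texts invented.
samples = [
    ("fan not working", "complaint"),
    ("bathroom tap is loose", "complaint"),
    ("door knob broken", "complaint"),
    ("win free money now", "spam"),
    ("free prize click now", "spam"),
]
model = train(samples)
print(predict(model, "tap broken"))
print(predict(model, "free money"))
```

Each word contributes its own log-probability term regardless of its neighbours, so word order and word combinations carry no weight in the decision.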

About the model:

Classification accuracy:

The testing accuracy of the Naive Bayes model is 99.436 % on a held-out 25% of the data. This suggests that the feature engineering is effective and that the model generalizes well.

Confusion matrix:

Actual \ Predicted | Spam | Complaint
Spam               | 190  | 0
Complaint          | 2    | 163
  • The Spam/Spam cell holds the True Negatives: 190.
  • The Spam/Complaint cell holds the False Positives: 0 (spam that got classified as a complaint).
  • The Complaint/Spam cell holds the False Negatives: 2 (complaints that got classified as spam).
  • The Complaint/Complaint cell holds the True Positives: 163.
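The four counts above are enough to recompute the headline accuracy and a couple of related metrics. The following is a small arithmetic sketch, treating "complaint" as the positive class as the list above does; the variable names are chosen here for illustration.

```python
# Counts copied from the confusion matrix ("complaint" is the positive class).
tn, fp, fn, tp = 190, 0, 2, 163

total = tn + fp + fn + tp           # size of the test split
accuracy = (tp + tn) / total        # share of all test messages classified correctly
precision = tp / (tp + fp)          # share of predicted complaints that are real complaints
recall = tp / (tp + fn)             # share of real complaints that were let through

print(f"test samples: {total}")
print(f"accuracy:     {accuracy:.5f}")
print(f"precision:    {precision:.5f}")
print(f"recall:       {recall:.5f}")
```

The accuracy works out to 353/355, which agrees with the reported 99.436 % up to rounding in the last digit. The two misclassified complaints lower recall only; precision stays at 1 because no spam was let through.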

For prototyping

The model, built with scikit-learn's Pipeline class, was dumped with joblib to the pickle file spam_data_pugaar.pkl.

Necessary libraries:
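A persistence workflow along these lines can be sketched as follows. This is an illustration under stated assumptions, not the repository's training script: the use of CountVectorizer as the feature step, the step names, and the toy training data are all guesses made for the example. Only Pipeline, MultinomialNB, joblib and the file name come from this README.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy training data (assumption): complaint texts from this README, invented spam.
texts = [
    "fan not working",
    "bathroom tap is loose",
    "door knob broken",
    "win free money now",
    "free prize click now",
]
labels = ["complaint", "complaint", "complaint", "spam", "spam"]

# Bundle vectorizing and classification so that one object is saved and loaded.
pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),   # assumed feature step
    ("classifier", MultinomialNB()),
])
pipeline.fit(texts, labels)

# Dump to a temporary directory so the sketch does not overwrite a real model file.
model_path = os.path.join(tempfile.mkdtemp(), "spam_data_pugaar.pkl")
joblib.dump(pipeline, model_path)

# Loading gives back a fitted pipeline that accepts raw strings directly.
restored = joblib.load(model_path)
queries = ["tubelight broken.", "free prize"]
original_predictions = list(pipeline.predict(queries))
restored_predictions = list(restored.predict(queries))
print(original_predictions == restored_predictions)
```

Saving the whole pipeline rather than the classifier alone keeps the fitted vocabulary together with the model, so the prediction side never has to rebuild the feature extraction.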

You can try it by:

Get the repository onto your local machine:

git clone https://github.com/greed2411/ASF.git

Examples and Usage:

If the model sees the statement as a complaint, check returns True; if it sees spam, it returns False.

>>> from asf import check
>>> check('tubelight broken.')
True        # satisfies as a complaint
>>> check('floor mopping')
True
>>> check('send me nudes xD')
False       # doesn't satisfy as a complaint
>>> check('Bahen ke laude')
False
>>> check("रंडी")
False
>>> check(None)
True
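The session above shows check(None) returning True, so a caller that feeds user input to the filter may want to screen out missing or blank messages first. Below is a minimal sketch of such a wrapper. The name `filter_complaints` and the stand-in classifier `keyword_stub` are hypothetical and exist only so the example runs without the repository; in practice `asf.check` would be passed in instead.

```python
from typing import Callable, Iterable, List, Optional


def filter_complaints(
    messages: Iterable[Optional[str]],
    is_complaint: Callable[[str], bool],
) -> List[str]:
    """Keep only non-empty messages that the classifier accepts."""
    kept = []
    for message in messages:
        # Skip None and whitespace-only input before it reaches the classifier.
        if message is None or not message.strip():
            continue
        if is_complaint(message):
            kept.append(message)
    return kept


def keyword_stub(message: str) -> bool:
    """Hypothetical stand-in for asf.check, used only to make this runnable."""
    lowered = message.lower()
    return any(word in lowered for word in ("broken", "loose", "not working"))


inbox = ["tubelight broken.", None, "   ", "hello there", "bathroom tap is loose"]
print(filter_complaints(inbox, keyword_stub))
```

Passing the classifier in as a parameter keeps the wrapper independent of how the model is loaded.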

Tested and developed on:

Ubuntu 16.10, Python 3.6.0 (Anaconda 4.3.1, 64-bit) and scikit-learn 0.18.1
