docnet - Document Classification

An end-to-end deep learning pipeline, from data preparation to training to production: a production hello world built with Docker and Docker Compose.

Get started

Training the model

The training directory contains a notebook that has been tested on Google Colab. You will need a Google Cloud account: OCR on the dataset takes a long time, so the processed text is cached on Google Cloud Storage.
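
As a rough sketch of what that caching pattern can look like (the bucket name, object path, and OCR helper below are hypothetical; only the google-cloud-storage calls are real):

    from google.cloud import storage

    BUCKET = "docnet-cache"            # hypothetical bucket name
    BLOB = "tobacco3482/ocr_text.pkl"  # hypothetical object path

    def fetch_or_build_ocr_cache(local_path="ocr_text.pkl"):
        """Download cached OCR output if present, otherwise build and upload it."""
        blob = storage.Client().bucket(BUCKET).blob(BLOB)
        if blob.exists():
            blob.download_to_filename(local_path)   # reuse the expensive OCR results
        else:
            build_ocr_locally(local_path)           # the slow OCR pass (stub below)
            blob.upload_from_filename(local_path)   # cache it for next time
        return local_path

    def build_ocr_locally(local_path):
        # Stand-in for the notebook's actual OCR step over every document image.
        raise NotImplementedError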

Follow the notebook; it prepares the Tobacco-3482 dataset for training and splits the data into training and validation sets as follows:

(Screenshot: fixed-seed train/validation split)
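
For illustration, a fixed-seed split like this can be reproduced with scikit-learn; the file list, ratio, and seed below are placeholders, and the notebook defines the real ones:

    from sklearn.model_selection import train_test_split

    # Toy stand-ins for the real Tobacco-3482 file list and labels.
    image_paths = [f"img_{i}.jpg" for i in range(100)]
    labels = [i % 10 for i in range(100)]  # 10 document classes

    train_x, val_x, train_y, val_y = train_test_split(
        image_paths, labels,
        test_size=0.2,     # hypothetical ratio; check the notebook for the real one
        random_state=42,   # fixed seed => identical split on every run
        stratify=labels,   # keep class proportions equal in both splits
    )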

After training, you will see results like this:

(Screenshot: training metrics at epoch 4)

After training, the model is exported to Google Cloud Storage. Download it and extract it into the classifier/model directory.

Using a pre-trained model

If you want to try the demo without training, you can download a pre-trained model produced by the notebook above here. Extract it into the classifier/model directory and follow the steps below.
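
A minimal sketch of that extraction step, assuming the export is a tar archive (the archive name below is a placeholder):

    import tarfile
    from pathlib import Path

    ARCHIVE = "model.tar.gz"          # hypothetical name of the downloaded export
    TARGET = Path("classifier/model")

    TARGET.mkdir(parents=True, exist_ok=True)
    with tarfile.open(ARCHIVE, "r:*") as tar:  # "r:*" auto-detects compression
        tar.extractall(TARGET)                 # unpack into classifier/model
    print(f"Model extracted to {TARGET.resolve()}")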

Running an end-to-end demo

To run the application, you will need Docker and Docker Compose. Clone this repo, cd into the cloned directory, and run the following command:

docker-compose up

This command builds the required containers, configures them, and runs them locally. After initialization, go to this address and you will see a screen like this:

(Screenshot: web UI)

  • Click Browse and select a document image
  • Click Classify; the document is added to the processing list
  • After processing, the predicted class and confidence are shown (a scripted version of this flow is sketched below)
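
For scripted use, the same flow might look roughly like this; the port, route, and form field name are hypothetical, so check the app's frontend and classifier services for the real ones:

    import requests

    # Hypothetical endpoint and form field; not taken from this repo's code.
    URL = "http://localhost:8080/classify"

    with open("sample_letter.jpg", "rb") as f:
        resp = requests.post(URL, files={"document": f})

    print(resp.json())  # e.g. {"class": "Letter", "confidence": 0.93} (illustrative)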

Only the following document classes are supported:

Email
Form
ADVE
Report
Scientific
News
Letter
Resume
Memo
Note
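
If you script against the raw model output, this label set can be kept as a simple list; the index order below is illustrative, not necessarily the one the trained model uses:

    # Illustrative label order; check the training notebook for the real mapping.
    DOC_CLASSES = ["Email", "Form", "ADVE", "Report", "Scientific",
                   "News", "Letter", "Resume", "Memo", "Note"]

    def decode(pred_index, confidence):
        return f"{DOC_CLASSES[pred_index]} ({confidence:.0%})"

    print(decode(6, 0.93))  # -> "Letter (93%)"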