DARIAH-ERIC/DESIR-CodeSprint-TrackB-BibliographicMetadata

PDF → GROBID = bibliographic metadata → BibSonomy

This tool extracts bibliographic metadata from PDF files using GROBID and stores it in BibSonomy. It was developed during the DESIR Code Sprint in Berlin (31 July to 2 August 2018).

Online Version

An online version of this tool is available at http://track-b.desir.dariah.eu/

Installation and Setup

The tool consists of a Java-based backend (server) and a Node.js-based frontend (client).

Frontend Setup

Manually with npm

# install dependencies
npm install

# serve dev version with hot reload at localhost:8080
npm run dev

# build for production with minification
npm run build

# build for production and view the bundle analyzer report
npm run build --report

Via asdf

First install asdf (see the asdf installation instructions).

# install asdf plugin for nodejs
asdf plugin-add nodejs https://github.com/asdf-vm/asdf-nodejs.git
# build for production with minification
npm run build

Backend Setup

GROBID can be used either through a local installation or through its REST-based web API.

When using GROBID locally (by default, the tool uses a public API)

For a local installation, the GROBID model files must be downloaded (e.g., https://dl.bintray.com/rookies/maven/org/grobid/grobid-home/0.5.5/grobid-home-0.5.5.zip) and placed in a folder of your choice. The location of that folder is configured via the option grobid.home.path in application.properties.

Copy the file install-files/application.properties into your application root and set the correct paths and keys:

grobid.home.path=/Users/YourUserName/Work/Grobid/grobid-home/
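The copy-and-edit step above can be scripted. The following is a minimal sketch, not part of the repository: set_grobid_home is a hypothetical helper name, and it assumes the template is a plain key=value properties file. It copies the template, drops any existing grobid.home.path entry, and appends the entry with the new value. Other keys (for example API keys) still have to be set by hand.

```shell
# Hypothetical helper (not part of the repository): copy the properties
# template and set grobid.home.path to a local GROBID home directory.
# The target must be a different file than the template.
set_grobid_home() {
  local template="$1"
  local target="$2"
  local grobid_home="$3"

  [ -f "$template" ] || { echo "template not found: $template" >&2; return 1; }

  # Copy every line except an existing grobid.home.path entry.
  # grep exits non-zero when it selects no lines, which is fine here.
  grep -v '^grobid\.home\.path=' "$template" > "$target" || true

  # Append the entry with the new value.
  printf 'grobid.home.path=%s\n' "$grobid_home" >> "$target"
}
```

A call such as set_grobid_home install-files/application.properties ./application.properties /Users/YourUserName/Work/Grobid/grobid-home/ would then produce a configuration file in the application root.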

Starting

To start the application use

mvn spring-boot:run

Or (if you want to use your local installation of GROBID):

mvn -Dspring.config.location=file:/....../DESIR-CodeSprint/trackB/backend/application.properties spring-boot:run

Replace file:/....../DESIR-CodeSprint/trackB/backend/application.properties with the location of your local configuration file.
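To avoid typing the file: location by hand, it can be derived from a path. This is a small sketch under the assumption that the configuration file already exists; config_location is a hypothetical helper name, not something the project provides. It resolves the directory to an absolute path so that the result does not depend on the current working directory.

```shell
# Hypothetical helper: print a "file:" location for a configuration file,
# suitable as the value of -Dspring.config.location.
config_location() {
  local path="$1"
  local dir

  # Resolve the containing directory in a subshell, so the caller's
  # working directory is left unchanged.
  dir="$(CDPATH= cd -- "$(dirname -- "$path")" && pwd)" || return 1

  # Avoid a doubled slash when the file sits directly under /.
  [ "$dir" = "/" ] && dir=""

  printf 'file:%s/%s\n' "$dir" "$(basename -- "$path")"
}
```

It could then be used as mvn -Dspring.config.location="$(config_location ./application.properties)" spring-boot:run.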

Usage (as a service)

Extra configuration file

Example install folder: /opt/trackB/

Copy the configuration template install-files/trackB.conf into the install folder. This lets the init.d script use extra property files for your server.

Create executable

In the build folder:

mvn clean package

Or (if you want to use your local installation of GROBID):

mvn -Dspring.config.location=file:/....../DESIR-CodeSprint/trackB/backend/application.properties clean package

Copy the executable (.jar) to the installation folder.

CentOS 7

Create a symbolic link (ln -s) at /etc/init.d/trackB that points to /opt/trackB/trackB.jar, so that the tool can be launched as a service (this approach is usable on CentOS 6.x servers, for example).
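The link can be created as sketched below. link_service is a hypothetical helper name, and the sketch assumes the jar was built so that it can be executed directly as an init.d script. The first argument is the jar (the link target) and the second is the location of the link itself.

```shell
# Hypothetical helper: create (or replace) a symbolic link at $2 that
# points to the jar at $1.
link_service() {
  local jar="$1"
  local link="$2"

  [ -f "$jar" ] || { echo "jar not found: $jar" >&2; return 1; }

  # -s: symbolic link, -f: replace an existing link,
  # -n: do not follow an existing link that points to a directory.
  ln -sfn "$jar" "$link"
}
```

On the server this would typically be run with root privileges, with /opt/trackB/trackB.jar as the jar and /etc/init.d/trackB as the link.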

Ubuntu 18.04

Set the owner of the files (for simplicity, this example uses the same user as apache2 on Ubuntu):

sudo chown -R www-data:www-data /opt/trackB

Copy the configuration file install-files/trackB.service to /etc/systemd/system/, then adjust the path to the jar file and the user that launches the command.

Start the service

service trackB start

service httpd restart

The server should now listen on port 8080 by default:

http://localhost:8080/trackB/
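Whether the service actually answers can be checked from the command line. The following is a sketch that assumes curl is installed; wait_for_service is a hypothetical helper name. It polls a URL a limited number of times, because the application may need a few seconds to start.

```shell
# Hypothetical helper: poll a URL until it answers without an HTTP error,
# or give up after a number of attempts (default: 10).
wait_for_service() {
  local url="$1"
  local attempts="${2:-10}"
  local i=1

  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors (status >= 400) as failure, -s: no progress output.
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep 2
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, wait_for_service http://localhost:8080/trackB/ returns 0 once the application responds and 1 if it never does.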

Redirect from Apache HTTPD to our own Service

Below is an example configuration file for Apache httpd that uses SSL and forwards requests from port 443 (SSL) to the application running on port 8080. Port 80 is redirected to 443, so requests arriving there also end up at port 8080. (The example uses the server name trackB.dariah.eu.)

NameVirtualHost *:80
NameVirtualHost *:443

<VirtualHost *:443>
    SSLEngine on
    SSLProxyEngine On
    SSLCertificateFile /etc/letsencrypt/live/trackB.dariah.eu/cert.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/trackB.dariah.eu/privkey.pem
    SSLCertificateChainFile /etc/letsencrypt/live/trackB.dariah.eu/chain.pem
    ServerName trackB.dariah.eu
    Redirect / https://trackB.dariah.eu/trackB/
    ProxyPass /trackB/ http://localhost:8080/trackB/
    ProxyPassReverse /trackB/ http://localhost:8080/trackB/
</VirtualHost>
<VirtualHost *:80>
    ServerName trackB.dariah.eu
    DocumentRoot /var/www/
    ErrorLog /var/log/httpd/trackB_error_log
    CustomLog /var/log/httpd/trackB_access_log combined
    Redirect / https://trackB.dariah.eu/
</VirtualHost>

Further Information

BibSonomy

BibSonomy is a social bookmarking system that helps you organize your scientific work. Use BibSonomy to collect publications and bookmarks, to collaborate with your colleagues, and to discover interesting research for your daily work.

You can get your BibSonomy API key from the settings page. Do not put your API key into a public repository.

GROBID

GROBID is a machine learning library for extracting, parsing and re-structuring raw documents, such as PDF documents, into structured TEI-encoded ones.

Contributing

Contributions are welcome! Just fork and send your pull requests.

Credits

Created at the DESIR CodeSprint by yoannspace, rjoberon, ChristophHubeL3S, ctot-nondef, and schmima. See contributors.

License

This project is licensed under the Apache License 2.0 - see the LICENSE.md file for details.