Underthesea - Vietnamese NLP Toolkit
Updated Jun 9, 2024 - Python
Solves basic Russian NLP tasks, API for lower level Natasha projects
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
A Vietnamese natural language processing toolkit (NAACL 2018)
Code for "Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation"
Bitextor generates translation memories from multilingual websites
Rule-based token and sentence segmentation for the Russian language
A complete guide to natural language processing (NLP) in Python, covering techniques such as parsing and text processing, and how to use NLP for text feature engineering.
CKIP CoreNLP Toolkits
A toolkit for discourse segmentation (EDU segmentation).
🦜 Containerized HTTP API for industrial-strength NLP via spaCy and sense2vec
Port of PragmaticSegmenter for sentence boundary detection
A sentence segmentation library with wide language support optimized for speed and utility.
NLP tools: word segmentation, sentence segmentation, and new-word discovery
A flexible sentence segmentation library using CRF model and regex rules
Deep neural approach to Boundary and Disfluency Detection - Based on my Master's work
Pre-trained models for tokenization, sentence segmentation and so on
Sentence Segmentation for spaCy
A sentence splitting (sentence boundary disambiguation) library for Go. It is rule-based and works out-of-the-box.
HTML2SENT modifies HTML to improve sentence tokenizer quality