Text cleaner

This module belongs to the category "Natural Language Processing".

Last updated 5 years ago

Description

Text cleansing can be a time-consuming NLP operation, but it is at the core of sentiment analysis, ontology population and knowledge base construction.

Parsing, lemmatization and stemming are indispensable for extracting useful data from the large volume of words gathered by web scraping, for producing correct idiomatic translations, and for designing chatbots that express themselves in meaningful sentences.

These are the needs that SmartPredict's Text cleaner module is designed to address.

The Text cleaner module cleans input text for NLP purposes in English, French, or a mix of both languages. To apply stop-word removal, select the document's language in the module parameters.
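As a rough illustration of language-dependent stop-word removal, the following Python sketch drops stop words for a selected language. The function `remove_stop_words`, the language codes `"en"`, `"fr"` and `"mixed"`, and the tiny stop-word sets are hypothetical; SmartPredict's own stop-word lists and parameter names are not documented here.

```python
import re

# Tiny illustrative stop-word sets (hypothetical); real lists are much longer.
STOP_WORDS = {
    "en": {"the", "a", "an", "and", "is", "of", "to", "in"},
    "fr": {"le", "la", "les", "un", "une", "et", "est", "de"},
}
# Assumption: mixed-language text uses the union of both lists.
STOP_WORDS["mixed"] = STOP_WORDS["en"] | STOP_WORDS["fr"]


def remove_stop_words(text: str, language: str = "en") -> str:
    """Drop stop words for the selected language, case-insensitively.

    Tokenizing on word characters also discards punctuation.
    """
    stop = STOP_WORDS[language]
    tokens = re.findall(r"\w+", text)
    return " ".join(t for t in tokens if t.lower() not in stop)


print(remove_stop_words("The cat is on the mat", "en"))     # cat on mat
print(remove_stop_words("Le chat est sur le tapis", "fr"))  # chat sur tapis
```

Using the union of both lists for mixed text is one simple choice; it can over-remove words that are stop words in only one of the two languages.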

Parameters

The Text cleaner is a compact yet comprehensive module that bundles all the typical cleansing operations. These operations can be applied to part of a sentence or, optionally, used to eliminate every occurrence of certain words.
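Eliminating every occurrence of certain words can be sketched with a word-boundary regular expression. `remove_words` is a hypothetical helper, not part of SmartPredict's API, and the sketch assumes the words to remove are plain alphanumeric tokens (word boundaries behave differently around symbols such as `+` or `#`).

```python
import re
from typing import List


def remove_words(text: str, words: List[str]) -> str:
    """Delete every whole-word occurrence of `words`, ignoring case."""
    if not words:
        return text
    # re.escape keeps any regex metacharacters in the words literal.
    pattern = r"\b(?:" + "|".join(map(re.escape, words)) + r")\b"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse the double spaces left behind by the deletions.
    return re.sub(r"\s+", " ", cleaned).strip()


print(remove_words("Buy now! Limited offer, buy today.", ["buy"]))
# now! Limited offer, today.
```

The `\b` anchors keep longer words intact, so removing `cat` leaves `category` untouched.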

Splitting

Optionally, the text can be split on keywords. The input to clean and split can be any of the following:

  • dataframe

  • series

  • list of text

  • string
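The sketch below shows one way to normalize these input kinds into a list of strings and then split text on keywords. The helpers `to_text_list` and `split_on_keywords` are hypothetical. pandas DataFrames and Series are detected by duck typing (a `columns` attribute, a `tolist` method) so the example runs without pandas installed, and a DataFrame is assumed to hold its text in a single named column.

```python
import re
from typing import Any, List, Optional


def to_text_list(data: Any, column: Optional[str] = None) -> List[str]:
    """Normalize a string, list of text, Series or DataFrame into strings."""
    if isinstance(data, str):
        return [data]
    if hasattr(data, "columns"):  # DataFrame-like: use one text column
        if column is None:
            raise ValueError("a column name is required for DataFrame input")
        data = data[column]
    if hasattr(data, "tolist"):  # Series-like
        data = data.tolist()
    return [str(item) for item in data]  # list (or other iterable) of text


def split_on_keywords(text: str, keywords: List[str]) -> List[str]:
    """Split text wherever one of the keywords appears as a whole word."""
    if not keywords:
        return [text.strip()] if text.strip() else []
    # Non-capturing group: re.split does not return the keywords themselves.
    pattern = r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b"
    parts = re.split(pattern, text, flags=re.IGNORECASE)
    # Drop empty fragments, e.g. when the text starts with a keyword.
    return [part.strip() for part in parts if part.strip()]


for doc in to_text_list(["I liked the food but the service was slow"]):
    print(split_on_keywords(doc, ["but"]))
# ['I liked the food', 'the service was slow']
```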

Cleansing operations

Among other cleansing operations, the module can:

  • remove text between parentheses

  • remove email addresses

  • remove HTML

  • remove tags

  • remove text inside brackets

  • keep alphabetic words only

  • remove stop words

  • remove web URLs
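The cleansing operations listed above map naturally onto regular expressions. The Python sketch below is a simplified, hypothetical version: `clean_text` and its flags are illustrative names, the patterns are deliberately basic (nested parentheses, for instance, are not handled), and "tags" is interpreted here as hashtags and @mentions, which is an assumption. Stop-word removal is left out to keep it short.

```python
import html
import re

# Simplified patterns; SmartPredict's real rules are not documented here.
PARENTHESES = re.compile(r"\([^()]*\)")              # "(aside)"
BRACKETS = re.compile(r"\[[^\[\]]*\]")               # "[1]"
EMAILS = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
URLS = re.compile(r"(?:https?://|www\.)\S+")
HTML_TAGS = re.compile(r"<[^>]+>")                   # "<p>", "</a>"
SOCIAL_TAGS = re.compile(r"[#@]\w+")  # assumption: "tags" = #hashtags, @mentions


def clean_text(
    text: str,
    *,
    html_markup: bool = True,
    urls: bool = True,
    emails: bool = True,
    tags: bool = True,
    parentheses: bool = True,
    brackets: bool = True,
    alphabetic_only: bool = True,
) -> str:
    """Apply the selected cleansing operations, then normalize whitespace."""
    if html_markup:
        # Strip markup first, then decode entities such as &amp; back to text.
        text = html.unescape(HTML_TAGS.sub(" ", text))
    if urls:
        text = URLS.sub(" ", text)
    if emails:
        # Must run before the tag rule, which would otherwise eat "@domain".
        text = EMAILS.sub(" ", text)
    if tags:
        text = SOCIAL_TAGS.sub(" ", text)
    if parentheses:
        text = PARENTHESES.sub(" ", text)
    if brackets:
        text = BRACKETS.sub(" ", text)
    if alphabetic_only:
        # Keep runs of Unicode letters only (accented French letters included).
        text = " ".join(re.findall(r"[^\W\d_]+", text))
    return re.sub(r"\s+", " ", text).strip()


sample = ("<p>Contact me (ASAP) at jane.doe@example.com or visit "
          "https://example.com [ref 2] #nlp</p>")
print(clean_text(sample))  # Contact me at or visit
```

Order matters in this design: markup and URLs are stripped first, and email addresses are removed before @mentions so that the tag pattern does not break an address apart.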

In short, the Text cleaner accepts several kinds of input data and offers the classic text-cleansing functions in a single, handy NLP module.