Text cleaner

This module belongs to the category "Natural Language Processing".

Description

Text cleansing can be a time-consuming NLP operation, yet it sits at the core of sentiment analysis, ontology feeding and knowledge-base construction.

Operations such as parsing, lemmatization and stemming are indispensable for extracting useful data from the large volume of words collected by web scraping, for producing the right idiomatic translation, and for designing chatbots that express themselves in meaningful sentences.

These are the needs that SmartPredict's Text cleaner module is designed to address.

The Text cleaner module cleans input text for NLP purposes, whether it is written in English, in French, or in a mix of both. Select the language of the document in the parameters; this setting determines which stop words are removed.
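As a rough illustration of how the language setting could drive stop-word removal, here is a minimal Python sketch. The `remove_stop_words` function and the tiny `STOP_WORDS` lists are hypothetical and are not SmartPredict's API; a real setup would rely on complete lists, such as those shipped with NLTK or spaCy. For mixed-language text, the sketch simply combines both lists.

```python
import re

# Tiny illustrative stop-word lists (assumption); real lists are much longer.
STOP_WORDS = {
    "english": {"the", "a", "an", "and", "is", "are", "of", "to", "in", "it"},
    "french": {"le", "la", "les", "un", "une", "et", "est", "de", "des", "en"},
}


def remove_stop_words(text: str, language: str = "english") -> str:
    """Remove stop words for 'english', 'french' or 'mixed' text."""
    if language == "mixed":
        # Mixed-language documents: use the union of both lists.
        stop_words = STOP_WORDS["english"] | STOP_WORDS["french"]
    else:
        stop_words = STOP_WORDS[language]
    # Tokenize on word characters (accented letters included); punctuation is dropped.
    tokens = re.findall(r"\w+", text)
    return " ".join(t for t in tokens if t.lower() not in stop_words)
```

For example, `remove_stop_words("The cat and le chat", "mixed")` returns `"cat chat"`, whereas the English setting alone would keep the French article and return `"cat le chat"`.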

Parameters

The Text cleaner is a compact yet comprehensive module that bundles all the typical cleansing operations. We can apply these operations to only part of a sentence or, optionally, use them to remove every occurrence of certain words entirely.
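To show what eliminating every occurrence of certain words can look like, a minimal Python sketch follows. `remove_words` is a hypothetical helper, not part of the module. It assumes the words to remove consist of letters, digits or underscores, because `\b` word boundaries do not behave as expected around symbols such as `+`.

```python
import re
from typing import List


def remove_words(text: str, words: List[str]) -> str:
    """Delete every whole-word occurrence of `words`, ignoring case."""
    if not words:
        return text
    # re.escape guards against regex metacharacters; \b stops "cat" from
    # matching inside "category" (assumes words made of word characters).
    pattern = r"\b(?:" + "|".join(map(re.escape, words)) + r")\b"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse the extra spaces left where words were removed.
    return re.sub(r"\s+", " ", cleaned).strip()
```

Matching on whole words rather than raw substrings is the design choice that matters here: removing "cat" leaves "category" intact.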

Splitting

Optionally, we can split the text according to keywords. The input to clean and split can be a:

  • dataframe

  • series

  • list of texts

  • string
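A sketch of how the four input types could be normalized to a list of strings and then split on keywords is shown below. Both functions are hypothetical, and the `column` argument (naming the DataFrame column that holds the text) is an assumption, since this page does not say how a dataframe input is handled. pandas objects are detected by duck typing, so the sketch runs without pandas installed.

```python
import re
from typing import Any, List, Optional


def to_text_list(data: Any, column: Optional[str] = None) -> List[str]:
    """Normalize a string, list, pandas Series or DataFrame to a list of str."""
    if isinstance(data, str):
        return [data]
    if isinstance(data, list):
        return [str(item) for item in data]
    if hasattr(data, "columns"):  # looks like a pandas DataFrame
        if column is None:
            raise ValueError("a DataFrame input needs a text column name")
        data = data[column]  # select the (hypothetical) text column
    if hasattr(data, "tolist"):  # looks like a pandas Series
        return [str(item) for item in data.tolist()]
    raise TypeError(f"unsupported input type: {type(data).__name__}")


def split_on_keywords(text: str, keywords: List[str]) -> List[str]:
    """Split text wherever a keyword occurs as a whole word (case-insensitive)."""
    if not keywords:
        return [text.strip()] if text.strip() else []
    pattern = r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b"
    parts = re.split(pattern, text, flags=re.IGNORECASE)
    # Drop empty fragments and trim whitespace around each piece.
    return [p.strip() for p in parts if p.strip()]
```

For instance, `split_on_keywords("I like tea but I prefer coffee however not today", ["but", "however"])` gives `["I like tea", "I prefer coffee", "not today"]`; the keywords themselves are discarded.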

Cleansing operations

Among the available cleansing operations, we may choose to remove:

  • text between parentheses

  • email addresses

  • HTML

  • tags

  • text inside brackets

  • non-alphabetic words (retaining alphabetic words only)

  • stop words

  • web URLs
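The regex-based removals in the list above can be approximated as follows. This is a simplified Python sketch, not SmartPredict's implementation: the `clean_text` function, the option names and the patterns are assumptions. The parentheses and brackets patterns only remove innermost, non-nested spans, stop-word removal is left out, and the separate "tags" option is not modelled because this page does not define which tags it targets.

```python
import html
import re
import string
from typing import Set

# Simplified, illustrative patterns (assumptions, not SmartPredict's rules),
# listed in the order they are applied.
PATTERNS = {
    "html": re.compile(r"<[^>]+>"),                         # tags such as <p>, </b>
    "urls": re.compile(r"(?:https?://|www\.)\S+"),          # web URLs
    "emails": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),  # user@host.tld
    "parentheses": re.compile(r"\([^()]*\)"),               # innermost (...) span
    "brackets": re.compile(r"\[[^\[\]]*\]"),                # innermost [...] span
}


def clean_text(text: str, remove: Set[str], alphabetic_only: bool = False) -> str:
    """Apply the selected removals, optionally keep alphabetic words, tidy spaces."""
    for name, pattern in PATTERNS.items():  # dicts preserve insertion order
        if name in remove:
            text = pattern.sub(" ", text)
    if "html" in remove:
        # Decode entities such as &amp; that remain once the tags are gone.
        text = html.unescape(text)
    tokens = text.split()
    if alphabetic_only:
        # Trim punctuation around each token, then keep letter-only tokens.
        tokens = [t.strip(string.punctuation) for t in tokens]
        tokens = [t for t in tokens if t.isalpha()]
    return " ".join(tokens)
```

Applying every removal to `"Contact <b>me</b> at jo@example.com or https://example.com/page (not now) [draft]"` yields `"Contact me at or"`. The fixed order (HTML first, then URLs before e-mail addresses) keeps markup and links from leaving fragments that later patterns would misread.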