Step 3. Preprocess the dataset

Preprocessing is an important stage in any data analytics process. For this purpose, SmartPredict provides a powerful Data Processor.


Cleaning data with the Data Processor

Having uploaded our Iris dataset, we can now clean it using the SmartPredict Data Processor.

  1. To access it, first click on "Applications" in the left pane's menu.

  2. Then click on "Dataset Processing and Visualization".

A new dashboard opens. It is very similar to the first one (seen in the Datasets application), except that this time the data are filtered by creation date and size.

From this new dashboard, we can also add new datasets by clicking on the round yellow + button, as an alternative to uploading them directly from the dataset table.

Wherever on the platform we retrieve a dataset, clicking on the processor's cog icon leads to the Dataset Processor's console.

The Dataset Processor allows us to sample, filter and download datasets.
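
For orientation only, here is a rough pandas analogy of what sampling and filtering mean (this is not SmartPredict's implementation; the file name iris.csv and the 'variety' column are assumptions based on this tutorial):

```python
import pandas as pd

# Hypothetical local copy of the dataset uploaded in Step 2
df = pd.read_csv("iris.csv")

# Sampling: take a random subset of 20 rows for a quick look
sample = df.sample(n=20, random_state=0)

# Filtering: keep only the rows of one class, e.g. 'Setosa'
setosa_only = df[df["variety"] == "Setosa"]
```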

Handling missing values

Flaws in a 'dirty' dataset are highlighted, as here where missing items are shown in red. To deal with these data impurities, we can choose the kind of processing we want.

Our strategy here is to systematically drop the rows (and columns) that contain missing values. To do so, we just need to specify it in the processor type, then choose Drop as the "Strategy".

  1. Click on the + button to add a processing step.

  2. When asked to select a processor, scroll down to 'Handle missing values'.

  3. Click on it.
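
For readers who like to see the equivalent in code, here is a minimal pandas sketch of the Drop strategy; SmartPredict performs this step for us in the console, and the file name iris.csv is assumed for illustration:

```python
import pandas as pd

df = pd.read_csv("iris.csv")  # hypothetical local copy of the dataset

# Drop every row that contains at least one missing value,
# mirroring the Drop strategy of the 'Handle missing values' processor
df = df.dropna(axis=0, how="any")

# Also drop columns that are entirely empty, if any
df = df.dropna(axis=1, how="all")
```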

We can pursue further data cleansing and processing by clicking once more on the + button. Let us, for instance, initiate a sorting process: we recapitulate the same steps as for handling missing values, except that this time we choose 'Sort' instead.

  1. Click on the + sign to add a processing step.

  2. Scroll down to 'Sort' and select it.

  3. When choosing the column, select 'Variety', as it is the scrambled one.

  4. Validate.

The table shows that the processor has applied the steps we specified: the quality of our data has visibly improved.
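
The 'Sort' step has the same effect as sorting a data frame by the 'Variety' column; a minimal sketch, assuming the column name matches the file:

```python
import pandas as pd

df = pd.read_csv("iris.csv").dropna()  # cleaned data frame from the previous sketch

# Sort by the 'Variety' column, as the 'Sort' processor step does,
# and renumber the rows afterwards
df = df.sort_values(by="Variety").reset_index(drop=True)
```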

Now that we have obtained a clean dataset, let us keep in mind that our main aim is to produce a processing pipeline: this is our next step.

The Processing pipeline

The processing pipeline pane appears on the right side of the Dataset Processor. It can be folded and unfolded, and contains its own set of functions.

Once we have finished the cleaning operations, we can use the new dataset for our processing pipeline; the list of pipelines is located in the right sidebar, next to the Data Processor's dashboard.

  1. Look for the export icon (third icon from the left) on the menu below the right pane. Its tooltip reads "Export processing pipeline to SmartPredict".

  2. Click on it.

  3. Then, click on "Export".

We can choose which processing steps to include in the processing pipeline. If we do not want certain steps to be included, we can delete them individually after ticking the check-box next to them.
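
Conceptually, the exported processing pipeline plays the same role as a reusable preprocessing function that replays the selected steps on any data with the same schema; a minimal sketch, assuming the two steps we configured above:

```python
import pandas as pd

def processing_pipeline(df: pd.DataFrame) -> pd.DataFrame:
    """Replay the two steps configured in the Dataset Processor."""
    df = df.dropna()                    # step 1: handle missing values (Drop)
    df = df.sort_values(by="Variety")   # step 2: sort by the 'Variety' column
    return df.reset_index(drop=True)

# The same pipeline can then be reused on any new dataset of the same shape
clean_df = processing_pipeline(pd.read_csv("iris.csv"))
```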

Now let us get back to our workspace and click on the flowchart icon on the left to switch to the project view. In the right sidebar, clicking on the third button, 'Processing pipelines', shows the pipeline we have just configured.

It is worth noting the name under which we exported the pipeline we want to use in the build, to avoid ending up using another one later as the number of builds grows.

Just like any other module, we can now drag and drop it onto the layout in order to attach it to the flowchart.
