Step 4. Build the flowchart

This section explains how to build the flowchart for our Titanic project workflow.

Building a flowchart with SmartPredict is as easy as dragging and dropping modules into the workspace.

As our flowchart is borrowed from a classification template, all that is left to do is configure the modules' parameters to meet our specific needs.

As a reminder, the flowchart we are going to build represents the ML pipeline, from the processing steps to the model that produces the prediction.

The default build flowchart is composed of the following elements:

  • the Dataframe Loader

  • the Features Selector

  • the ML trainer

  • the Item saver

  • the ML evaluator

  • the Labeled Data splitter

  • the Data Object logger

  • and the Support Vector Classifier
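For readers who think in code, the default flowchart can be loosely sketched with pandas and scikit-learn. This is only an analogy, not SmartPredict's implementation: the mapping of each module to a library call is an assumption, and the small table below is made-up, already-clean numeric data rather than the real Titanic dataset.

```python
import pickle

import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Dataframe Loader: a tiny, made-up, already-clean table (not real Titanic rows)
df = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1, 2, 3, 1, 3, 2, 1, 3],
    "Age": [38, 22, 27, 35, 54, 14, 4, 58, 20, 39, 28, 31],
    "Fare": [71.3, 7.3, 11.1, 8.1, 51.9, 30.1, 16.7, 26.6, 8.1, 13.0, 35.5, 7.9],
    "Survived": [1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0],
})

# Features Selector: choose the feature columns and the label column
X = df[["Pclass", "Age", "Fare"]]
y = df["Survived"]

# Labeled Data Splitter: hold out a quarter of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Support Vector Classifier + ML Trainer: create the model and fit it
model = SVC()
model.fit(X_train, y_train)

# ML Evaluator: score the trained model on the held-out rows
accuracy = accuracy_score(y_test, model.predict(X_test))

# Data Object Logger: print the result so it can be inspected
print(f"accuracy on {len(X_test)} held-out rows: {accuracy:.2f}")

# Item Saver: serialize the trained model so it can be reused later
saved_model = pickle.dumps(model)
```

With so few rows the accuracy value itself is meaningless; the point is only the order of the steps.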

We also need to add:

  • an Ordinal Encoder, so that categorical values are encoded as integers the model can handle.

  • either [a processing pipeline + the original dirty dataset] or [a Dataframe Loader + a clean dataset]

The Dataframe Loader loads the dataframe from a clean dataset. The clean dataset is the train dataset we initially had, after some cleansing with the data processor.

In the 'Columns to keep' and 'Columns to drop' fields, enter the corresponding column names.
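As a rough illustration of what keeping and dropping columns means, here is a minimal pandas sketch. The column names follow the usual Titanic layout, the rows are placeholders, and the variable names `columns_to_keep` and `columns_to_drop` are hypothetical stand-ins for the two fields, not SmartPredict identifiers.

```python
import pandas as pd

# Placeholder rows in a Titanic-like layout (not real passenger data)
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Name": ["Passenger A", "Passenger B", "Passenger C"],
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", "female"],
    "Survived": [0, 1, 1],
})

# Hypothetical stand-ins for the two configuration fields
columns_to_keep = ["Pclass", "Sex", "Survived"]
columns_to_drop = ["PassengerId", "Name"]

# Keeping a list of columns and dropping the complementary list
# lead to the same dataframe
kept = df[columns_to_keep]
dropped = df.drop(columns=columns_to_drop)

print(list(kept.columns))
```

In this sketch the two lists are complementary, so filling in either one describes the same selection.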

The second flowchart shown above is another option for structuring our workflow, this time with a processing pipeline. It is the one we are going to use for all the next steps.

To obtain the required configuration starting from the default flowchart, we need to add:

  1. the processing pipeline

  2. the unprocessed dataset

  3. an ordinal encoder
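The three additions above can be pictured as a scikit-learn pipeline that starts from unprocessed data. This is a sketch under assumptions, not the SmartPredict processing pipeline itself: the median imputation step stands in for whatever cleansing the real pipeline performs, and the rows are made up.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.svm import SVC

# Unprocessed dataset: text categories and a missing age (made-up rows)
raw = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Embarked": ["S", "C", "S", "Q", "S", "S"],
    "Age": [22.0, 38.0, np.nan, 35.0, 27.0, 54.0],
    "Survived": [0, 1, 1, 0, 1, 0],
})
X = raw.drop(columns=["Survived"])
y = raw["Survived"]

# Processing step: encode the categorical columns as ordinal codes and
# fill the missing age with the median (an assumed cleansing rule)
preprocess = ColumnTransformer([
    ("categorical", OrdinalEncoder(), ["Sex", "Embarked"]),
    ("numeric", SimpleImputer(strategy="median"), ["Age"]),
])

# Processing followed by the classifier, trained in one call
workflow = Pipeline([
    ("preprocess", preprocess),
    ("classifier", SVC()),
])
workflow.fit(X, y)

predictions = workflow.predict(X)
print(predictions)
```

Chaining the steps this way means the same processing is applied both when training and when predicting, which is the role the processing pipeline plays in the flowchart.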

For our project, as we already have a processing pipeline, we can remove the Dataframe Loader. To do so, select the Dataframe Loader module in the flowchart, click on the dot menu, then on Delete, and confirm by clicking OK.

Otherwise, if we choose to keep it, we need to save the processed dataset as a new dataset and place it in the flowchart right above the Dataframe Loader.

Then, configure the settings and load parameters such as dataframes and features from the new processed dataset into the Dataframe loader.

Either way, note that both flowchart configurations need an Ordinal Encoder; otherwise the model will raise an error.
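To see why an encoder is needed, consider what happens when a support vector classifier receives text categories directly. The sketch below uses scikit-learn's `SVC` as a stand-in; the exact error SmartPredict reports may differ, and the rows are made up.

```python
import pandas as pd
from sklearn.svm import SVC

# Categorical features left as raw text (made-up rows)
X = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "S", "Q"],
})
y = [0, 1, 1, 0]

# Fitting fails because the classifier needs numeric input
try:
    SVC().fit(X, y)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The classifier cannot convert values such as 'male' or 'S' to numbers on its own, which is the gap the Ordinal Encoder fills.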

Common configurations

The Features selector is part of the Core modules. It is located under the Data Selection modules sub-tab. To configure it, select the features from the dataset inside the drag and drop area and select 'Survived' as the label.
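In code terms, selecting features and a label amounts to splitting the dataframe into an input table and a target column. A minimal pandas sketch follows, with made-up rows and an assumed choice of feature columns.

```python
import pandas as pd

# Made-up rows in a Titanic-like layout
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22, 38, 26, 35],
    "Survived": [0, 1, 1, 0],
})

label = "Survived"

# Every column except the label is treated as a feature here
features = [column for column in df.columns if column != label]

X = df[features]   # inputs the model learns from
y = df[label]      # value the model learns to predict

print(features)
```

Keeping the label out of the feature list matters: if 'Survived' were also a feature, the model would simply read the answer from its input.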

Ordinal encoding deals with categorical data, like the data we have here. You might already be familiar with the ordinal encoding function. If you need more information, check its official documentation.
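As a quick reminder of what ordinal encoding does, here is a small example with scikit-learn's `OrdinalEncoder`. It assumes the SmartPredict module behaves like this class, which the source does not state explicitly; the rows are made up.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Two categorical columns with text values (made-up rows)
X = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "S", "Q"],
})

encoder = OrdinalEncoder()
encoded = encoder.fit_transform(X)

# Categories are sorted per column, and each value is replaced by its index:
# Sex -> female=0, male=1; Embarked -> C=0, Q=1, S=2
print(encoder.categories_)
print(encoded)
```

Note that the codes follow the sorted order of the categories, so they carry no meaning beyond identifying each category.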

The Ordinal Encoder module's configuration is shown below:
