Pravin Regismond

Data Engineer | Senior DBA
Transforming Complex Data into Actionable Insights

Hi, I’m Pravin Regismond. I work with data—processing it, storing it, and making it useful. I started my career in database administration and spent over two decades managing and optimizing systems. More recently, I’ve been focusing on data engineering, working with ETL pipelines, cloud platforms, and real-time data processing.

Feel free to check out my projects here or connect with me below.

Portfolio

Visualizing Car Sales and Dealer Profits Using Snowflake Snowsight

Nov 2024

This project aimed to analyze car sales and dealer profits for SwiftAuto Traders by creating visualizations and presenting them as dashboards. The approach involved using Snowflake’s Snowsight to create and analyze business intelligence (BI) dashboards. Additionally, a Streamlit app was provided to produce the same visualizations using Streamlit-in-Snowflake (SiS).

Objectives:

Impact:

Skills: Python (Programming Language) · Data Engineering · Snowflake · Data Visualization · Security · Streamlit · Problem Solving · Data Warehousing · Business Intelligence

Sales Dashboard - Snowsight · Service Dashboard - Snowsight · Sales Dashboard - Streamlit · Service Dashboard - Streamlit

Predicting Food Truck Locations Using Snowpark ML and XGBoost

Sep 2024

This project aimed to predict the locations of the Freezing Point food truck by analyzing historical location data and creating a predictive model. The approach involved using Snowflake’s Snowpark ML and XGBoost to develop and evaluate the model. Additionally, the project included creating a complete end-to-end workflow for data processing and model training.

Objectives:

Impact:

Skills: Python (Programming Language) · Machine Learning · Snowflake · Data Engineering · Data Visualization · XGBoost · Problem Solving · Data Warehousing · Business Intelligence

Create Dataset · Upload Dataset · Train XGBoost Model · Freezing Point Model

Diabetes Prediction Using PySpark MLlib

Aug 2024

This project sought to build a logistic regression classifier using the PySpark machine learning library (MLlib) and Python to classify patients as diabetic or non-diabetic. My approach was to train a model that accurately predicts whether a patient has diabetes.

Objectives:

Impact:

Skills: Python (Programming Language) · Data Engineering · Apache Spark ML · PySpark · Machine Learning · Data Science · Problem Solving · Apache Spark

Waste Management Data Warehouse using PostgreSQL and Cognos Analytics

Jun 2024

This project involved identifying patterns in the volume and location of waste collection across Brazil. My approach was to design a data warehouse and then build a visual representation of the waste collected by truck type, city, station ID and month.

Objectives:

Impact:

Skills: Data Modeling · Data Engineering · Data Visualization · Problem Solving · PostgreSQL · Data Warehousing · IBM Cognos Analytics

Design MyDimDate · Load DimDate · Grouping Sets · Create MQT · Solid Waste Dashboard

Traffic Flow Optimization with Airflow and Kafka

Apr 2024

This project sought to improve traffic flow on national highways by analyzing road traffic data from various toll plazas. My approach was to consolidate the disparate data from different toll operators and IT systems into a single file and then create a data pipeline to continue collecting the streaming data into a database for future analysis. During the process, I encountered carriage return characters (^M) and provided two potential solutions.
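
The description above does not say which two solutions were proposed for the carriage return characters, so the following is only a minimal Python sketch of one common approach: normalizing Windows-style line endings before the data is loaded. The function name and the sample records are hypothetical and are not taken from the project.

```python
def strip_carriage_returns(text: str) -> str:
    """Convert CRLF line endings to LF and drop any stray CR characters."""
    # Handle the usual Windows ending first, then remove leftover lone CRs.
    return text.replace("\r\n", "\n").replace("\r", "")


# Hypothetical toll records exported with Windows line endings (shown as ^M in some editors).
sample = "1,car,PC7C042\r\n2,truck,VC965F9\r\n"
cleaned = strip_carriage_returns(sample)
records = cleaned.splitlines()
```

A shell tool such as `tr -d '\r'` would likely achieve the same result inside a Bash-based pipeline step.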

Objectives:

Impact:

Skills: Extract, Transform, Load (ETL) · Python (Programming Language) · Apache Airflow · Data Engineering · Bash · MySQL · Problem Solving · Apache Kafka · Shell Script

Acquiring and Processing Information on the World’s Largest Banks

Mar 2024

This project required creating a database that managers in London, Berlin and New Delhi could query for the top 10 largest banks by market capitalization in their local currency. My approach was to compile the list of the top 10 banks ranked by market capitalization in billions of USD and then transform and store it in USD, GBP, EUR and INR based on the provided exchange rates.
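
The transform step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the exchange rates are round placeholder values, and the function name and input figure are hypothetical.

```python
# Placeholder exchange rates (units of currency per 1 USD); assumptions for illustration only.
EXCHANGE_RATES = {"GBP": 0.8, "EUR": 0.9, "INR": 80.0}


def add_local_currencies(mc_usd_billion: float, rates: dict[str, float]) -> dict[str, float]:
    """Return the market capitalization in USD plus each currency in `rates`."""
    row = {"USD": mc_usd_billion}
    for currency, rate in rates.items():
        # Round to two decimals so stored values are consistent across currencies.
        row[currency] = round(mc_usd_billion * rate, 2)
    return row


# Hypothetical bank with a market capitalization of 100 billion USD.
converted = add_local_currencies(100.0, EXCHANGE_RATES)
```

Keeping the rates in a single mapping means a new office currency only needs one extra entry rather than a code change.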

Objectives:

Impact:

Skills: Extract, Transform, Load (ETL) · Python (Programming Language) · Beautiful Soup · Data Engineering · Pandas · Web Scraping · Problem Solving · SQLite

Build a Machine Learning Pipeline for Airfoil Noise Prediction

Feb 2024

This project aimed to identify the optimal angle of attack and flow direction for airfoil noise reduction. My approach was to run extract, transform, load (ETL) steps and construct ML pipelines on data from a series of aerodynamic and acoustic tests of airfoil blade sections conducted in an anechoic wind tunnel.

Objectives:

Impact:

Skills: Extract, Transform, Load (ETL) · Python (Programming Language) · Data Engineering · Apache Spark ML · PySpark · Problem Solving · Apache Spark

Data Analysis using Spark

Jan 2024

The project required creating a robust data pipeline capable of ingesting employee data in CSV format. To do this, I analyzed the data, implemented the necessary transformations, and enabled the extraction of insights from the processed data.

Objectives:

Impact:

Skills: Data Engineering · Problem Solving

Working with NoSQL Databases

Dec 2023

This project tasked me with providing analysts with usable data. My approach was to move data from external sources into various databases, transfer data between different types of databases, and execute basic queries across various databases.

Objectives:

Impact:

Skills: MongoDB · Data Engineering · IBM Cloudant · Problem Solving · Cassandra