K Santosh Kumar

Email

grey.ethics@gmail.com

Website

http://www.ksk-ventures.com

Phone

(+91) 973-973-4323

Location

Bangalore, India

About Myself

Full Stack Developer | AI & ML Engineer | Data Scientist | Cloud Architect

I’m a dedicated and passionate tech enthusiast skilled in Full Stack Development, AI & ML Engineering, Data Science, and Cloud Computing.
With 9+ years of experience in computer science, I work confidently across a wide variety of tools and technologies.

My journey in the world of technology has been an exciting one, fueled by relentless curiosity and a drive to make a positive impact.
Whether it’s diving deep into code or leading teams toward strategic business goals, I’m always eager to learn and grow.

I am currently seeking Full Stack AI/ML Software Development Projects where I can leverage my expertise to drive innovative solutions.

Educational Qualification

Pursuing Applied Generative AI Specialization – January 2025

Purdue University

Post Graduation in Artificial Intelligence & Machine Learning – April 2024

California Institute Of Technology

Bachelor of Science – Computer Science – June 2009

Indian Institute of Management & Technical Education

Professional Certifications

Notable Projects

AI & ML Projects

Project 4 – Developed a Face Detection Model using Python, TensorFlow, OpenCV and Albumentations

Description

  • Developed a custom face detection model using Python, TensorFlow, OpenCV and Albumentations for image augmentation.
  • Implemented an end-to-end pipeline for collecting training images, annotating them, preprocessing them, training the face detection model, and finally deploying it.

Result Preview

Steps in Model Development:

  1. Setup and Data Collection:
    • Installed necessary dependencies (labelme, tensorflow, opencv-python, etc.).
    • Collected images using OpenCV and annotated them with LabelMe.
  2. Data Processing and Visualization:
    • Loaded images into the TensorFlow data pipeline and visualized them using Matplotlib.
  3. Data Partitioning:
    • Manually split the data into training, testing, and validation sets.
  4. Image Augmentation:
    • Applied image augmentation techniques using Albumentations library to increase dataset variability.
  5. Building and Running Augmentation Pipeline:
    • Augmented images and labels for training, testing, and validation sets.
  6. Preparation of Labels:
    • Loaded and processed labels into TensorFlow datasets.
  7. Combining Image and Label Samples:
    • Combined image and label samples for training, testing, and validation datasets.
  8. Building Deep Learning Model:
    • Utilized the Functional API to build a custom face detection model.
    • Incorporated the VGG16 architecture for feature extraction.
  9. Losses and Optimizers:
    • Defined classification and localization losses.
    • Utilized the Adam optimizer with a customized learning rate.
  10. Training the Neural Network:
    • Implemented custom training and testing steps.
    • Trained the model for multiple epochs and monitored performance using TensorBoard.
  11. Model Evaluation and Deployment:
    • Made predictions on the test set and evaluated model performance.
    • Saved the trained model for future use.
    • Implemented real-time face detection using OpenCV.
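As an illustration of step 6 (label preparation), here is a minimal sketch of how a bounding-box annotation might be normalized into the (classification, localization) target pair the model trains on. The box format and the default 120x120 image size are illustrative assumptions, not the project's actual values:

```python
def normalize_bbox(bbox, img_w, img_h):
    """Scale pixel-space [x_min, y_min, x_max, y_max] to the [0, 1] range."""
    x_min, y_min, x_max, y_max = bbox
    return [x_min / img_w, y_min / img_h, x_max / img_w, y_max / img_h]

def make_label(has_face, bbox, img_w=120, img_h=120):
    """Pair a class target (face present?) with its normalized box,
    mirroring the (classification, localization) outputs the model learns."""
    coords = normalize_bbox(bbox, img_w, img_h) if has_face else [0.0, 0.0, 0.0, 0.0]
    return {"class": int(has_face), "bbox": coords}
```

In the real pipeline these targets would be built from the LabelMe JSON annotations and fed into a TensorFlow dataset alongside the augmented images.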

Please click the button to download the .ipynb or .py Code File of this Model.

Project 3 – Developed an Open-Source NLP Solutions Model Using Falcon-7B from Hugging Face and LangChain

Description

1. LLM Model and Chatbot Creation:

  • Utilized the Falcon-7B open-source Large Language Model (LLM) from Hugging Face for natural language processing tasks.
  • Integrated the model with LangChain to create an interactive chatbot capable of answering user questions.

2. YouTube Video Summarizer Creation:

  • Took the chatbot a step further by developing a YouTube Video Summarizer application using LangChain.
  • Implemented text splitting techniques to process large video transcripts effectively.

Result Preview

Steps in Model Development:

  1. Setting Up the Environment:
    • Installed necessary dependencies including LangChain, HuggingFaceHub, and other required libraries.
    • Loaded the Falcon-7B model from the HuggingFaceHub API.
  2. Developing the LLM Chatbot:
    • Integrated the Falcon-7B LLM model with LangChain to create an interactive chatbot.
    • Implemented a PromptTemplate and LLMChain for chatbot interaction.
  3. Implementing the YouTube Video Summarizer:
    • Utilized LangChain to develop a YouTube Video Summarizer application.
    • Loaded video transcripts from YouTube using the YoutubeLoader module.
    • Employed Recursive Character Text Splitter for effective processing of large video transcripts.
  4. Preparing Data and Text Processing:
    • Loaded and processed video transcripts for summarization tasks.
    • Implemented text splitting techniques to handle large documents effectively.
  5. Building Functionalities:
    • Developed functionalities for the chatbot to respond to user queries with meaningful answers.
    • Created a video summarization mechanism to extract key insights from lengthy video transcripts.
  6. Testing and Optimization:
    • Tested the chatbot and video summarizer functionalities to ensure proper operation.
    • Optimized the summarization process for efficiency and accuracy.
  7. Evaluating Performance:
    • Evaluated the performance of the chatbot and video summarizer in terms of accuracy and user experience.
    • Fine-tuned the models based on evaluation results and user feedback.
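The transcript chunking from step 3 can be sketched in plain Python. This is a simplified fixed-window splitter with overlap, not LangChain's actual RecursiveCharacterTextSplitter, which additionally tries a hierarchy of separators before falling back to character windows:

```python
def split_text(text, chunk_size=100, overlap=20):
    """Split a long transcript into overlapping windows so each chunk
    fits within the LLM's context while preserving continuity at the seams."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks, start = [], 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

Each chunk would then be summarized by the LLM and the partial summaries combined, which is essentially what a LangChain map-reduce summarization chain automates.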

Please click the button to download the .ipynb or .py Code File of this Model.

Project 2 – Developed a Face-Swap Model Using InsightFace and OpenCV

Description

Developed a face-swap model leveraging InsightFace and OpenCV technologies to manipulate facial images and generate synthetic content.

Result Preview

Steps in Model Development:

  1. Setting Up Environment:
    • Imported necessary libraries including numpy, os, cv2, matplotlib, and InsightFace.
    • Initialized the FaceAnalysis module for facial analysis and manipulation.
  2. Face Detection and Analysis:
    • Utilized the FaceAnalysis module to detect and analyze faces within images.
    • Extracted facial features and bounding box coordinates for further processing.
  3. Image Swapping and Manipulation:
    • Implemented face swapping functionality using InsightFace’s model_zoo.
    • Integrated the InSwapper model to swap faces between source and target images.
  4. Image Generation and Visualization:
    • Generated synthetic images by swapping faces between source and target images.
    • Visualized the manipulated images using matplotlib for analysis and validation.
  5. Handling Real-World Images:
    • Applied the deep-fake model to real-world images of individuals like Elon Musk.
    • Demonstrated the model’s capability to swap faces across different contexts and subjects.
  6. Testing and Evaluation:
    • Tested the deep-fake model with various scenarios and input images to evaluate performance and accuracy.
    • Assessed the visual quality and realism of generated synthetic images.
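The core region-transfer idea behind the swap can be illustrated with plain nested lists. Real InsightFace swapping also performs facial alignment, identity embedding, and blending, so this is only a sketch of the raw pixel copy between two detected bounding boxes:

```python
def swap_region(target, source, tgt_box, src_box):
    """Copy the source face box into the target box, resizing by
    nearest-neighbour sampling. Boxes are (x1, y1, x2, y2) in pixels;
    images are 2-D lists of pixel values."""
    tx1, ty1, tx2, ty2 = tgt_box
    sx1, sy1, sx2, sy2 = src_box
    sh, sw = sy2 - sy1, sx2 - sx1
    th, tw = ty2 - ty1, tx2 - tx1
    out = [row[:] for row in target]      # deep-copy so the target is untouched
    for r in range(th):
        src_r = sy1 + r * sh // th        # nearest source row
        for c in range(tw):
            src_c = sx1 + c * sw // tw    # nearest source column
            out[ty1 + r][tx1 + c] = source[src_r][src_c]
    return out
```

The boxes themselves would come from the FaceAnalysis detection step; the InSwapper model replaces this naive copy with an identity-preserving, blended synthesis.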

Please click the button to download the .ipynb or .py Code File of this Model.

Project 1 – Developed a Language Translation Model using the Transformers Library and Gradio UI

Description

Developed a language translation model using the Transformers library and integrated it with Gradio to create an interactive user interface for real-time text translation.

Result Preview

Steps in Model Development:

  1. Language Translation Model:
    • Implemented a language translation model using the Transformers library for English-to-German and English-to-French translation.
    • Utilized the translation_pipeline function from the Transformers library to perform translation tasks.
    • Developed a custom function, translate_transformers, to handle text inputs and return translated outputs.
  2. Gradio Integration:
    • Integrated the Gradio library to create an intuitive user interface for text translation.
    • Configured input and output components using Gradio’s Textbox and output='text' parameters.
    • Launched the Gradio interface to enable users to input text and receive translated outputs in real-time.
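The custom wrapper described in step 1 can be sketched as below. To keep the example self-contained, the heavyweight Transformers pipeline is injected as a callable; in the project it would be created with transformers.pipeline("translation_en_to_de") or similar, which returns the standard [{"translation_text": ...}] payload:

```python
def translate_transformers(text, pipeline_fn):
    """Run `text` through a Transformers-style translation pipeline
    and unwrap the translated string from its result payload."""
    results = pipeline_fn(text)
    return results[0]["translation_text"]

# Stand-in pipeline for demonstration (the real one downloads a model):
fake_pipeline = lambda t: [{"translation_text": "Hallo Welt"}]
```

In the Gradio integration, translate_transformers becomes the fn argument of the interface, with a Textbox input and a text output.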

Please click the button to download the .ipynb or .py Code File of this Model.

Data Science Projects

Project 4 – Data-Driven Customer Behavior Analysis for Marketing Optimization and Customer Acquisition

Client’s Requirement

The client aims to enhance their marketing mix by gaining a comprehensive understanding of customer acquisition factors (‘Marketing mix’ is a popular concept used in implementing marketing strategies).
The client wanted to explore the impact of product offerings, pricing, distribution channels, and promotional campaigns on customer engagement.

Solution Provided

Overview:

  • Conducted thorough data exploration and preprocessing on a marketing dataset to extract valuable insights.
  • Utilized Python libraries such as Pandas, NumPy, Matplotlib, Seaborn, and SciPy for data manipulation, visualization, and statistical analysis.
  • Formulated and tested hypotheses to derive actionable insights for marketing strategies.

Key Steps:

  1. Data Exploration and Cleaning:
    • Imported the dataset and performed initial exploration to understand its structure and contents.
    • Addressed missing values and inconsistencies in data types.
    • Conducted basic statistical analysis to gain insights into the dataset’s distribution and characteristics.
  2. Feature Engineering:
    • Engineered new features such as ‘Days_Since_Enrollment’, ‘Total_Children’, ‘Age’, and ‘Total_Spending’ to enhance predictive power.
    • Standardized and transformed data for better model performance.
  3. Outlier Detection and Treatment:
    • Identified and removed outliers in the ‘Income’ column using statistical methods.
    • Ensured data integrity and model robustness by handling outliers appropriately.
  4. Encoding Categorical Variables:
    • Utilized Ordinal Encoding for ‘Education’ column and One-Hot Encoding for ‘Marital_Status’ column to convert categorical variables into numerical representations.
    • Prepared the data for machine learning algorithms by encoding categorical features effectively.
  5. Correlation Analysis:
    • Visualized the correlation between different variables using a heatmap.
    • Identified significant correlations that could influence marketing strategies and decision-making.
  6. Hypothesis Testing:
    • Formulated and tested hypotheses related to customer behavior and preferences.
    • Employed statistical tests such as t-tests to validate hypotheses and derive actionable insights.

Results:

  • Established significant differences in purchasing behavior between different demographic groups.
  • Provided actionable insights for marketing strategies, including targeted promotions and channel optimization.
  • Contributed to data-driven decision-making processes and improved business outcomes.

Technologies Used:
Python, Pandas, NumPy, Matplotlib, Seaborn, SciPy, Jupyter Notebook

Please click on “View Code” to view the Code File.

Project 3 – Employee Turnover Prediction and Retention Strategy Implementation

Client’s Requirement

Portobello Tech is an app innovator that wishes to devise an intelligent way of predicting employee turnover within the company (Employee turnover refers to the total number of workers who leave a company over a certain time period).
It periodically evaluates employees’ work details, including the number of projects they worked on, average monthly working hours, time spent at the company, promotions in the last 5 years, and salary level.

Solution Provided

Overview: The project aims to analyze employee turnover within an organization using data science techniques. By exploring various factors contributing to turnover and building predictive models, the goal is to gain insights that can help in understanding and mitigating employee attrition.

Key Steps:

  1. Loading and Basic Exploration: The dataset is loaded from an Excel file, and basic exploration is performed to understand its structure and contents.
  2. Data Quality Check and Encoding: Missing values are checked and handled. Categorical variables are encoded using techniques like one-hot encoding and ordinal encoding to prepare the data for analysis.
  3. Exploratory Data Analysis (EDA): Visualizations such as correlation matrix heatmap, distribution plots, and bar plots are created to uncover patterns and insights in the data.
  4. Clustering Employees Who Left: KMeans clustering is used to group employees who left based on relevant features, providing insights into different departure patterns.
  5. Handling Class Imbalance: Techniques like SMOTE are employed to address class imbalance in the dataset, ensuring balanced representation of classes in model training.
  6. Model Training and Evaluation: Multiple classification models, including Logistic Regression, Random Forest, and Gradient Boosting Classifier, are trained using 5-fold cross-validation. Model performance is evaluated using classification reports.
  7. Identifying the Best Model and Retention Strategies: The best-performing model is identified based on evaluation metrics. Using this model, probabilities of employee turnover are predicted for the test data. Employees are categorized into risk zones, and retention strategies are suggested based on their probability scores.
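The risk-zone categorization in step 7 amounts to thresholding the predicted turnover probability. The cut-offs and suggested actions below are illustrative assumptions, not the project's tuned values:

```python
def risk_zone(probability):
    """Bucket a predicted turnover probability into a retention risk zone.
    Thresholds are illustrative, not the project's actual cut-offs."""
    if probability < 0.2:
        return "safe"
    if probability < 0.6:
        return "low risk"
    if probability < 0.9:
        return "medium risk"
    return "high risk"

def suggest_strategy(probability):
    """Pair each zone with an example retention action (illustrative)."""
    actions = {
        "safe": "no action needed",
        "low risk": "monitor engagement",
        "medium risk": "schedule a career conversation",
        "high risk": "immediate retention intervention",
    }
    return actions[risk_zone(probability)]
```

The probabilities themselves would come from the best-performing classifier's predict_proba output on the test data.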

Results:

  • The dataset consists of 14,999 records with 10 features related to employee characteristics and work-related factors.
  • Initial exploration indicates no missing values in the dataset.
  • EDA reveals correlations between various features and provides insights into employee satisfaction, evaluation scores, and project involvement.
  • Clustering analysis helps identify distinct groups of departing employees based on key attributes.
  • Models trained include Logistic Regression, Random Forest, and Gradient Boosting Classifier, with Gradient Boosting achieving the highest performance based on evaluation metrics.
  • Retention strategies are suggested for different risk zones based on predicted probabilities of employee turnover.

Technology Used:

  • Programming Language: Python
  • Libraries: Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, Imbalanced-learn
  • Tools: Jupyter Notebook, Excel
  • Techniques: Data preprocessing, Exploratory Data Analysis (EDA), Clustering, Classification, Model evaluation, Feature encoding, Handling class imbalance.

Please click on “View Code” to view the Code File.

Project 2 – Tableau Project: Sales Comparison Dashboard

Client’s Requirement

The director of a leading organization wants to compare sales between two regions. He has asked each region’s operators to record sales data for regional comparison. Upper management wants to visualize the sales data in a dashboard to understand relative performance and suggest the necessary improvements.

Solution Provided

Helped the organization analyze sales performance and identify areas for improvement by creating a comprehensive dashboard.
Utilized the Sample Superstore dataset for data analysis and visualization.

  1. Created a hierarchical structure called Location for the variable Country to categorize sales data by region.
  2. Developed two parameters, Primary Region and Secondary Region, to enable comparison between selected regions.
  3. Created a Calculated Field named First Order Date to track the initial purchase date.
  4. Designed a dashboard layout to display key metrics for the Primary and Secondary Regions:
    • First Order Date
    • Total Sales
    • Average Sales per Order
    • Number of Customers
    • Number of Orders
    • Number of Products in Sale
  5. Created a dashboard using all the sheets:
    • Compiled individual visualizations into a cohesive dashboard to present a comprehensive view of sales comparison between the Primary and Secondary Regions.
    • Ensured the dashboard layout is visually appealing and intuitive for users to navigate.

Please click on “View/Download Document” button to view and/or download the Document Data.

Project 1 – Power BI Project: Amazon Prime Videos Dashboard

Client’s Requirement

An entertainment company seeks insights into viewer preferences and engagement on Amazon Prime Videos. They need a dynamic dashboard to visualize show types by genre, ratings distribution, geographical distribution of content, and the ratio of movies to TV shows.
The objective is to empower decision-makers with actionable insights to optimize content strategy, enhance viewer engagement, and drive business success on the Amazon Prime Videos platform.

Solution Provided

Assisted the entertainment company in optimizing content strategy with a comprehensive Power BI dashboard. Leveraged the Amazon Prime Videos dataset for in-depth analysis and visualization. Ensured the dashboard layout is user-friendly and visually appealing, facilitating easy navigation and interpretation of insights for decision-makers.

  1. Leveraged Amazon Prime Videos dataset for analysis and visualization.
  2. Developed interactive visualizations, including:
    • Bar charts for genre types and ratings distribution.
    • Geographical map for content distribution analysis.
    • Donut chart for visualizing movie versus TV show ratio.
  3. Integrated informative cards for key metrics at a glance.
  4. Ensured user-friendly and visually appealing dashboard layout.
  5. Empowered data-driven decision-making, enhanced viewer engagement, and drove business growth on the Amazon Prime Videos platform.

Please click on “View/Download Document” button to view and/or download the Document Data.

Cloud Computing Projects

Project 4 – Deployed Azure Protected Geo-Redundant Solution with Path-Based Routing (Azure AZ-305 Level)

Business Scenario

Tyrell Corp wants to build a highly secure, globally distributed application. This application serves two types of content: images and dynamically rendered webpages. As its user base spans the globe, the application must be geographically redundant, and the design demands that users be served from the closest (lowest-latency) location. For distinction, Tyrell Corp has decided that any URLs matching the pattern /images/* are served from a dedicated pool of VMs, separate from the rest of the web farm.

Solution Provided

Business Scenario:

  • Designed and implemented a highly secure, globally distributed application for Tyrell Corp.
  • Served two types of content (images and dynamically rendered webpages) with geographically redundant infrastructure to minimize latency for users worldwide.

Key Responsibilities:

  1. Load Balancing Architecture Design:
    • Provisioned Application Gateway in East US region to ensure efficient load distribution.
    • Implemented path-based routing to distinguish between content types (/images/* routed to dedicated VM pool).
  2. Geo-Redundancy Setup:
    • Configured Traffic Manager to manage multiple Azure regions for failover and performance optimization.
    • Added additional Application Gateways to Traffic Manager endpoints for enhanced redundancy and scalability.
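The path-based rule in responsibility 1 boils down to the following dispatch, shown here as a plain-Python sketch of what the Application Gateway rule expresses; the backend pool names are illustrative:

```python
def route_request(url_path):
    """Mimic the Application Gateway path-based routing rule:
    /images/* goes to the dedicated image VM pool, everything
    else to the general web farm."""
    if url_path.startswith("/images/"):
        return "image-vm-pool"
    return "web-farm-pool"
```

In Azure this logic lives in the gateway's URL path map, with each return value corresponding to a configured backend pool.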

Technologies Utilized:

  • Azure Portal
  • Application Gateway
  • Traffic Manager
  • Virtual Machines (VMs)
  • Load Balancing
  • Geo-Redundancy

Achievements:

  • Successfully deployed a geo-redundant solution for Tyrell Corp, ensuring high availability and low latency for global users.
  • Implemented path-based routing to efficiently manage different types of content delivery.

Please click the “View Detailed Report” button to view all the steps followed in the execution of this project.

Project 3 – Deployed an Online Movie Watching Application on the Cloud (AWS Solutions Architect : Professional Level)

Business Scenario

‘Binge Watch Online’, an online entertainment provider, has deployed its website on a public cloud. Since the deployment, users have been complaining about slow page-reload speeds.
The website receives global traffic, yet static assets such as pages are served from a single server. The company wants traffic arriving from different parts of the world to be load balanced at the DNS level.

Solution Provided

Business Scenario:

  • Engaged in a project for ‘Binge Watch Online’, an online entertainment provider, to address slow website loading times and enhance global accessibility.
  • Aimed to optimize website performance and mitigate latency issues arising from global traffic and static asset delivery from a single server.
  • Required implementation of a cloud-based solution to enable DNS-level load balancing of traffic for improved user experience.
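The DNS-level load balancing called for here reduces to answering each query with the healthy endpoint closest to the user. A toy sketch of that selection logic follows; the region names and latencies are made up, and Route 53 performs this resolution itself in production:

```python
def pick_endpoint(latencies_ms, healthy=None):
    """Return the region with the lowest measured latency, skipping
    unhealthy endpoints, in the spirit of Route 53 latency-based routing.
    `latencies_ms` maps region name -> observed latency in milliseconds."""
    candidates = {region: ms for region, ms in latencies_ms.items()
                  if healthy is None or region in healthy}
    if not candidates:
        raise ValueError("no healthy endpoint available")
    return min(candidates, key=candidates.get)
```

Combining this with CloudFront's edge caching is what removes the single-server bottleneck for static assets.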

Key Responsibilities:

  1. Infrastructure Deployment:
    • Provisioned AWS infrastructure utilizing Route 53, S3 Bucket, CloudFront, and EC2 instances.
    • Implemented a scalable architecture to optimize website performance and deliver static assets efficiently.
  2. Governance and Cost Management:
    • Established resource governance framework to manage development, testing, and production environments effectively.
    • Implemented cost tracking mechanisms to monitor the billing life cycle and optimize cloud expenses.
  3. Content Upload and CDN Configuration:
    • Uploaded static website content to an S3 bucket for global accessibility and efficient content delivery.
    • Configured CloudFront CDN endpoint to serve static files and improve website loading speed.
  4. Storage and Collaboration:
    • Utilized AWS storage services for file sharing among team members, ensuring seamless collaboration and access to resources.
  5. VM Connectivity:
    • Connected Windows or Linux VMs to the storage service, enabling secure access and efficient resource utilization.

Technologies Utilized:

AWS (Amazon Web Services):

  • EC2: Virtual machines for hosting website components.
  • S3 Bucket: Storage for static website assets.
  • IAM: Role-based access control for EC2 instances.
  • Route 53: DNS routing for load balancing traffic.
  • CloudFront: CDN for efficient content delivery.

Achievements:

  • Successfully deployed a scalable and efficient AWS infrastructure to address website performance concerns.
  • Enhanced collaboration and resource sharing among team members through effective utilization of AWS storage services.
  • Facilitated secure connectivity to virtual machines, promoting streamlined operations and efficient resource utilization.

Please click the “View Detailed Report” button to view all the steps followed in the execution of this project.

Project 2 – Setting Up and Monitoring a WordPress Instance (AWS Solutions Architect : Associate Level)

Business Scenario

An organization that publishes blogs and provides documentation services for other businesses and technologies requires the following:

  • To set up a live WordPress instance to publish blogs
  • To set up a WordPress instance that can be used for development and testing purposes so that any work done on this instance will not impact the live blog
  • To configure the WordPress instance for development and testing purposes, which will be available only for business hours (9 AM – 6 PM)
  • To be able to monitor the health of the WordPress instance

Solution Provided

Business Scenario:

  • Executed a project to set up and monitor a WordPress instance for organizational blogging and documentation services.
  • Required the deployment of a live WordPress instance for publishing blogs and a separate instance for development and testing purposes to prevent disruptions to live operations.
  • Mandated configuration of the WordPress instance for development and testing, restricting access to business hours only.

Key Responsibilities:

  1. CloudFormation Stack Setup:
    • Created an AWS CloudFormation stack named “Assessments2-InstanceScheduler” to manage instance scheduling.
    • Configured scheduling parameters in the DynamoDB table of the Instance Scheduler stack.
  2. WordPress Instance Deployment:
    • Utilized AWS CloudFormation to deploy a WordPress instance stack named “Assessment2-WordpressStack.”
    • Installed and configured WordPress on the instance using the provided website URL.
    • Implemented tags for the instance to facilitate linking with the Instance Scheduler.
  3. AMI Creation and Configuration:
    • Generated an AMI (Amazon Machine Image) of the WordPress instance, named “Assessment2-WordpressInstanceImage.”
  4. Route53 Health Check:
    • Established a new Health Check in AWS Route53 using the domain to monitor the health of the WordPress instance.
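The scheduling rule configured in the Instance Scheduler's DynamoDB table (responsibility 1) reduces to a business-hours predicate like the sketch below. The 9 AM to 6 PM window comes from the requirement; everything else is illustrative:

```python
from datetime import datetime, time

BUSINESS_START = time(9, 0)   # 9 AM, per the requirement
BUSINESS_END = time(18, 0)    # 6 PM, per the requirement

def should_be_running(now: datetime) -> bool:
    """True when the dev/test WordPress instance should be up,
    i.e. the current time falls inside the business-hours window."""
    return BUSINESS_START <= now.time() < BUSINESS_END
```

The Instance Scheduler evaluates a rule like this on a timer and starts or stops the tagged EC2 instance accordingly.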

Technologies Utilized:

  • AWS Management Console
  • Amazon EC2
  • AWS CloudFormation
  • Amazon Route 53

Achievements:

  • Successfully deployed and scheduled WordPress instances for organizational blogging and testing, ensuring uninterrupted service availability during business hours.
  • Implemented an efficient instance scheduling mechanism using AWS CloudFormation, enhancing resource utilization and cost-effectiveness.
  • Configured automatic instance shutdown and monitoring through Route 53 health checks, ensuring optimal performance and resource management.

Please click the “View Detailed Report” button to view all the steps followed in the execution of this project.

Project 1 – Implemented Azure IaaS (Azure AZ-104 Level)

Business Scenario

OSS Corporation is a globally distributed firm with headquarters in East US and a branch office in Southeast Asia.
They are currently working on a project and have decided that its application tier will reside in the branch region, while for security reasons the data tier will remain in the headquarters region.
The company therefore wants a suitable infrastructure setup in the headquarters region for its database and another in the branch region for the application.
In addition, because the company wants communication between the app and data tiers to be secure, the connection must happen over a private channel.

Solution Provided

Business Scenario:

  • Executed project for OSS Corporation, a globally distributed firm, to deploy application and data tiers on Azure IaaS.
  • Application tier deployed in the Southeast Asia region while the data tier was kept at headquarters in East US for enhanced security.

Key Responsibilities:

  1. Infrastructure Deployment:
    • Created IaaS v2 virtual network in headquarters region for data tier.
    • Deployed another IaaS v2 virtual network in branch region for application tier.
  2. Connectivity Setup:
    • Configured branch office virtual network for connectivity to headquarters’ network.
    • Established communication between the app and data tiers over a private channel.
  3. Testing and Validation:
    • Deployed test IaaS Standard DS1 v2 VMs to both virtual networks.
    • Validated connectivity using Ping to ensure seamless communication.

Technologies Utilized:

  • Azure Infrastructure as a Service (IaaS)
  • Virtual Networks
  • Virtual Network Gateway
  • Virtual Machines (VMs)
  • VNet peering

Achievements:

  • Successfully implemented Azure IaaS deployment strategy meeting OSS Corporation’s security requirements.
  • Ensured reliable connectivity between application and data tiers, enhancing operational efficiency.

Please click the “View Detailed Report” button to view all the steps followed in the execution of this project.

My Skills

Programming Languages
JavaScript
Python
Java
C++
Tech Stack + Backend
MERN Stack
REST API
CRUD Operations
Git (Version Control)
Artificial Intelligence
NLP
Computer Vision
Generative AI
Deep Learning
Cloud Service Providers
AWS
Azure
GCP
IBM Cloud
Databases
MongoDB
PostgreSQL
MySQL
Cassandra DB
Web Infrastructure
Apache Server
Nginx
Docker
Kubernetes
Data Analytics
Tableau
Power BI
Excel
Google Sheets
Low Code Technologies
WordPress
Magento
DirectAdmin
cPanel
Operating Systems
Windows
Linux
Android
macOS

Teamwork and Collaboration

Thriving in teamwork.
Collaboration and conflict resolution.

Technology Proficiency

Competent with business software, showcasing strong tech-savviness.

Continuous Learning

Committed to pursuing knowledge and skills continually.

Time Management

Prioritizing, organizing and meeting deadlines.

Analytical Thinking

Data analysis.
Problem-solving and critical thinking.

Presentation Design

Creating impactful multimedia presentations.

Client Relations

Customer satisfaction and client relationship management.

Financial Acumen

Budgeting and financial analysis.
Cost management.

Contact Me

Contact Form

Thank you

Your interest in my work is greatly appreciated.
Thank you for taking the time to explore my profile.