Le Tuan Vu
Data Scientist | Machine Learning Engineer

Education

Ho Chi Minh City University of Technology 2014-2019
  • PFIEV Program: EUR-ACE Master in Telecommunications. GPA: 8.01/10
  • Publication: “Smart Subtitling for better readability”, published at the ISEE 2017 conference
  • Scholarships:
    • eSilicon - Sunflower Mission Engineering and Technology Scholarship 2018
    • Kansai Electric Power Corp. - Kanden System Solutions Scholarship 2018

Experience

Data Scientist at Galaxy One (Sovico Group) 2022 - Present
  • Galaxy Customer Data Platform

    • Supported the data engineering team in cleaning VietJet Air passenger profiles from free text, yielding contactable information for 70% of passengers.
    • Identified millions of unique customers in VietJet Air’s large pool of travel data, making them ready for cross-selling campaigns.
    • Designed a graph database for the Sovico Customer Tracking Platform, covering VietJet Air, HDBank, and HDSaison.
    • Researched graph link prediction for the Product Recommendation System using deep graph learning.
  • Customer Segmentation for VietJet Sky+ Membership Campaign

    • Segmented customers by recency, frequency, expenditure, booking size, booking channel, and route using PySpark and SparkSQL.
    • Deployed the production model and stored its output on AWS Athena. Provided data to the Galaxy Commercial team to support VietJet Air in designing the Sky+ Membership Program.
  • MLOps Platform for managing and operating Galaxy Product Recommendations

    • Developed the Kubeflow platform along with reusable components, which helped reduce the team’s development cost.
    • Designed an auto-retraining ML pipeline using data drift detection, which decreased platform operating costs.
Data Scientist at One Mount (VinGroup x Techcombank) 2019 - 2022
  • Personalized recommendation system for the VinMart online shopping and Evoucher platforms

    • Partnered with the product and VinMart commercial team to provide personalized experiences for millions of VinMart shoppers on the VinID app.
    • Developed and A/B tested multiple models for different app modules using Python and BigQuery, and deployed them on the Kubeflow platform. Increased the overall view-to-click conversion rate (CR) across all modules by 80% compared to rule-based widgets, which account for 35% of revenue.
      • Product-Score model to rerank scores of products on VinID’s home widget. CR: 14.6%
      • Frequently-Bought-Together model using the Apriori algorithm on VinID’s checkout page. CR: 26.4%
      • Popular-by-Segmentation model on VinID’s home widget. CR: 15%
    • Partnered with the Evoucher team to personalize online voucher selling. Increased the voucher purchase CR by ~14% compared to the first version.
      • Similar-Items model. CR: 4.41%
      • Content-based-item-to-item-for-user model. CR: 2.04%
  • Standardized product dataset of VinID - VinMart

    • Cleaned non-standardized product titles using regex and business rules. Completely replaced the human labeling process with an automated one.
      • Cleaned product dataset from free text. Coverage: 97.3%
      • Built a named-entity recognition (NER) model to label and recognize VinMart products among merchant products. Accuracy: 79%
  • Face Recognition

    • Applied to the time and attendance system across all One Mount offices.
    • Developed a face recognition model using MTCNN and ArcFace. Deployed the product in C++ and achieved real-time recognition.
  • MLOps

    • Deployed and operated the Kubeflow platform for all of the team’s data science projects.
    • Implemented CI/CD with GitLab CI/CD to speed up delivery and safeguard recommendation quality.
    • Built a custom development kit and hosted training workshops on the MLOps stack to enhance the team’s productivity and reduce its development cost.
  • In-house research and testing

    • Built real-time face recognition on edge devices (AIoT) to track employees at warehouses.
    • Sentiment analysis: analyzed feedback from VinID’s customers.
      • Trained an SVM model on TF-IDF features. Accuracy: 82.77%
AI Engineer at EyeQ Tech 2019
  • Customer retention system - tracking customers at chain stores
    • Deployed the face detection module and operated the entire system.
  • Customer waiting-time detection system at POS, demoed on Shark Tank Vietnam
    • Developed a human detector and tracker to measure customers’ waiting times in queues.
  • Face Recognition for Banking
    • Operated the entire product.

Skills

  • Languages: Python, C/C++, CUDA C, Java, JavaScript, HTML/CSS.
  • Frameworks: PyTorch, TensorFlow, MXNet, scikit-learn, spaCy, NumPy, pandas, OpenCV, Flask, FastAPI, ReactJS.
  • Visualization: Power BI, Matplotlib, Plotly, Seaborn, TensorBoard.
  • Big Data: Spark, BigQuery, Dataflow, Data, Redshift, Athena, EMR, Snowflake.
  • MLOps: Kubeflow, Airflow, MLflow, Feast, BentoML, Seldon Core, KServe, GitLab CI/CD, Kubernetes.