GitHub link: https://github.com/azizattia/HR-Analytics/blob/main/README.md
Kaggle task: https://www.kaggle.com/arashnic/hr-analytics-job-change-of-data-scientists/tasks?taskId=3015
RPubs link: https://rpubs.com/ShivaRag/796919

This blog intends to explore and understand the factors that lead a data scientist to change or leave their current job. I wanted a challenge and tried to tackle this task, which I found on Kaggle: HR Analytics: Job Change of Data Scientists. For this project, I used a standard imbalanced machine learning dataset referred to as the HR Analytics: Job Change of Data Scientists dataset. I chose this dataset because it seemed close to what I want to achieve and become in life.

Based on the characteristics of the employees, HR wants to understand the factors affecting an employee's decision to stay in or leave their current job, and to classify employees into staying or leaving categories using predictive classification models. There are 3 things that I looked at. A violin plot shows the distribution of quantitative data across several levels of one (or more) categorical variables such that those distributions can be compared. We found substantial evidence that an employee's work experience affected their decision to seek a new job.
Model performance is evaluated using the ROC AUC score. Using model(s) built on the current credentials, demographics, and experience data, you will predict the probability that a candidate will look for a new job or will work for the company, and interpret the factors affecting the employee's decision.

Questions: Do years of experience have any effect on the desire for a job change? What is the effect of major discipline? Variable 2: Last.new.job (difference in years between previous job and current job).

Companies actively involved in big data and analytics spend money on employees to train and hire them for data scientist positions. Many people sign up. This dataset is also designed for HR research into the factors that lead a person to leave their current job. In this article, I will showcase visualizing a dataset containing categorical and numerical data, and also build a pipeline that deals with missing data and imbalanced data and predicts a binary outcome.

In our case, company_size and company_type contain the most missing values, followed by gender and major_discipline. Third, we can see that multiple features have a significant amount of missing data (~30%). Looking at the categorical variables, though, experience and being a full-time student are good indicators.

Next, we converted the city attribute to numerical values using the ordinal encode function. Since our purpose is to determine whether a data scientist will change their job or not, we set the looking-for-job variable as the label and the remaining data as training data. As a baseline, the result looks alright to me.
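The encoding, label split, and ROC AUC evaluation described above can be sketched roughly as follows. This is a minimal illustration, not the original notebook: the data is synthetic, the city names and the relationship between city_development_index and the target are made up, and scikit-learn's OrdinalEncoder and LogisticRegression are assumed stand-ins for whatever the authors used.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder

# Synthetic stand-in for the real data; values are made up for illustration.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "city": rng.choice(["city_103", "city_21", "city_16"], size=n),
    "city_development_index": rng.uniform(0.4, 0.95, size=n),
})
# Assumption: the label loosely depends on the index so the model has signal.
p = 1 / (1 + np.exp(8 * (df["city_development_index"] - 0.6)))
df["target"] = (rng.uniform(size=n) < p).astype(int)

# Convert the city strings to numeric codes.
encoder = OrdinalEncoder()
df["city"] = encoder.fit_transform(df[["city"]]).ravel()

# The job-change flag is the label; the remaining columns are features.
X = df.drop(columns="target")
y = df["target"]

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# ROC AUC is computed from predicted probabilities, not hard class labels.
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"validation ROC AUC: {auc:.3f}")
```

ROC AUC is a reasonable choice here because it does not depend on a single classification threshold, which matters when the classes are imbalanced.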
We can see from the plot that people who are looking for a job change (target 1) are at least 50% more likely to be enrolled in a full-time course than those who are not looking for a job change (target 0).

The class imbalance is handled with the Synthetic Minority Oversampling Technique (SMOTE). Synthetically sampling the data using SMOTE results in the best-performing Logistic Regression model, as seen from the highest F1 and Recall scores.

Information related to demographics, education, and experience is available from candidates' signup and enrollment. The company wants to know which of these candidates really want to work for the company after training and which are looking for new employment, because this helps reduce cost and time, improves the quality of training, and supports course planning and the categorization of candidates. Information regarding how the data was collected is currently unavailable. The target is not included in the test set, but a file with the test target values is available for related tasks.

In this project I want to explore people who join the company's data science training and their interest in changing jobs or becoming a data scientist at the company. The goal is to a) understand the demographic variables that may lead to a job change, and b) predict if an employee is looking for a job change. Are there any missing values in the data? We believe that our analysis will pave the way for further research on the subject, given its significance to employers around the world.

For more on performance metrics, check https://medium.com/nerd-for-tech/machine-learning-model-performance-metrics-84f94d39a92
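To make the SMOTE step concrete, here is a hand-rolled NumPy sketch of the core idea: each synthetic row is an interpolation between a minority point and one of its nearest minority neighbours. It is not the imbalanced-learn implementation, the data is a toy example, and the function name `smote` and its parameters are hypothetical. The sketch also oversamples only the training split, a common precaution against leakage and an assumption of this example rather than a description of the original workflow.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    minority point and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, len(X_min), size=n_new)  # random minority rows
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.uniform(size=(n_new, 1))  # position along the segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy imbalanced data: 150 majority rows (0) and 50 minority rows (1).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 150 + [1] * 50)

# Hold out a validation set first, then oversample the training part only.
idx = rng.permutation(len(y))
train, val = idx[:140], idx[140:]
X_train, y_train = X[train], y[train]

X_min = X_train[y_train == 1]
n_new = int((y_train == 0).sum() - (y_train == 1).sum())
X_syn = smote(X_min, n_new)

X_bal = np.vstack([X_train, X_syn])
y_bal = np.concatenate([y_train, np.ones(n_new, dtype=int)])
print(np.bincount(y_bal))  # both classes now have the same count
```

Because every synthetic row lies on a segment between two real minority rows, the validation rows held out beforehand never influence the synthetic data.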
Employees with less than one year, 1 to 5 years, and 6 to 10 years of experience tend to leave their jobs more often than others.

Context and Content: A company that is active in Big Data and Data Science wants to hire data scientists from among people who successfully pass courses conducted by the company. All data comes from personal information. We therefore need a new method that can reduce cost (money and time) and increase the probability of success, in order to reduce CPH. This demand and the abundance of opportunities give greater flexibility to those lucky enough to work in the field.

Features:
city_development_index: Development index of the city (scaled)
relevent_experience: Relevant experience of candidate
enrolled_university: Type of university course enrolled in, if any
education_level: Education level of candidate
major_discipline: Education major discipline of candidate
experience: Candidate total experience in years
company_size: Number of employees in current employer's company
lastnewjob: Difference in years between previous job and current job
target: 0 = not looking for a job change, 1 = looking for a job change

MICE (Multiple Imputation by Chained Equations) is a multiple imputation method; it is generally better than a single imputation method such as mean imputation. Once missing values are imputed, the data can be split into train and validation (test) parts and the model can be built on the training dataset. This is a significant improvement over the previous logistic regression model.

For any suggestions or queries, leave your comments below and follow for updates.
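As an illustration of chained-equation imputation, the sketch below uses scikit-learn's IterativeImputer, which is one MICE-style implementation and not necessarily the tool used in the original analysis. The data is a synthetic numeric matrix. Averaging the completed datasets at the end is a simplification for display; standard multiple-imputation practice fits the model on each completed dataset and pools the estimates.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
# Toy numeric matrix with correlated columns (stand-ins for encoded features).
a = rng.normal(size=200)
X = np.column_stack([
    a,
    2 * a + rng.normal(scale=0.3, size=200),
    rng.normal(size=200),
])

# Knock out roughly 30% of the second column.
X_missing = X.copy()
mask = rng.uniform(size=200) < 0.3
X_missing[mask, 1] = np.nan

# Each run draws imputations from the fitted posterior; different seeds
# give different completed datasets, which is the "multiple" in MICE.
completed = [
    IterativeImputer(sample_posterior=True, max_iter=10, random_state=s)
    .fit_transform(X_missing)
    for s in range(5)
]
X_pooled = np.mean(completed, axis=0)

print("remaining NaNs:", int(np.isnan(X_pooled).sum()))
```

Unlike mean imputation, each missing value is predicted from the other columns of the same row, so relationships between features are preserved better.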
Some of the features are numeric, others are categorical. This project includes data analysis, machine learning modeling, and visualization using SHAP, with 13 features and 19158 observations. Many people sign up for the training. To improve candidate selection in its recruitment process, a company collects data and builds a model to predict whether a candidate will continue to work at the company or not.

Take a shot at building a baseline model that shows a basic metric. The relatively small gap in accuracy and AUC scores suggests that the model did not significantly overfit. This operation is performed feature-wise in an independent way.

The odds show that experience and university enrollment are associated with higher odds of moving; the weight of evidence shows the same for experience and for those enrolled in university.

Only one bar is labeled because I set the threshold to a relative difference of 50%, so that labels for groups with small differences won't clutter up the plot.

Our model could be used to reduce the screening cost and increase the profit of institutions by minimizing investment in employees who are in for the short run. An initial analysis counted the null values in each column. Besides missing values, our data also contained entries that had categorical data in certain columns only.

Question 2. Next, we tried to understand what prompted employees to quit, from the point of view of their current jobs.
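For readers unfamiliar with odds and weight of evidence (WoE), here is a small worked sketch on one categorical feature. The counts are invented for illustration, and the convention is an assumption of this example: WoE is the log of a category's share of all leavers (target 1) divided by its share of all stayers (target 0), so positive values mean the category is over-represented among leavers. Some texts use the opposite sign.

```python
import numpy as np
import pandas as pd

# Toy data; the category counts are made up for illustration.
df = pd.DataFrame({
    "enrolled_university": ["Full time course"] * 40 + ["no_enrollment"] * 160,
    "target": [1] * 16 + [0] * 24 + [1] * 34 + [0] * 126,
})

counts = pd.crosstab(df["enrolled_university"], df["target"])
leavers, stayers = counts[1], counts[0]

# Odds of looking for a job change within each category.
odds = leavers / stayers

# Weight of evidence: log of (share of all leavers in the category)
# over (share of all stayers in the category).
woe = np.log((leavers / leavers.sum()) / (stayers / stayers.sum()))

print(pd.DataFrame({"odds": odds, "woe": woe}))
```

In this toy table the full-time-course group holds 32% of the leavers but only 16% of the stayers, so its WoE is ln(2), about 0.69, while the no-enrollment group gets a negative value.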
After applying SMOTE to the entire data, the dataset is split into train and validation sets.

Feature engineering: in our case, the columns company_size and company_type have a more or less similar pattern of missing values. Don't label encode null values, since I want to keep missing data marked as null for imputing later. A violin plot plays a similar role to a box-and-whisker plot.

The input files are '/kaggle/input/hr-analytics-job-change-of-data-scientists/aug_train.csv' and '/kaggle/input/hr-analytics-job-change-of-data-scientists/aug_test.csv'. The whole dataset is divided into train and test sets. The company provides 19158 training observations and 2129 testing observations, each with 13 features excluding the response variable.

To reduce the cost of training, the company wants to predict which candidates are really interested in working for the company and which may look for new employment once trained. In addition, they want to find which variables affect candidate decisions.

HR Analytics: Job Change of Data Scientist, by Lim Jie-Ying. Data set introduction. Insight: major discipline is the third most important predictor of an employee's decision.
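One way to encode a categorical column while leaving the gaps as nulls for a later imputation step is sketched below. It relies on pandas' factorize, which marks missing values with -1; the toy values are illustrative and the approach is an assumption, not necessarily what the original notebook did.

```python
import pandas as pd

# Toy column with gaps; the size labels are illustrative.
s = pd.Series(
    ["50-99", None, "<10", "50-99", None, "10000+"], name="company_size"
)

# pd.factorize assigns -1 to missing values, so map that sentinel back to NaN.
codes, categories = pd.factorize(s)
encoded = pd.Series(codes, index=s.index, name=s.name, dtype="float")
encoded = encoded.where(codes >= 0)

print(encoded.tolist())
```

The codes follow the order of first appearance, so for an ordered feature such as company size an explicit mapping from label to rank may be preferable.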
Introduction. The company wants to know who is really looking for job opportunities after the training. All data comes from the personal information trainees provide when they register for the training.

What is the total number of observations? The number of data scientists who want to change jobs is 4777 and the number who do not is 14381 (about 25% versus 75% of 19158), so the data is imbalanced. The number of men is higher than the number of women and others.

I got -0.34 for the coefficient, indicating a moderate negative relationship, which matches the negative relationship we saw in the violin plot. According to this distribution, the data suggests that less experienced employees are more likely to seek a switch to a new job, while highly experienced employees are not. This is in line with our deduction above. Notice that only the orange bar is labeled.

Therefore we can conclude that the type of company definitely matters in terms of job satisfaction, even though there is no apparent correlation between satisfaction and company size.

The training dataset with 20133 observations is used for model building, and the built model is validated on the validation dataset of 8629 observations. Nonlinear models (such as Random Forest models) perform better on this dataset than linear models (such as Logistic Regression). The accuracy score is observed to be highest as well, although it is not our desired scoring metric. The conclusions can be highly useful for companies wanting to invest in employees who might stay for the longer run.
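A correlation coefficient between a numeric feature and the 0/1 target can be computed as in the sketch below. Pearson's r with a binary variable is the point-biserial correlation. The source does not say which feature produced the -0.34 value, so the feature here is purely hypothetical and the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Hypothetical numeric feature on a 0-1 scale.
feature = rng.uniform(0.4, 0.95, size=n)
# Binary label that becomes less likely as the feature grows.
target = (rng.uniform(size=n) < 0.9 - 0.8 * feature).astype(int)

# Pearson's r with a 0/1 variable is the point-biserial correlation.
r = np.corrcoef(feature, target)[0, 1]
print(f"r = {r:.2f}")
```

A negative r means that higher feature values go with fewer positive labels, which is the same direction a violin plot would show as a lower distribution for target 1.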
Recommendation: This could be due to various reasons. People with more experience (11+ years) are probably good candidates to screen for when hiring for training, since they are more likely to stay and work for the company. There is also a need to explore why people with less than one year or 1 to 5 years of experience are more likely to leave. We have seen that experience is a driver of job change; maybe expectations are different? Personally, I would agree with that.

Further work can be pursued on answering one inference question: which features are in turn affected by an employee's decision to leave their job or remain at their current job?

This dataset contains a typical example of class imbalance. This problem is handled using SMOTE (Synthetic Minority Oversampling Technique).

Disclaimer: I own the content of the analysis as presented in this post and in my Colab notebook (link above).