Advice for Aspiring Data Scientists from CRIF's VP of Analytics

Anindya Sengupta is Vice President-Analytics at CRIF India. Anindya leads pre-sales and delivery for CRIF's analytics unit for the India market. He has over eleven years of industry experience in executing and implementing projects in the areas of predictive analytics and reporting-enabled solutions for insurance, health care, banking and crime prediction. Anindya holds an M.Phil in Economics from IGIDR, Mumbai. During his research days in 2005, he started working with large data sets on the Indian economy, mainly the NSSO data set, and developed statistical models to address various social and economic development problems of the country.

His journey in data science began at Fractal Analytics, where he started working in 2007. He has seen data analytics move from the era of statistical models to the era of machine learning. His work started with developing parametric models using structured data sources. With the advent of artificial intelligence, he started working extensively on unstructured data sources, viz. text and voice data, and applied advanced AI-enabled techniques to generate insights from them.

What was the first data set you remember working with? What did you do with it?

Anindya Sengupta: As mentioned above, the first data set I worked with was the NSSO data. There were multiple rounds of data from different time frames. The aim was to understand how the employment scenario in India had changed over the years and to develop a predictive model to identify the factors explaining the probability of unemployment and also the duration of unemployment. I used Heckman sample selection models to predict the probability of unemployment and generalized regression models with a negative binomial distribution for the duration of unemployment.

Was there a specific "aha" moment when you realized the power of data?

Anindya Sengupta: I can say I am lucky to have had many such "aha" moments in my 14-year journey in the field of data analytics. One was when we were able to solve a problem in the healthcare sector using advanced machine learning techniques. The data distribution and structure were very complex. An ensemble of 17 different models had to be prepared to address the particular problem. The joy of solving this and getting a significant lift is something that I will never forget.
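For readers new to the idea, an ensemble combines the outputs of several models into a single prediction. The following is only a minimal sketch of one common form, soft voting, where the predicted probabilities are averaged. It is not the approach used in the project described above: the three toy models, the feature names `age` and `prior_visits`, and the 0.5 threshold are all hypothetical and chosen purely for illustration.

```python
from statistics import mean


def soft_vote(models, record, threshold=0.5):
    """Average the positive-class probability from each model.

    Returns the averaged probability and whether it reaches the threshold.
    """
    probabilities = [model(record) for model in models]
    average = mean(probabilities)
    return average, average >= threshold


# Hypothetical base models: each maps a record to a probability in [0, 1].
def model_a(record):
    return 0.9 if record["age"] > 60 else 0.2


def model_b(record):
    return 0.8 if record["prior_visits"] >= 3 else 0.3


def model_c(record):
    return 0.6  # a constant baseline, included only to show mixing


# Hypothetical input record.
patient = {"age": 67, "prior_visits": 1}
avg_prob, flagged = soft_vote([model_a, model_b, model_c], patient)
print(f"average probability: {avg_prob:.2f}, flagged: {flagged}")
```

In practice the base models would be trained estimators rather than hand-written rules, and the combination step is often weighted or learned instead of being a plain average.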

How do you stay updated on the latest trends in Data Analytics? Which are the Data Analytics resources (i.e. blogs/websites/apps) you visit regularly?

Anindya Sengupta: There is no specific source. I follow various data science and machine learning forums on LinkedIn and also read blogs published by top data science companies to keep myself updated.

Team, Skills, and Tools

Which are your favourite Data Analytics tools that you use in your job, and what other tools are widely used in your team?

Anindya Sengupta: At this point, my major focus is on using Python and R. My team's focus is also on the same, with a special focus on PySpark.

What are the different roles and skills within your data team?

Anindya Sengupta: At CRIF, we are a mix of Data Scientists, Data Engineers, and Analytics Consultants.

Can you describe some examples of the kinds of problems your team is solving this year?

Anindya Sengupta: CRIF's major focus next year is on unstructured data and on using AI-enabled techniques to generate insights from various unstructured data sources.

How do you measure the performance of your team?

Anindya Sengupta: To me, a data scientist's role does not stop at creating the solution. It depends hugely on how the solution works in the market after implementation. So to measure performance, one has to look at both aspects.

Industry Readiness for Data Science

Are the industries looking to understand what they can do with data? Do they have the required data in place?

Anindya Sengupta: Yes, there is a huge focus across industries on data-based decision making. In many public sector organizations in India, too, there is a huge push for a data-enabled decision-making process. This is driven by top management. Now everybody understands and acknowledges the power of data. No, they don't always have all the required data in place. In general, the focus is on looking into whatever data is available to them. But there are also organizations that are keen to invest in gathering all the requisite data and to embark on a journey towards a data-enabled, automated decision-making process.

Advice to Aspiring Data Scientists

According to you, what are the top skills, both technical and soft, that are needed for Data Analysts and Data Scientists?

Anindya Sengupta: I think these should be the areas of focus:

(i) R/Python
(ii) Machine Learning algorithms (LSTM, CNN, RNN, GBM, SVM, RF, etc.)
(iii) Statistics
(iv) Business understanding (of the industry they are working in)
(v) Presentation skills

How much focus should aspiring data practitioners place on working with messy, noisy data? What are the other areas in which they must build their expertise?

Anindya Sengupta: Yes, it really helps. The more complicated the data structures you handle, the sharper your knowledge becomes.

What is your advice for newbies, Data Science students or practitioners who are looking at building a career in the Data Analytics industry?

(i) Programming and software skills – R, Python, SAS or Excel
(ii) Visualization Tools
(iii) Statistical foundation and applied knowledge
(iv) Machine Learning

Anindya Sengupta: My only advice to them is that they should not be mechanical in how they learn. A good data scientist should be an artist, and not everything will fit a formula. This essence has to be understood. The best way to understand it is to do more and more projects. The only way to really grasp data analytics is to focus on a learning-by-doing approach.

What are the changing trends that you foresee in the field of Data Science and what do you recommend the current crop of data analysts do to keep pace?

Anindya Sengupta: Then came R and Python, and now, with the advent of Big Data, we are seeing a focus on Spark. This is on the software side. On the technical side, too, we see a move from traditional statistics to decision-tree-based ensemble techniques to deep learning. I am confident that newer techniques and newer software will continue to emerge at regular intervals, depending on the need. Aspiring data scientists should remember that learning never ends, and they will have to keep the student in them alive as long as they want to be in the data science field. The moment they start feeling they are experts in something and lose the urge to learn, that will be an indication that they should think of a change of job profile.

Original Source: Digital Vidya
