Data Science: What is the role of a chief data scientist?
Chief Data Scientist, or CDS for short, is a job title in the IT field. The person in this role is responsible for designing, developing, delivering and maintaining large-scale machine learning solutions that involve managing huge volumes of data. The roles and responsibilities of a chief data scientist vary with the needs of the organization, but the goal of delivering large-scale machine learning solutions remains the same. Depending on the organization, the role may or may not involve actual coding, testing and delivery. In a small organization, the chief data scientist designs, codes, tests and delivers the solution with the help of a small team. In a big organization with a big team, the chief data scientist is mainly involved in system design and takes on a managerial role.
In this article we explore the core responsibilities of a chief data scientist. Even if you work in a big organization and are not involved in hands-on work, you must have prior hands-on experience in all the technologies used in data science. The chief data scientist is a senior role in the industry, and any data science project involves a huge investment, so you will be responsible for all the outcomes of the project. You have to manage the team and work in such a way that the team delivers the product at its full capacity.
The chief data scientist is the final decision maker on the algorithms and methodology used to deliver a project. You should have a solid understanding of all the algorithms, methodologies, data ingestion, data storage, data processing logic and visualization technologies used in data science projects. Prior experience in Big Data and programming technologies is a must to become a highly productive chief data scientist.
The chief data scientist role involves understanding the client's business requirement, converting it into a mathematical problem, solving that problem with programming and machine learning, and finally presenting the results to the client in dashboard format. This role therefore requires interaction with clients and stakeholders. Software project management and team management skills are a must for this role.
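As an illustration of turning a business requirement into a mathematical problem, the sketch below frames a hypothetical question ("which customers are likely to leave?") as binary classification and solves it with a small logistic regression written in plain Python. The features, the toy data and the function names are assumptions made for this example; a real project would use the client's data and most likely an established ML library.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    n = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that a customer with features x will leave."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy data: [support tickets last month, years as a customer]
X = [[0, 3.0], [1, 2.5], [0, 2.0], [1, 3.5],
     [4, 0.5], [5, 1.0], [3, 0.3], [6, 0.8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = customer left, 0 = customer stayed

w, b = fit_logistic(X, y)
print("Risk for a new customer:", round(predict_proba(w, b, [5, 0.5]), 3))
```

The probabilities produced this way are the kind of result that would then be summarized for the client on a dashboard.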
You must have experience in designing, coding, testing and delivering various machine learning models to production. These skills are required so that, if a model is not performing as expected, you can work with your team to tune it and get the best results.
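One common form of model tuning is to try several values of a hyperparameter and keep the one that scores best on held-out validation data. The sketch below does this for the number of neighbours k in a simple k-nearest-neighbours classifier written in plain Python. The data points, labels and function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Predict the majority label among the k training points nearest to query."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, validation, k):
    """Fraction of validation points that are classified correctly."""
    hits = sum(knn_predict(train, x, k) == label for x, label in validation)
    return hits / len(validation)

def tune_k(train, validation, candidates):
    """Score every candidate k on the validation set and return the best one."""
    scores = {k: accuracy(train, validation, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Hypothetical labelled points: ((feature_1, feature_2), label)
train = [
    ((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.3), "low"),
    ((3.0, 3.2), "high"), ((3.1, 2.9), "high"), ((2.8, 3.0), "high"),
    ((1.1, 1.1), "high"),  # a noisy point that can mislead small k
]
validation = [
    ((1.0, 1.2), "low"), ((1.1, 1.0), "low"),
    ((3.0, 3.0), "high"), ((2.9, 3.1), "high"),
]

best_k, scores = tune_k(train, validation, candidates=[1, 3, 5])
print("Validation accuracy per k:", scores)
print("Selected k:", best_k)
```

The same pattern (score each candidate on data the model was not trained on, then keep the best) applies to other hyperparameters and other model types.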
In a small organization you have to be hands-on, while in a big organization such as a global bank the role can change to 95% admin-related work. In a big organization you have to lead a team of reporters, forecasters, data scientists and data modelers, so team management, project management and client handling skills are required along with data science skills. Here you may have to work with the team, or do the hands-on work yourself, to create a POC for the client quickly.
In bigger organizations the role includes administrative tasks such as ensuring data integrity and compliance. You should make sure that the data science processes followed for the projects are robust and genuinely useful for the business. You will be handling a big team and reporting to higher management. Here you will be working as VP of the engineering team or, put another way, VP of data science.
Job description for chief data scientists
Usually companies look for highly talented and experienced data scientists with 17+ years of experience in the IT field, including 5 or more years in data science. For this role you must have experience in applied AI, including ML, deep learning, ANN and CNN platforms, and in large-scale AI system integration. Prior experience in an MNC or a research organization is a must for this position.
Job responsibility of chief data scientists
- The chief data scientist must lead and mentor a team of data scientists and data engineers.
- The chief data scientist must drive the machine learning / deep learning initiatives of the company in all areas.
- You should be able to design and deploy machine learning algorithms for consumer and commercial products.
- This role requires one to collaborate with data and subject matter experts to seek, understand,
validate, interpret, and correctly use new data elements.
- You will be collaborating with engineering teams to develop prototypes and software products as per client specification. Machine learning and deep learning solutions must be developed to solve the actual business problem.
- You will perform the key tasks of a project and delivery manager, such as managing stakeholders' expectations, working with business users to gather requirements, resolving business rule requirements, designing the ML solution, creating a POC for the proposed solution and performing joint conceptual data model reviews.
- Define the Enterprise Data Strategy which involves the
interaction with multiple systems, disparate sources, databases,
global teams, and varying data needs based on volume, variety
and velocity.
- The enterprise strategy should cover Data Management and
Architecture, Enterprise Information Management (EIM), Master
Data Management (MDM), Meta Data Management, Quality and Data
Governance strategies, methodologies, guidelines and standards.
- Design and develop the enterprise-level conceptual, logical,
and physical data models.
- The role requires both hands-on technical expertise and strategic problem-solving skills to achieve the goals of the enterprise.
- The chief data scientist should be able to define the overall Big Data roadmap, both strategic and tactical, for the design, development and implementation of the enterprise data warehouse (EDW) and its associated data stores so that they can be used for ML/DL activities.
- Should be well versed in the range of existing Big Data technologies and data modelling techniques. One should be able to design the complete data model, ingestion pipeline, data pre-processing pipeline, data cleansing strategy and finally the system for large-scale data analysis.
- Project and team management skills are also very important for the chief data scientist role. One should be able to manage a team of lead architects, data modelers or data scientists while supporting multiple projects.
- The chief data scientist should be able to work with various project development methodologies, including agile and waterfall.
- The candidate should be able to build new data sets, enhance existing data sets and design data structures when required. Prior experience in managing the distribution, replication and archiving of data throughout the enterprise is a must-have skill.
Required Skills for chief data scientists
Following skills are required for this job role:
- Prior experience in statistical, mathematical and predictive modeling, used to build models that solve the business problem.
- Experience with reporting tools and software packages to communicate findings visually to stakeholders.
- Good experience and theoretical knowledge of Artificial Neural Networks (ANN), AI chatbots, CNN and NLP. Experience in different programming languages can be an added advantage for a chief data scientist.
- Chief data scientists must have experience in Natural
Language Processing, Machine learning, Deep learning, Artificial
Intelligence, Conceptual modeling, Statistical
analysis, Predictive modeling, Hypothesis testing.
- Should be able to apply deep expertise in ANN / Deep Learning / Machine Learning / NLP to solve specific, complex business problems.
- Experience with an AWS data lake (Amazon Elastic Compute Cloud (EC2), AWS Data Pipeline, S3, DynamoDB NoSQL, Relational Database Service (RDS), Elastic MapReduce (EMR), Amazon Redshift, Kinesis, Amazon Machine Learning and AWS Lambda) is also required for the chief data scientist post.
- An understanding of concepts such as MPP databases, NoSQL storage (e.g., MongoDB), graph databases, data warehouse design, BI reporting and dashboard development is also required.
Required educational qualification for chief data scientists
- B.Tech / M.Tech / M.Tech (5-year integrated) in CSE / Maths & Computing, specializing in Data Science / Machine Learning, from a top IIT, and/or a Ph.D. (AI, Machine Learning, Deep Learning, Data Science) from a top-rated technical university. This is not a hard limit, however: a science graduate who possesses all these skills and experience can also get this job in multinationals.
- MS or PhD in Computer Science, Electrical Engineering, Statistics, or equivalent fields.
- Experience in all the machine learning, big data and programming technologies.
- Strong English verbal and written communication.
Required programming skills
- Candidate should be well versed with design patterns and
software engineering principles.
- Responsible for development of applications using artificial
intelligence/machine learning technology and application
analysis. Candidate should be able to understand latest
industrial and academic developments in AI/ML.
- Must have experience in big data, mobility and cloud. Hands-on experience in Hadoop on Hortonworks is preferred, but Cloudera, MapR or plain Apache Hadoop with Pig, Hive, MapReduce, Flume, Kafka, Spark etc. is also acceptable.
- Should have prior experience in designing competitive AI/ML services and creating prototypes for demonstration.
Machine learning skills
One should have experience of many projects using various machine learning, deep learning and artificial intelligence technologies.
- Well versed in machine learning, statistics and regression, and in programming languages and frameworks such as Python, Scala, Spark, TensorFlow, R, Matlab and Java.
- Candidate should have hands-on experience in data science and machine learning techniques such as linear regression, logistic regression, random forest, support vector machines, ANOVA/ANCOVA, optimization techniques, time series modelling, segmentation, decision trees, clustering, recommendation engines and forecasting.
- Candidates must have solid understanding of Artificial
Intelligence technologies including knowledge representation &
reasoning (KRR), natural language processing (NLP), speech
recognition, unsupervised machine learning, and/or reinforcement
learning.
- Experience in statistical modeling and data mining algorithms such as:
o Multivariate Regression, Logistic Regression, clustering algorithms, Support Vector Machines, Decision Trees etc.
o Machine learning or graph mining
o DOE, Forecasting, Segmentation, Uncertainty Analysis etc.
o Data Mining, e.g. Text Mining and classification methods (SVM, NN etc.)
o Vector Space model for Unstructured Text
o Sentiment Analysis, Association Mining, Semantic Analysis
- The required key skill sets include Text Mining, R, Machine Learning, Statistical Modeling, Logistic Regression, Data Mining, Python, Analytics, Data Visualization, Segmentation and Data Science.
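To make one of the listed techniques concrete, here is a minimal sketch of a vector space model for unstructured text: each document becomes a TF-IDF weighted term vector, and documents are compared by cosine similarity. The whitespace tokenizer, the smoothed IDF formula and the sample sentences are assumptions chosen for this example, not a prescribed method.

```python
import math
from collections import Counter

def tokenize(text):
    # Simplistic tokenizer: lower-case and split on whitespace.
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so terms found in every document keep a small weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (f / len(tokens)) * idf[t] for t, f in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical customer comments.
docs = [
    "the delivery was late and the package was damaged",
    "late delivery and damaged package",
    "great product and fast shipping",
]
vectors = tfidf_vectors(docs)
print("doc 0 vs doc 1:", round(cosine(vectors[0], vectors[1]), 3))
print("doc 0 vs doc 2:", round(cosine(vectors[0], vectors[2]), 3))
```

The first two comments share most of their informative terms, so they score as more similar to each other than either does to the third.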
In this article we have discussed the roles and responsibilities of a chief data scientist. You can learn many of these technologies on our website. Here are the links: