Data science: 12 tricks to crack your data science problems in this guide

Are you often confused by the terms Artificial Intelligence (AI) and Data Science? While AI refers to intelligent systems that solve business problems, most AI is implemented using “Data Science.”

Data science is one of the means of implementing AI solutions. It is based on the analysis of large quantities of data to solve a business problem.

Data science involves many steps, such as identifying data sources, collecting data, preparing data, and finally building “models” using machine learning.

While technical components of your data science problems are easy to understand, some of the most challenging areas lie in understanding the business domain, business processes, and the heuristics around the business logic.

Here are some steps that can help with those challenges.

1. Understand the business use case

We are often part of projects that jump straight into planning or implementation without pausing to think about the goals and business drivers of the organization. With AI and data science projects, it is imperative to think through the goals first. For example, if your goal is customer retention, then looking at your customer service logs for insights into customer feedback would be helpful.

Thinking through these mission-critical attributes goes a long way toward a successful implementation:

Business drivers and mission of the organization
Current customers and their feedback
Portfolio of products and services
Success criteria

2. Think broader about the solution

Whenever we think of a project, we always think in terms of the trio of time, money, and scope. Thinking in terms of project management factors often leads to contrived solutions.

I suggest thinking more broadly, as if you don’t have the constraints of time, money, scope, or resources. Then narrow the solution down by adding constraints one at a time. This helps with innovative thinking and brings about inventive solutions.

3. Deal with data the right way

How do you think about the data? Do you think of it as a spreadsheet with columns, rows, and cells? Or do you think of a database with multiple data sources?

Whatever the format of your data, it plays a crucial role in your solution, because machine learning algorithms detect patterns in it.

AI depends on data mining methods that extract useful information from large data sets, and data is the driver of the algorithm.

The five Vs of data (Value, Velocity, Variability, Veracity, and Volume) will determine the solution that you employ.

Data preparation is at the heart of the data science process. This is where technologists work with subject matter experts (SMEs) to determine what the data means, what data should be dropped, what data should be transformed, how to handle NULL values, and so on.
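As a rough illustration of these preparation decisions, the sketch below cleans a few made-up records in plain Python. The field names (`internal_id`, `age`, `visits`) and the rules (drop `internal_id`, discard rows without `visits`, fill missing ages with the median) are assumptions for the example only; in a real project, the SMEs decide what each rule should be.

```python
from statistics import median

# Hypothetical raw records; None stands in for a NULL value.
raw_records = [
    {"internal_id": "a1", "age": "34", "visits": "3"},
    {"internal_id": "a2", "age": None, "visits": "1"},
    {"internal_id": "a3", "age": "52", "visits": None},
    {"internal_id": "a4", "age": "41", "visits": "7"},
]

DROP_FIELDS = {"internal_id"}   # assumed to carry no useful signal
REQUIRED_FIELDS = {"visits"}    # assumed: rows missing these are discarded


def prepare(records):
    """Drop unused fields, discard incomplete rows, convert types, fill NULL ages."""
    # Median of the ages that are present, used to fill the gaps.
    known_ages = [int(r["age"]) for r in records if r["age"] is not None]
    fill_age = median(known_ages)

    cleaned = []
    for record in records:
        if any(record[field] is None for field in REQUIRED_FIELDS):
            continue  # skip rows without a required value
        row = {k: v for k, v in record.items() if k not in DROP_FIELDS}
        row["age"] = int(row["age"]) if row["age"] is not None else fill_age
        row["visits"] = int(row["visits"])
        cleaned.append(row)
    return cleaned


cleaned_records = prepare(raw_records)
for row in cleaned_records:
    print(row)
```

Keeping each rule in a named constant or a single line makes it easier to review the choices with the business users and change them when they disagree.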

4. Plan with human-centric design

One of the things that separates machine learning from traditional software development is that we are coding for the likelihood of an outcome based on past results. Human-centric design is therefore crucial.

A project that I worked on comes to mind – I was involved in designing and developing an intelligent bot for a doctor’s office. We took the time to meet with the doctor and identify their customer personas, the types of issues they address, and the outcomes they achieve for their patients.

Their team walked us through the steps of prospecting, customer on-boarding, and finally, customer results. From that, we were able to create an intelligent agent that appears seamless to the customers, with a conversation that flows naturally.

Other things to keep in mind are:

What metrics are you measuring?
What are the inputs and outputs of this system, and where do they come from?
What is the ROI?

Human-centric design is also defined by ISO standards.

5. Engage business users during the process

This seems self-explanatory, but you would be surprised at how often we forget to do this. We might be the experts at implementing machine-learning algorithms, gathering insights from data, and implementing sophisticated technologies, but the business users are the subject matter experts.

If you are working in the healthcare sector, you will spend a lot of time with physicians and other team members so that they can explain the business process, what the data elements mean, and how they measure outcomes.

While technologists can work across diverse industry sectors, it is only the business users who are the domain experts in their industry. They will help with understanding the data as well as the metrics.

6. Understand the process workflow

Today’s organisations are running a mile a minute and often do not discuss, document, or debate their business processes.

Walking through the workflow with the business therefore often leads to exciting discoveries, since it might be the first time that they are thinking through their processes.

For example, if the algorithm is used for medical image analysis and diagnostics, there are questions to think through and discuss with the business users: when are the images taken, what steps do they undergo before arriving at a physician’s desk, and what determines the diagnosis? In this case, the business users are the radiologists, physicians, and technicians.

7. Simple pilot implementation

Enterprises have several legacy systems that have been brought together over the years and form an intricate web that makes data challenging to figure out.

You can overcome this complexity of interconnected legacy systems with an assessment of current technology infrastructure. A technology architecture gap evaluation can help all stakeholders understand what systems are in place, the synergies between these systems, and what meaningful data exists.

Armed with this knowledge and a clearer understanding of the successes and shortcomings of current operations, you can develop small pilot initiatives that can be validated before you proceed with an overall solution.

8. Break departmental silos

In large organizations, the left hand usually does not talk to the right. Bringing multiple departments, teams, and sometimes vendors together would help break the silos and move you towards developing a holistic solution.

Reaching out to relevant stakeholders to ensure that misunderstandings or corporate policies don’t impede successful execution is a must.

Ensuring that all parties involved are informed about the inputs, outputs, and the success criteria will lead to an effective and efficient solution.

9. Select the right tools and algorithms

Select the programming language: There is no wrong tool. Choose R or Python depending on your level of expertise and on the type of problem that you are trying to solve. For example, TensorFlow, which is commonly used from Python, has useful libraries for image classification.

Select and research the algorithm: You can find plenty of open-source implementations of algorithms that you can code review, diagram, internalise, and implement in other languages.

10. Keep updated about the latest tools

Machine learning and data science are evolving fields: every day there are new discoveries, new tools, and more open-source libraries and projects available for reference.

Research publications, books, blogs, and GitHub repositories can help you stay ahead of the learning curve and avoid rework in the long run.

11. Unit Test

Always test to make sure your machine learning model works. One of the easiest ways to do this is to check that the model can overfit a small subset of your data. A model that cannot memorise a handful of examples usually points to a bug in the data handling or the training code, so this is a quick test to confirm your model is sound.
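Here is a minimal sketch of such an overfitting check, written in plain Python so that it runs without extra libraries. The four data points, the hand-written logistic regression, and the hyperparameters are all assumptions chosen for illustration; with a real project you would run the same kind of check against your own model and a small slice of your own data.

```python
import math

# Tiny, made-up training subset: two features and a binary label.
features = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
labels = [0, 0, 1, 1]  # the label simply follows the first feature


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(features, labels, epochs=1000, learning_rate=0.5):
    """Fit a logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(score) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


weights, bias = train(features, labels)
training_accuracy = sum(
    predict(weights, bias, x) == y for x, y in zip(features, labels)
) / len(labels)

# If the model cannot memorise four points, suspect the code, not the data.
assert training_accuracy == 1.0, "model failed to overfit a tiny subset"
print(f"training accuracy on tiny subset: {training_accuracy:.2f}")
```

Perfect accuracy here says nothing about how the model generalises; the check only confirms that the training loop is able to learn at all.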

You can also use test-driven model development, which helps you test in small modules. Tests can be written for functions and methods, whole classes, programs, web services, complete machine learning pipelines, neural networks, random forests, mathematical implementations, and many more.
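To make the idea of testing in small modules concrete, here is a sketch of two tests for one small preprocessing function. The function `scale_to_unit_range` is a hypothetical helper invented for this example, not part of any library.

```python
def scale_to_unit_range(values):
    """Hypothetical preprocessing step: rescale numbers linearly to [0, 1]."""
    low, high = min(values), max(values)
    if low == high:
        # Constant column: return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def test_output_spans_zero_to_one():
    assert scale_to_unit_range([10, 20, 30]) == [0.0, 0.5, 1.0]


def test_constant_input_does_not_divide_by_zero():
    assert scale_to_unit_range([7, 7, 7]) == [0.0, 0.0, 0.0]


# A runner such as pytest can collect functions named test_*;
# here they are simply called directly.
test_output_spans_zero_to_one()
test_constant_input_does_not_divide_by_zero()
print("all tests passed")
```

The second test covers an edge case (a column where every value is the same) that is easy to miss and that tends to surface only once real data arrives.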

12. Present business results in a business language

Almost all business users are concerned about the business outcomes rather than the magic that is behind the scenes (in this case, the machine learning models).

Excellent business presentations include the representative data elements that were used (e.g., customer engagement, customer feedback, customer’s buying patterns, etc.) and the outcomes the model provided. Also, make sure to include the accuracy levels, since AI models do not produce exact answers but predictions about likely outcomes.
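As a small sketch of reporting accuracy in business language, the example below turns predictions into a plain sentence. The ten customer outcomes are invented for illustration, and the wording of the summary is only one possible choice.

```python
# Made-up outcomes for ten customers: 1 = left, 0 = stayed.
actual_outcomes = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
predicted_outcomes = [1, 0, 0, 0, 0, 1, 1, 0, 1, 0]


def accuracy(actual, predicted):
    """Share of customers for whom the prediction matched what happened."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def business_summary(actual, predicted):
    """Describe model quality in plain language instead of raw metrics."""
    total = len(actual)
    correct = sum(a == p for a, p in zip(actual, predicted))
    left = sum(actual)
    caught = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    return (
        f"The model predicted the right outcome for {correct} of {total} "
        f"customers ({correct / total:.0%}). "
        f"It flagged {caught} of the {left} customers who actually left."
    )


print(business_summary(actual_outcomes, predicted_outcomes))
```

Stating the result as counts of customers, next to the percentage, tends to be easier for a business audience to act on than a metric name alone.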

Author

Swathi Young

Swathi Young is a keynote speaker, blogger, community-builder and Chief Technology Officer of Integrity Management Services Inc., a healthcare services company, where she is leading innovative AI solutions for clients. In her 20+ years of technology experience, she has led over 100 projects globally, in Belgium, India, and the United States, across a number of Fortune 100 companies like GE and Oracle. Swathi is passionate about using cutting-edge artificial intelligence technologies to increase the performance of organizations. She believes that the intersection of Artificial Intelligence and the humanities is important to focus on as we lay the foundation of AI applications for future generations. Swathi is currently researching the ethical implications of biased data in AI applications. She has built a 3,000-strong community of emerging technology enthusiasts in Washington DC. Her Tech Tempo Tuesday newsletters reach over 1,000 members who want to learn about trends in AI. She often keynotes on bias in AI and how to build ethical AI systems.

Swathi Young March 21, 2020
