Transform your product with Traditional AI

The success of a traditional AI/ML initiative is predicated on access to data and to training sets from which algorithms can learn prediction approaches. The Big Data mantra of the past decade has led to customers building up data warehouses to record vast amounts of data and to archive them to support various analysis/prediction initiatives.

However, access to data does not automatically lead to success in ML initiatives. We have developed and honed a process for evaluating data sets to guide the successful integration of ML into existing products or the creation of new ones.

Key questions for signal/data experimentation

The objective of the signal/data experimentation process, which comes between the product-strategy phase and the deployment phase, is to answer the following questions:

Can we extract compelling value, aka a signal, from our datasets by applying machine learning?

What is the benefit of the extracted signal to our product? Is it compelling enough to warrant integration into the product?

Can we build a sustainable advantage by powering the “Virtuous cycle of AI”, ensuring that the value of our offering increases for all customers as we scale up our customer base?

How would we productize an ML workflow, including on-prem vs. cloud-hosted deployment, specific framework choices, and product integration readiness?

Assessment: Frame the problem

Much of the strategic work on problem framing is done in the assessment phase.

This problem framing should focus on a customer-facing product outcome.

For example—

How might we predict spam in an email dataset so that we can place spam in a junk mail folder?

How might we extract keyword relationships from our unstructured text so that our customers can navigate to documents that rank for the chosen related keywords?

How might we predict failure conditions in our IoT infrastructure so that we can schedule preemptive maintenance?

How might we identify faulty parts on our assembly lines so that a picker robot can discard them before they are used in production?

Choose the prediction variable

A key decision is which variable(s) you would like to predict using ML. Once a thorough framing has been done, it is usually straightforward to list the variables needed for your particular problem domain.

For a given problem there may be a handful of prediction variables, each of which may require a different ML approach.

Some examples include—

  • Is an email spam? The predicted value can be either True or False. Alternatively, it can be a number between 0.0 and 1.0 representing the likelihood that the email is spam.
  • What is the likelihood of imminent failure of a part in a manufacturing system? The predicted value is a number between 0.0 and 1.0.
  • Is a part on a manufacturing line faulty? The predicted value can be either True or False.
  • Which recent news articles match a user’s interest? The predicted value is a list of news articles, each with a score between 0.0 and 1.0.
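The prediction variables above come in three shapes: a True/False label, a likelihood between 0.0 and 1.0, and a scored list. A minimal Python sketch of how a product might consume them follows. The function names, the 0.5 default threshold, and the sample article titles are illustrative assumptions, not part of any specific framework.

```python
from typing import List, Tuple


def to_label(score: float, threshold: float = 0.5) -> bool:
    """Convert a likelihood in [0.0, 1.0] into a True/False prediction.

    The 0.5 default threshold is an assumption; in practice it is tuned
    to the cost of false positives versus false negatives.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    return score >= threshold


def rank_articles(scored: List[Tuple[str, float]], top_n: int = 3) -> List[Tuple[str, float]]:
    """Return the top_n (title, score) pairs, highest score first."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    # Hypothetical scores, as a model might emit them.
    print(to_label(0.92))
    print(rank_articles([("Markets", 0.1), ("Robotics", 0.9), ("Sports", 0.5)], top_n=2))
```

Keeping the raw score and applying the threshold at the product layer leaves room to adjust the cut-off later without retraining the model.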

Traditional AI/ML: 6 Key Steps

1/Gather the data set

Gathering the dataset can sometimes be straightforward. Often, however, it is a challenging exercise in and of itself.

The required data may need to be sourced from disparate data warehouses, populated by different software systems that are perhaps hosted in different organizations.

Key data may be missing owing to earlier choices about what to store and what to discard. Third-party datasets may help generate results useful to the business: a wealth of public-domain datasets is available for modeling, and various data providers offer datasets for purchase, often at reasonable rates.

Data annotations may not be available. This could require framing a data annotation strategy that uses in-house resources, a crowdsourcing platform such as Mechanical Turk, or the services of a company that specializes in data labeling.
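As a rough illustration of the gathering step, the sketch below joins records from two sources and lists the records that still lack an annotation. The field names (`device_id`, `temp`, `failed`) and the two in-memory sources are hypothetical stand-ins for real warehouse extracts.

```python
from typing import Dict, List, Tuple


def merge_sources(sensor_rows: List[Dict], maintenance_rows: List[Dict],
                  key: str = "device_id") -> Tuple[List[Dict], List[str]]:
    """Join sensor readings with maintenance labels on a shared key.

    Returns the labeled rows plus the keys of rows that have no label,
    which are candidates for a data annotation effort.
    """
    labels_by_key = {row[key]: row for row in maintenance_rows}
    merged, unlabeled = [], []
    for row in sensor_rows:
        match = labels_by_key.get(row[key])
        # A row is unlabeled if the other source has no record or no value for it.
        if match is None or match.get("failed") is None:
            unlabeled.append(row[key])
            continue
        merged.append({**row, "failed": match["failed"]})
    return merged, unlabeled


if __name__ == "__main__":
    sensors = [{"device_id": "a", "temp": 70}, {"device_id": "b", "temp": 95}]
    maintenance = [{"device_id": "a", "failed": False}]
    print(merge_sources(sensors, maintenance))
```

The size of the unlabeled list gives an early estimate of how much annotation work is needed before supervised learning is feasible.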

2/Research literature

Advances in ML techniques are occurring at a breakneck pace: new techniques keep emerging and being placed in the public domain. Sites such as medium.com and GitHub are repositories of prior art that may be applicable to your domain.

We conduct a search of prior implementations to learn how others have approached similar problems, what techniques were useful and which pitfalls to avoid.

3/Choose the ML algorithm and model for the data

From the problem framing, one or more of these will be applicable:

  • Unsupervised Learning
  • Supervised Learning
  • Reinforcement Learning

Many different techniques can be applied. For example, language processing / NLP tends to combine multiple approaches and domain-specific techniques such as part-of-speech tagging and entity recognition. IIoT analysis, by contrast, often requires time series analysis using techniques such as RNNs. Other choices to be made at this phase are whether to use deep learning or classical techniques, and whether to use a cloud framework (e.g. Azure ML, SageMaker, Google Colab) or a local deployment of Python/Keras/TensorFlow.

At this stage, don’t constrain yourself to one particular approach. It may be useful to try several different approaches on the same dataset and assess each on factors such as accuracy, prediction time, and resource cost (e.g. memory, number of CPUs, GPUs).
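One way to compare approaches on the same dataset is a small harness that records accuracy and prediction time for each. The sketch below uses only the Python standard library. The two toy predictors and the four sample emails are invented for illustration and stand in for real models and data.

```python
import time
from typing import Callable, Dict, Sequence


def assess(name: str, predict: Callable[[str], bool],
           inputs: Sequence[str], labels: Sequence[bool]) -> Dict[str, object]:
    """Measure accuracy and total prediction time for one approach."""
    start = time.perf_counter()
    predictions = [predict(text) for text in inputs]
    elapsed = time.perf_counter() - start
    correct = sum(p == y for p, y in zip(predictions, labels))
    return {"approach": name, "accuracy": correct / len(labels), "seconds": elapsed}


def keyword_rule(text: str) -> bool:
    """Toy approach: flag any email containing the word 'free'."""
    return "free" in text.lower()


def always_ham(text: str) -> bool:
    """Baseline approach: never flag anything as spam."""
    return False


# Hypothetical labeled sample (True means spam).
EMAILS = ["Win a FREE cruise now", "Meeting moved to 3pm",
          "Free gift inside", "Quarterly report attached"]
LABELS = [True, False, True, False]

if __name__ == "__main__":
    for name, approach in [("keyword_rule", keyword_rule), ("always_ham", always_ham)]:
        print(assess(name, approach, EMAILS, LABELS))
```

Including a trivial baseline such as `always_ham` makes it easier to judge whether a more expensive approach earns its resource cost.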

4/Frame experiments

Experiments start with a hypothesis.

Then the following elements must be defined:

• Specify the measure of success, e.g. precision/recall or the accuracy of a failure prediction

• Determine the evaluation methodology, e.g. K-fold testing, hold-back testing, AUC/ROC

• Define the data cleansing and preprocessing approach

• Decide the number of experiments to run

• Agree on the reporting strategy, what and with whom, e.g. whiteboard, Google Docs, slide presentation to management
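Two of the elements above, the measure of success and the evaluation methodology, can be pinned down precisely in code. The following is a minimal standard-library sketch of precision/recall and of K-fold index splitting. The function names are illustrative, and a production experiment would more likely use an established ML library, typically with shuffling before the split.

```python
from typing import Iterator, List, Sequence, Tuple


def precision_recall(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Compute precision and recall for binary predictions."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    # Report 0.0 when a denominator is zero rather than raising.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    print(precision_recall([True, True, False, False], [True, False, True, False]))
    for train, test in k_fold_indices(5, 2):
        print(train, test)
```

Every sample appears in exactly one test fold, so averaging the chosen measure across folds uses all of the data for evaluation without testing on rows the model was trained on.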

5/Run experiments

Run experiments to validate the presence of signal in the dataset.

• Prepare the data set(s)

• Develop the model(s)

• Evaluate the predictions

• Iterate on data set(s) and approaches

• Tune hyperparameters

• Carefully log results and the associated experiment settings, including hyperparameters, in an experiment notebook
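The tuning and logging steps can be combined in a simple loop: try each combination of hyperparameter values and append the settings and results to a log. The sketch below assumes a JSON Lines file as the experiment notebook. The hyperparameter names and the `toy_evaluate` scoring formula are made up and stand in for real model training.

```python
import itertools
import json
from pathlib import Path
from typing import Callable, Dict, List, Sequence


def run_grid(grid: Dict[str, Sequence], evaluate: Callable[[Dict], Dict],
             log_path: str) -> List[Dict]:
    """Evaluate every combination in the grid and append each run to a log file."""
    names = sorted(grid)
    runs = []
    with Path(log_path).open("a", encoding="utf-8") as log:
        for values in itertools.product(*(grid[name] for name in names)):
            settings = dict(zip(names, values))
            record = {"settings": settings, "metrics": evaluate(settings)}
            # One JSON object per line keeps the log easy to append to and parse.
            log.write(json.dumps(record) + "\n")
            runs.append(record)
    return runs


def toy_evaluate(settings: Dict) -> Dict:
    """Stand-in for training and scoring a model (invented formula)."""
    score = 1.0 - abs(settings["learning_rate"] - 0.1) - 0.01 * settings["depth"]
    return {"accuracy": round(score, 4)}


if __name__ == "__main__":
    grid = {"learning_rate": [0.01, 0.1], "depth": [2, 4]}
    for run in run_grid(grid, toy_evaluate, "experiments.jsonl"):
        print(run)
```

Because each line records the settings next to the metrics, any result in the final report can be traced back to the exact configuration that produced it.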

6/Evaluate results

The results of the experiments, both what was successful and what was discarded, should be explained to the stakeholders, with expectations set appropriately.

• Define a clear statement on whether the data set is useful for ML

• Assess the ML pipeline:

  • The key data fields that are needed by the model

  • Data cleansing steps needed before feeding data into the algorithm

  • Data filtering needed, such as normalization, and any regularization applied to the model

  • Hyperparameter choices and the model's sensitivity to each hyperparameter

• State which algorithms have been found to be useful

• Create a report summarizing the detailed outcomes of the experiment runs, including algorithms chosen, hyperparameters, run-time memory and time footprint, and values of the chosen measure(s) of success
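For the report, two summaries are often useful: the best run, and a rough sense of how sensitive the model is to each hyperparameter. The sketch below assumes each logged run is a dictionary with `settings` and `metrics` keys. That layout, and the simple "spread of mean scores" definition of sensitivity, are illustrative choices rather than a standard.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def best_run(runs: List[Dict], metric: str = "accuracy") -> Dict:
    """Return the run with the highest value of the chosen metric."""
    return max(runs, key=lambda run: run["metrics"][metric])


def sensitivity(runs: List[Dict], param: str, metric: str = "accuracy") -> float:
    """Spread between the best and worst mean metric across values of one hyperparameter."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run["settings"][param]].append(run["metrics"][metric])
    means = [mean(scores) for scores in grouped.values()]
    return max(means) - min(means)


if __name__ == "__main__":
    # Hypothetical logged runs.
    runs = [
        {"settings": {"learning_rate": 0.01, "depth": 2}, "metrics": {"accuracy": 0.80}},
        {"settings": {"learning_rate": 0.1, "depth": 2}, "metrics": {"accuracy": 0.90}},
    ]
    print(best_run(runs)["settings"])
    print(sensitivity(runs, "learning_rate"))
```

A large spread suggests a hyperparameter that needs careful tuning in production, while a small spread suggests a safe default may be enough.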

We guide you through our process to get your product AI-ready

Bring business strategy together with the latest AI/ML technologies and best practices, to help deliver leading-edge AI-powered solutions.
