Periodically, my colleagues and I discuss the role of machine learning (ML) and artificial intelligence (AI) in everyday data management and data quality work, and it seems that every day I read something suggesting that if you are not using or considering AI or ML, you may be missing out.
Press coverage and articles on data management and data science often state that the biggest drag on insight and innovation is getting data into a fit state for any meaningful use.
By a fit state, I am of course referring not just to reshaping the data but also to applying basic rules and conditions to it. You would do this to ensure that inappropriate outliers and glaringly wrong data are eliminated, so that the data used for insight and analysis is as good as it can possibly be.
There is, of course, a pragmatism that needs to be applied here, and that, I guess, is where my thoughts dwell when I think about how software can be taught to accelerate data cleansing and data hygiene. It is not about considering AI or ML as a knee-jerk response to a fear of missing out, so much as a concern that, in the absence of application “smarts”, users are wasting a lot of resources fixing data after the fact when it could be fixed proactively.
A year or two back, Gartner predicted that by next year almost every new piece of software on the market would leverage AI technology in some form or fashion. So it is interesting to note that the recent acquisitions of Looker by Google and of Tableau by Salesforce both focused on products that offer data prep, data wrangling, data cleansing, data visualization, data presentation, and data action as capabilities in their technology stacks.
Google already has so many capabilities that it is easy to imagine this acquisition being relegated to the Google graveyard chronicled by killedbygoogle, but there are some interesting AI and ML pieces in Looker, and one wonders whether they were part of the acquisition decision.
A lot of these capabilities are squarely aimed at what Looker referred to as “professional and citizen data scientists”.
The Salesforce acquisition of Tableau seems perhaps even more deliberate in this regard: it is argued that the purchase lands Salesforce the leading analytics platform in the market. That is an important position, given that the businesses, small and large, that use Tableau (86,000+ organizations) are now open to Salesforce pitching more offerings. Tableau, too, has AI and ML components to ease the manipulation of data and increase its usefulness.
In mid-2018, Tableau acquired Empirical Systems, an AI startup specializing in automated statistics.
Having started out as the MIT Probabilistic Computing Project, the company had developed a unique analytics engine that applied statistical techniques to automatically uncover insights hidden inside all kinds of data.
Typically, this kind of analysis requires a trained statistician, familiar with both the data and the relevant programming techniques, to review all the data at their disposal, work out which data is worth investigating further, and decide which data is, in fact, relevant to the ultimate goal and intent of the analysis.
Empirical reportedly automated the analysis and data modelling, readily identifying correlations, outliers, and patterns in the data: what Tableau would later describe as “smart analytics”.
So, with that as background, I want to think a little more about where AI and ML might specifically be applied to data quality and data improvement.
SMURF, SMURF, SMURFETY, SMURF!
One obvious area in which to consider AI and ML is the identification of fraudulent or non-compliant transactions. It is obvious because such transactions are associated with revenue leakage, a loss of cost control, and the risk of penalties and fines.
Most of us will have experienced this when using a bank or credit card abroad or on a foreign website. Transactions get flagged, possibly blocked, and we are contacted in near real time by email, phone call, in-app notification, or SMS. Banks use a variety of techniques to manage this, much of it based on where they think the transaction is happening relative to where you usually shop and transact.
Even after the horse has bolted, so to speak, it is often still useful to evaluate transactions and master data in batches, perhaps on a schedule, to identify anomalous events. Sometimes this is handled by audit and exception reporting, but those audit and exception reports often assess transactions against very specific, fixed thresholds.
An AI monitor continuously trained on transactional data could, for example, recognize when a user or operator is putting through transactions that are outliers relative to the normal distribution of transactions. Manually labelling exceptions, or retraining on periodic batches, improves the analysis algorithms. Smurfing, for instance, is the practice of splitting a large sum into smaller deposits so that they avoid triggering the compliance reports that banks must file with regulators.
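To make the idea of a continuously trained monitor more concrete, here is a minimal Python sketch, assuming transaction amounts arrive as a simple stream. It uses Welford's online algorithm to keep a running mean and standard deviation and flags any amount more than three standard deviations from what has been seen so far. The `monitor` function, the 3.0 threshold and the 10-transaction warm-up are illustrative assumptions, not how any particular bank or product works, and a z-score is a deliberately simple stand-in for a trained model.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online mean/variance, updated one transaction at a time."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; zero until there are two observations
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def monitor(amounts, threshold=3.0, warmup=10):
    """Yield (index, amount, z) for transactions more than `threshold`
    standard deviations from the mean of everything seen before them.
    threshold and warmup are assumed values for illustration."""
    stats = RunningStats()
    for i, amount in enumerate(amounts):
        if stats.n >= warmup and stats.std > 0:
            z = (amount - stats.mean) / stats.std
            if abs(z) > threshold:
                yield i, amount, z
        stats.update(amount)  # keep "training" on every transaction


amounts = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98, 5000, 101]
for i, amount, z in monitor(amounts):
    print(f"transaction {i}: {amount} (z = {z:.1f})")
```

This sketch keeps learning from every transaction, including flagged ones; a production monitor might hold flagged items back until a reviewer confirms them, which is where manual labelling of exceptions comes in. Keeping a dictionary of `RunningStats` keyed by operator would give each user or operator their own baseline.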
While banks likely use a number of explicit methods to identify this and related behaviour, it is a great candidate for AI in environments where you simply want to know whether some activity contains unusual outliers or anomalies. The same practices can also be applied to stock trades and, in ERP and CRM, to stock pricing, inventory and SKU pricing, materials cost assessment, and inventory handling.
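As one example of the kind of explicit method a bank might use, a structuring check can look for several deposits that each stay under a reporting threshold but together exceed it within a short period. The sketch below assumes a 10,000 threshold, a seven-day window and a minimum of three deposits; these values and the `find_structuring` function are hypothetical illustrations, not any regulator's or bank's actual rules.

```python
from collections import defaultdict
from datetime import date, timedelta

REPORTING_THRESHOLD = 10_000  # assumed single-deposit reporting limit
WINDOW = timedelta(days=7)    # assumed look-back window


def find_structuring(deposits, threshold=REPORTING_THRESHOLD,
                     window=WINDOW, min_count=3):
    """deposits: iterable of (account, day, amount) tuples.
    Flag accounts where at least `min_count` deposits, each below the
    threshold, fall within `window` of each other and together exceed it."""
    by_account = defaultdict(list)
    for account, day, amount in deposits:
        if amount < threshold:  # only sub-threshold deposits are of interest
            by_account[account].append((day, amount))

    flagged = set()
    for account, items in by_account.items():
        items.sort()  # chronological order
        start, total = 0, 0
        for end, (day, amount) in enumerate(items):
            total += amount
            # Slide the window start forward until it spans at most `window`
            while day - items[start][0] > window:
                total -= items[start][1]
                start += 1
            if end - start + 1 >= min_count and total > threshold:
                flagged.add(account)
                break
    return flagged


deposits = [
    ("A", date(2019, 6, 3), 9500),
    ("A", date(2019, 6, 4), 9000),
    ("A", date(2019, 6, 6), 9800),
    ("B", date(2019, 6, 3), 12000),  # over threshold: reported anyway
    ("B", date(2019, 6, 20), 500),
    ("C", date(2019, 6, 1), 4000),
    ("C", date(2019, 6, 15), 4000),
    ("C", date(2019, 6, 29), 4000),
]
print(find_structuring(deposits))
```

A rule like this catches the pattern it was written for; the appeal of an AI monitor is that it can also surface anomalies nobody thought to write a rule for.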
Reducing inventory complexity and costs
I recently had occasion to look at a list of inventory items being purchased by a large healthcare organization, and of particular interest to me was the likelihood of there being duplicate items in the inventory.
In my cursory manual analysis, I found items with different record identifiers but exactly the same attributes, right down to lot size and manufacturer, yet priced differently. Only on further investigation did I determine that the same items had been placed in different categories, classes, or groupings of inventory attributes, which were used for aggregation or segmentation but not for collating the inventory as a whole.
Now, arguably, the items did need to be listed more than once, because in all likelihood users would search only within the area they are interested in. But this is another area where AI could quickly identify duplicate records, so that alternative approaches could be considered to minimize duplicate or redundant records and, in fact, save the institution money lost to poorly negotiated procurement agreements. Cash-strapped healthcare organizations, particularly in public health, will always want to buy the lowest-cost items whenever possible, and duplicate entries dilute value and can cause confusion and delay decisions.
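Before any AI is involved, duplicates like these can be surfaced by grouping records on their descriptive attributes while ignoring the record ID, category and price. The following Python sketch does that; the field names (`description`, `manufacturer`, `lot_size`, `price`) and the `find_duplicate_items` function are hypothetical, and exact matching on normalized text is only the baseline that a machine-learned matcher would extend to near-duplicates.

```python
from collections import defaultdict

# Hypothetical descriptive fields; ID, category and price are deliberately excluded
KEY_FIELDS = ("description", "manufacturer", "lot_size")


def find_duplicate_items(items, key_fields=KEY_FIELDS):
    """Group inventory records by normalized descriptive attributes and
    return groups with more than one record, plus their price spread."""
    groups = defaultdict(list)
    for item in items:
        key = tuple(str(item[f]).strip().lower() for f in key_fields)
        groups[key].append(item)

    report = []
    for key, members in groups.items():
        if len(members) > 1:
            prices = [m["price"] for m in members]
            report.append({
                "key": key,
                "ids": sorted(m["id"] for m in members),
                "min_price": min(prices),
                "max_price": max(prices),
            })
    return report


items = [
    {"id": "INV-001", "category": "Surgical", "description": "Nitrile gloves, medium",
     "manufacturer": "Acme", "lot_size": 100, "price": 12.50},
    {"id": "INV-207", "category": "Ward consumables", "description": "Nitrile Gloves, Medium ",
     "manufacturer": "ACME", "lot_size": 100, "price": 14.10},
    {"id": "INV-310", "category": "Surgical", "description": "Nitrile gloves, large",
     "manufacturer": "Acme", "lot_size": 100, "price": 12.50},
]
for group in find_duplicate_items(items):
    print(group["ids"], group["min_price"], group["max_price"])
```

The price spread within each group is the figure that points to the poorly negotiated procurement agreements.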
The need for deduplication, in fact, spans much more than inventory: it affords the opportunity to consolidate orphaned patient records elsewhere in health care, and to unify customer and vendor accounts for a more holistic view of each business partner.
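Records for the same patient, customer or vendor rarely match exactly, so consolidation usually depends on fuzzy matching. This sketch uses Python's standard `difflib.SequenceMatcher` to score pairs of names after stripping punctuation and a few common company suffixes. The suffix list, the 0.85 threshold and the `candidate_matches` function are assumptions for illustration; a trained matcher would typically weigh several fields rather than the name alone.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

SUFFIXES = {"inc", "ltd", "llc", "co", "corp", "pty"}  # assumed noise words


def normalize(name: str) -> str:
    """Lower-case, drop punctuation and strip common company suffixes."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(w for w in words if w not in SUFFIXES)


def candidate_matches(records, threshold=0.85):
    """records: list of (record_id, name) tuples. Return pairs whose
    normalized names are similar enough to review as possible duplicates."""
    normed = [(rid, normalize(name)) for rid, name in records]
    pairs = []
    for (id_a, a), (id_b, b) in combinations(normed, 2):
        score = SequenceMatcher(None, a, b).ratio()
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 2)))
    return pairs


vendors = [
    ("V1", "Acme Medical Supplies Inc."),
    ("V2", "ACME Medical Supplies, Ltd"),
    ("V3", "Acme Medicl Supplies"),
    ("V4", "Northwind Traders"),
]
for pair in candidate_matches(vendors):
    print(pair)
```

Comparing every pair grows quadratically with the number of records, so matching tools usually compare only records that share a blocking key, such as a postcode or tax number, before scoring.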
Filling in the blanks
Using AI to auto-fill missing data seems a straightforward thing to do, as long as it can be done without introducing errors, unwanted side effects, or vulnerabilities.
Many applications are still fraught with the risk of incomplete data entry due to poor capture screens, a lack of validation and insufficient data entry rules.
In such circumstances, AI could potentially prepopulate data without the need for explicit entry or manual intervention. This could include appending supplementary data that enriches the record further for downstream activities such as marketing segmentation.
Using a combination of matching algorithms and machine learning, a captured data set can go from routine master or transactional data to a rich source for more advanced analytics and decision-making.
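One simple form of auto-fill is to learn, from complete records, which value usually accompanies another field, and then suggest it where it is missing. The sketch below does this for a city given a postcode. It is conditional mode imputation, a basic statistical stand-in for a trained model, and the field names, the `min_support` cut-off and both functions are hypothetical. Marking each filled value as inferred is one way to avoid silently introducing errors.

```python
from collections import Counter, defaultdict


def learn_lookup(records, key_field, target_field, min_support=2):
    """From complete records, learn the most common target value per key,
    keeping only keys seen at least `min_support` times (assumed cut-off)."""
    counts = defaultdict(Counter)
    for r in records:
        if r.get(key_field) and r.get(target_field):
            counts[r[key_field]][r[target_field]] += 1
    lookup = {}
    for key, counter in counts.items():
        value, n = counter.most_common(1)[0]
        if n >= min_support:
            lookup[key] = value
    return lookup


def autofill(records, key_field, target_field, lookup):
    """Fill blanks from the learned lookup and flag them as inferred, so
    downstream users can tell captured values from suggested ones."""
    filled = []
    for r in records:
        r = dict(r)  # leave the caller's records untouched
        if not r.get(target_field) and r.get(key_field) in lookup:
            r[target_field] = lookup[r[key_field]]
            r[f"{target_field}_inferred"] = True
        filled.append(r)
    return filled


customers = [
    {"postcode": "2000", "city": "Sydney"},
    {"postcode": "2000", "city": "Sydney"},
    {"postcode": "3000", "city": "Melbourne"},
    {"postcode": "2000", "city": ""},
    {"postcode": "3000", "city": None},
]
lookup = learn_lookup(customers, "postcode", "city")
for row in autofill(customers, "postcode", "city", lookup):
    print(row)
```

The last record stays blank because postcode 3000 was seen only once, which illustrates the trade-off between filling more gaps and trusting each suggestion.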
Salesforce Einstein and SAP Leonardo
Incidentally, Salesforce uses an AI-powered duplicate management system that automatically checks that data is “accurate and devoid of duplicate entries”, and this feature is activated automatically in the platform. It has some limitations, though, and does not work well with high levels of duplicate data. Salesforce also offers Einstein Prediction Builder, a customizable AI tool for Salesforce admins.
From what I have seen, SAP has not used Leonardo in the same ways that Salesforce has used Einstein. Instead, it has focused on “Conversational AI” and metadata-based (read: structured rules) bots for what has been described as “Intelligent Robotic Process Automation” services. SAP views these as ways to accelerate and improve interactions across a broad swathe of CRM, ERP, HCM, and Supply Chain applications.
In principle, you could have AI scrutinize sales opportunities and work out whether sales reps are naively over-confident, or are withholding opportunities and only entering them when they are very confident of securing the deal or when sales incentives are introduced, particularly incentives intended to accelerate deal closure at quarter-end or period-close.
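To illustrate how opportunity data might reveal these two behaviours, the sketch below compares each rep's average stated win probability with their actual win rate, and compares how long their deals stay open with the team median. The field names (`rep`, `probability`, `won`, `days_open`), the `rep_signals` function and the 0.5 late-entry ratio are assumptions. These are simple heuristics rather than a trained model, and they would only ever prompt a conversation, not prove anything.

```python
from collections import defaultdict
from statistics import mean, median


def rep_signals(opps, late_ratio=0.5):
    """opps: iterable of dicts with hypothetical fields 'rep', 'probability'
    (0-1, as entered at creation), 'won' (bool) and 'days_open'.
    Returns a per-rep confidence gap and late-entry flag."""
    opps = list(opps)
    team_median = median(o["days_open"] for o in opps)
    by_rep = defaultdict(list)
    for o in opps:
        by_rep[o["rep"]].append(o)

    signals = {}
    for rep, items in by_rep.items():
        stated = mean(o["probability"] for o in items)
        actual = mean(1.0 if o["won"] else 0.0 for o in items)
        signals[rep] = {
            # > 0: the rep forecasts more wins than materialize (over-confident)
            "confidence_gap": round(stated - actual, 2),
            # Deals closing far faster than the team norm may have been entered late
            "late_entry": median(o["days_open"] for o in items) < late_ratio * team_median,
        }
    return signals


opps = [
    {"rep": "A", "probability": 0.9, "won": True, "days_open": 60},
    {"rep": "A", "probability": 0.9, "won": False, "days_open": 70},
    {"rep": "A", "probability": 0.9, "won": False, "days_open": 80},
    {"rep": "A", "probability": 0.9, "won": False, "days_open": 90},
    {"rep": "B", "probability": 0.95, "won": True, "days_open": 5},
    {"rep": "B", "probability": 0.95, "won": True, "days_open": 7},
]
print(rep_signals(opps))
```

In this toy data, rep A looks over-confident, while rep B's short, high-probability deals fit the pattern of opportunities entered only once they were nearly certain.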
I have mentioned Microsoft Dynamics 365 and Cognitive Services in the past. Dynamics 365 also comes with a host of built-in AI features, largely out of the box, including Relationship Insights, Lead Scoring, Cortana-based Product Recommendations (which push the right items to online shoppers rather than waiting until checkout to make suggestions), and Predictive Sales & Inventory Forecasting.
In the case of sales opportunities, AI could smooth the pipeline and rationalize expectations about the future of deals and about forecasting.
Resolving data issues requires a bi-modal approach, even when AI and ML are part of the solution mix. On the one hand, there is avoiding data entry issues in real time through intelligent searching and matching, accompanied by auto-filling and enrichment. On the other hand, data at rest suffers similar challenges: it becomes stale or out of date, or is contaminated or compromised, as a result of a great many possible factors.
A batched or scheduled approach therefore needs to be considered as a complement, and it too can incorporate AI and ML in how decisions are made and actions are taken. The technology can deliver for most use cases today, perhaps not at massive scale, and perhaps only as very specific, narrow, or ‘weak AI’ propositions, but these may just be the starting point for what Amazon's Jeff Bezos considers the AI Golden Age.
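The bi-modal approach can be sketched as one set of rules applied in two places: at entry time, where a failing record is rejected before it is saved, and in a scheduled sweep over data at rest, which additionally flags records that have not been verified recently. The field names, the one-year staleness interval and the functions below are hypothetical, and the rules are plain checks; matching, auto-fill or anomaly-scoring models could be plugged into `validate` in the same way.

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=365)  # assumed re-verification interval


def validate(record):
    """Rules shared by both modes; returns a list of problems.
    The fields checked here are hypothetical examples."""
    problems = []
    if not record.get("email") or "@" not in record["email"]:
        problems.append("invalid email")
    if not record.get("postcode"):
        problems.append("missing postcode")
    return problems


def on_entry(record):
    """Real-time mode: refuse to save a record that has problems."""
    problems = validate(record)
    if problems:
        raise ValueError("; ".join(problems))
    return record


def batch_sweep(records, today):
    """Scheduled mode: re-run the same rules over data at rest and also
    flag records that have not been verified within STALE_AFTER."""
    report = {}
    for r in records:
        problems = validate(r)
        if today - r["last_verified"] > STALE_AFTER:
            problems.append("stale")
        if problems:
            report[r["id"]] = problems
    return report


records = [
    {"id": 1, "email": "a@example.com", "postcode": "2000", "last_verified": date(2019, 1, 10)},
    {"id": 2, "email": "bexample.com", "postcode": "3000", "last_verified": date(2019, 5, 1)},
    {"id": 3, "email": "c@example.com", "postcode": "", "last_verified": date(2017, 3, 1)},
]
print(batch_sweep(records, today=date(2019, 6, 30)))
```

Sharing `validate` between the two modes keeps the real-time and batch checks from drifting apart as rules are added.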