There’s a lot of talk about machine learning and AI. Every day, it seems, there is a new story about some remarkable thing powered by machine learning, such as diagnosing cancer, finding oil and gas, writing music, or detecting plant diseases. According to the Harvard Business Review, machine learning is already changing business, improving the quality of work, and making prediction cheaper.
Open data science is the critical capability that makes it possible for organizations to apply machine learning and artificial intelligence at scale. Working data scientists prefer to use open source software, such as Python, R, and Apache Spark, for many reasons. These include comprehensive functionality, flexibility, extensibility, transparency, and innovation.
However, many organizations have a large footprint of legacy analytics software. Executives in these organizations struggle to manage the growing cost of this software and to encourage users to adopt open source tooling.
Migration to open data science is challenging for several reasons. Existing users of legacy software often have strong personal preferences and resist switching. Programs written with legacy software must be rebuilt in new tools. Data may be siloed within the legacy platform. Complicating matters, commercial software vendors use community-building techniques to cultivate loyalty among end users.
Nevertheless, we see organizations successfully transition to a culture of open data science. This makes it possible for us to identify a series of transitional steps for organizations. These include understanding user needs; aligning software to needs; eliminating data silos; migrating code; and training users on new tools.
We close the presentation with a discussion of the keys to success in building an open data science culture. These include executive leadership, cost transparency, and clear metrics of user adoption and success with open data science tools.
Thomas W. Dinsmore is a Senior Director for DataRobot, an AI startup based in Boston, Massachusetts, where he is responsible for competitor and market intelligence. Thomas’ previous experience includes service for Cloudera, The Boston Consulting Group, IBM Big Data, and SAS. Thomas has worked with data and machine learning for more than 30 years. He has led or contributed to projects for more than 500 clients around the world, including AT&T, Banco Santander, Citibank, CVS, Dell, J.C.Penney, Monsanto, Morgan Stanley, Office Depot, Sony, Staples, United Health Group, UBS, Vodafone, and Zurich Insurance Group. Apress published Thomas’ book, Disruptive Analytics, in 2016. Previously, he co-authored Modern Analytics Methodologies and Advanced Analytics Methodologies for FT Press and served as a reviewer for the Spark Cookbook. He posts observations about the machine learning business on his personal blog at thomaswdinsmore.com.