Experfy

AI Series: Data Scientists, the modern alchemists.

by Michele Vaccaro
November 9, 2018
in Big Data & Cloud
5 min read


"…The narrow spiral staircase led into a larger room, barely illuminated by a few torches hanging on the brick wall. Two tables in the center of the room were completely covered by the strangest shapes of alchemical stills. A glass alembic was inhaling the smelly, lazy vapors produced by a bubbling liquid in a heated cucurbit, near a mortar and its pestle. Copper retorts of different sizes, small flasks containing white lead, sulfur and mercury, and other distilling vessels were aligned on old wooden shelves. Strange light effects were created by a bottle of Spiritus Vini reflecting the light coming from a heated pot where vaporized sulfur was transforming liquid mercury into a yellow solid, very similar to gold…"

Many centuries have passed since alchemists tried to transmute base metals into gold. Our scientific knowledge is now vastly deeper and broader in every field, and alembics and cucurbits have been replaced by powerful computers. Still, I cannot help recalling those medieval alchemists when I think about the modern data scientist's fascinating mission: transforming data… into gold.

First of all, data scientists need to understand the nature of the problem they have to solve. In machine learning there are three main types of problems: classification, regression and clustering. Classification tasks involve assigning input data to categorical labels, simple ones like “yes”/“no” or “true”/“false”, or more complex ones, as in face recognition, where a face is assigned to the name of the person it belongs to. Regression tasks are similar, but the prediction is a continuous value rather than a category: teaching an algorithm to predict how the price of a specific product or service will change under a given set of circumstances is a regression problem. Clustering problems are closer to traditional data mining tasks, where the goal is to analyze unlabeled data to discover hidden patterns and extract powerful insights, as in product recommendation.
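As a rough sketch of the three task types, here is a toy NumPy example (all data, names and thresholds are invented for illustration): a least-squares regression, a nearest-centroid classifier, and a small k-means loop that recovers the two groups without ever seeing the labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Regression: fit a line y = w*x + b with ordinary least squares.
x = rng.uniform(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(0, 0.5, 50)
w, b = np.polyfit(x, y, 1)  # slope and intercept, close to 3 and 2

# Classification: two labeled blobs, nearest-centroid decision rule.
no_pts = rng.normal([0, 0], 0.5, (30, 2))
yes_pts = rng.normal([4, 4], 0.5, (30, 2))
centroids = np.stack([no_pts.mean(axis=0), yes_pts.mean(axis=0)])

def classify(p):
    # Assign the label of the closest class centroid.
    return ["no", "yes"][int(np.argmin(np.linalg.norm(centroids - np.asarray(p), axis=1)))]

# Clustering: k-means on the same points, ignoring the labels entirely.
pts = np.vstack([no_pts, yes_pts])
centers = pts[[0, 30]]  # naive fixed init; real k-means uses random or k-means++ init
for _ in range(10):
    assign = np.argmin(np.linalg.norm(pts[:, None] - centers[None], axis=2), axis=1)
    centers = np.stack([pts[assign == k].mean(axis=0) for k in range(2)])
```

The same points serve the classifier (with labels) and the clustering loop (without): the difference between the tasks is not the data but what the algorithm is asked to do with it.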

Once the problem is clear, the data scientist has to decide which learning strategy will best serve the cause. The choice depends on many factors, including: How much data is available? Is it labeled or not? Are there algorithms or neural networks that have already been trained on similar datasets? In my previous post I introduced the two most popular learning strategies: supervised and unsupervised learning.

A supervised learning approach is probably the best choice when I have large labeled datasets, plenty of computing power, and a classification or regression problem, while unsupervised learning is the best choice for clustering tasks where no labeled data is available. But many other learning strategies have emerged over time. Transfer learning, for example, leverages an existing network previously trained on a similar domain and fine-tunes it by re-training only the last few fully connected layers, thus re-using the features that were detected and learned during supervised training on a different task.
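The transfer-learning idea can be sketched in a few lines of NumPy. This is a toy stand-in: the "pre-trained" layer here is just random weights, where in practice it would come from a network trained on a large related dataset. The point is the mechanics: the first layer is frozen and only the new head is updated.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "pre-trained" feature extractor: random weights stand in for a
# layer that, in real transfer learning, would come from a network trained on
# a large related dataset. It is frozen and never updated below.
W1 = rng.normal(size=(2, 8))

def features(X):
    return np.tanh(X @ W1)  # frozen layer's output, re-used as-is

# Small labeled dataset for the new task: two separable blobs.
X = np.vstack([rng.normal([0, 0], 0.5, (20, 2)), rng.normal([3, 3], 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

# Fine-tune only the new head (a logistic-regression layer) by gradient descent.
w, b = np.zeros(8), 0.0
F = features(X)
for _ in range(500):
    p = 1 / (1 + np.exp(-(F @ w + b)))   # sigmoid of the head's logits
    w -= 0.5 * (F.T @ (p - y)) / len(y)  # gradient step on the logistic loss
    b -= 0.5 * (p - y).mean()

p = 1 / (1 + np.exp(-(F @ w + b)))
accuracy = ((p > 0.5) == y).mean()
```

Because only the 9 head parameters are trained, a few dozen labeled examples are enough, which is exactly the appeal of transfer learning.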

Another approach is offered by Deep Belief Networks, or DBNs. They use standard neural networks but implement a radically different training method. Instead of starting from random values, the network is initialized by an unsupervised pre-training phase using unlabeled datasets, from which it learns multiple layers of features. When the pre-training phase is over, all the weights and biases of the net are already very close to their optimal values, and the final phase consists of just a short, supervised fine-tuning session with backpropagation and relatively few labeled examples.
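A full DBN stacks several Restricted Boltzmann Machines and pre-trains them greedily, layer by layer. As a minimal sketch of that unsupervised pre-training step (toy data, simplified one-step contrastive divergence, all sizes invented), here is a single RBM layer learning features from unlabeled binary vectors; its reconstruction error drops without any labels being used:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

# Toy unlabeled data: two binary "prototypes" with 5% bit-flip noise.
proto = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]])
data = proto[rng.integers(0, 2, 200)]
data = np.abs(data - (rng.random(data.shape) < 0.05))

# One RBM layer: 6 visible units, 2 hidden units, trained with CD-1.
n_vis, n_hid, lr = 6, 2, 0.1
W = rng.normal(0, 0.1, (n_vis, n_hid))
bv, bh = np.zeros(n_vis), np.zeros(n_hid)

def recon_error(v):
    h = sigmoid(v @ W + bh)
    return np.mean((v - sigmoid(h @ W.T + bv)) ** 2)

err_before = recon_error(data)
for _ in range(300):
    v0 = data
    ph0 = sigmoid(v0 @ W + bh)                       # hidden probabilities
    h0 = (rng.random(ph0.shape) < ph0).astype(float) # sample hidden states
    pv1 = sigmoid(h0 @ W.T + bv)                     # reconstruction
    ph1 = sigmoid(pv1 @ W + bh)
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)   # CD-1 weight update
    bv += lr * (v0 - pv1).mean(axis=0)
    bh += lr * (ph0 - ph1).mean(axis=0)
err_after = recon_error(data)
```

In a real DBN the hidden activations of this layer would become the "visible" input of the next RBM, and the whole stack would then be fine-tuned with backpropagation on a small labeled set.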

Both transfer learning and DBNs reduce the training time and the need for huge labeled datasets.

Last, but definitely not least, the data scientist has to decide which algorithm, among a wide variety, will provide the best performance.

In my previous article I introduced the very popular neural networks, which come in many different flavors: from the simplest form, the Multi-Layer Perceptron, to the powerful architecture of Convolutional Networks, or the sophisticated complexity of Recurrent Neural Networks, which specialize in sequential data where the next data point depends on the previous ones, as in stock prediction, text generation and voice recognition.

But neural networks and deep learning are just elements of a much broader and richer set of machine learning algorithms covering all kinds of problems. The regression family, clearly well suited to regression-type problems, offers algorithms that are fast to model and particularly useful when the relationship to be modeled is not extremely complex and not much data is available; Linear and Logistic Regression are its simplest members. Clustering algorithms, as the name suggests, are particularly effective on unsupervised learning tasks, grouping sets of objects so that objects in the same group (called a cluster) are more similar to each other than to those in other groups; clustering is a main task of exploratory data mining and a common technique in statistical data analysis, with K-Means and Hierarchical Clustering among the most popular algorithms of the family. For supervised learning on regression and classification tasks, decision trees and Bayesian algorithms are often a good, simple and powerful approach.
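To show how simple a tree-based learner can be, here is a decision stump, a one-level decision tree, fitted by brute force on toy data (the data, the `fit_stump` helper and all values are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: feature 0 separates the classes, feature 1 is pure noise.
X = np.vstack([rng.normal([0, 0], 1.0, (50, 2)), rng.normal([4, 0], 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

def fit_stump(X, y):
    """Brute-force search for the single feature/threshold split
    with the fewest training errors (a one-level decision tree)."""
    best = (0, 0.0, 1, len(y) + 1)  # feature, threshold, polarity, errors
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            pred = (X[:, f] > t).astype(int)
            for polarity, p in ((1, pred), (-1, 1 - pred)):
                errors = int((p != y).sum())
                if errors < best[3]:
                    best = (f, t, polarity, errors)
    return best

feat, thr, pol, errors = fit_stump(X, y)

def predict(x):
    p = int(x[feat] > thr)
    return p if pol == 1 else 1 - p
```

The search correctly ignores the noise feature and lands on a threshold between the two blobs; a full decision tree simply repeats this split recursively on each branch.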

And these are just a few examples of the many machine learning algorithms that data scientists can use to solve their challenges, and that we'll explore in the next articles.

But while data scientists can leverage existing best practices and guidelines about which combination of problem, dataset, learning strategy and algorithm should be used to achieve the best results, it's also true that machine learning is not an exact science: it's evolving rapidly and it's relatively new. And this is where the art of experimenting with new approaches, by wisely (and often empirically) combining the different ingredients, makes the mission of our modern data scientists so complex and fascinating as to appear magical.

A data scientist is not 'only' a physicist or a mathematician who knows how to implement code in Python. He or she develops these abilities use case by use case, leveraging best practices but often exploring new ways to approach old problems: combining different learning techniques or chaining different classes of algorithms to optimize data, improve prediction quality and performance, or overcome previously unseen obstacles and challenges.

And just as their alchemist ancestors, in their quest to transform rocks into gold, paved the way for modern chemistry, our modern data scientists, in their effort to extract gold out of data, are laying the foundations for future generations of AI.

Tags: Data Science
Michele Vaccaro

Michele Vaccaro is Solution Consultant Director at OpenText, a market leader in Enterprise Information Management software and solutions.


© 2020, Experfy Inc. All rights reserved.
