ERAN SHLOMO

Unstructured Data Management, Labeling and Pipelines

BY ERAN SHLOMO

Chapters

7 chapters

Intro

Over the past decade, I have spent a significant part of my time working on data and machine learning systems, always with a human angle attached to them. I have gained a lot of knowledge and experience, and have led the development of several human-machine expert systems. In my current role, I’m the CEO & CTO of Dataloop.ai, which develops a data lifecycle management platform.

Chapter 1

Whenever I want to understand a topic deeply, I start with the bigger picture, because I believe it is critical to cascade the “why” down the value chain. The why flows all the way from global trends, through our workplace values, and finally to our daily work and life.

Chapter 2

I define a cognitive application as an application that can completely replace the collection of human cognitive actions, or the “thinking” part, of a given work task or skill. In many cases these applications will start as human assistants and gradually replace humans completely as they become more reliable and broader in scope.

Chapter 3

So, data is critical for developing AI bots or cognitive applications, but that line of thinking can be misleading. The common phrase “data is the new oil” is often used to express the value of data while ignoring the more important aspects of information and knowledge.

Chapter 4

AI development is essentially the process of collecting and organizing information. Data is collected, its meaning is extracted as pieces of information, and these are then structured into a format that allows future learning of the knowledge they represent.

Chapter 5

The training process takes our training set, i.e. the collection of data examples we’ve gathered, and creates a model that has learned from these examples. We call this “training” and not “coding” since the model is created automatically from our data, with no coding involved. The result of a training session is code we can then run to predict the learned properties of new data.
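
As a rough illustration of the idea above, here is a minimal sketch that assumes a toy training set of (input, output) pairs and a simple least-squares line fit. The names `train`, `training_set`, and `model` are hypothetical and chosen only for this example; real training uses far richer data and models. The point is that the prediction rule (the slope and intercept) is never written by hand: it is derived from the examples.

```python
from statistics import mean


def train(examples):
    """Derive a line y = slope * x + intercept from (x, y) examples.

    Illustrative stand-in for "training": the rule is computed from
    the data by least squares rather than hand-coded.
    """
    xs = [x for x, _ in examples]
    ys = [y for _, y in examples]
    x_mean, y_mean = mean(xs), mean(ys)

    # How much x varies, and how x and y vary together.
    variation = sum((x - x_mean) ** 2 for x in xs)
    co_variation = sum((x - x_mean) * (y - y_mean) for x, y in examples)

    slope = co_variation / variation
    intercept = y_mean - slope * x_mean

    def model(x):
        # The "learned" code we can run on new data.
        return slope * x + intercept

    return model


# Hypothetical training set: a handful of collected examples.
training_set = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

model = train(training_set)   # the training session
prediction = model(10.0)      # prediction for data the model has not seen
print(prediction)
```

Changing the examples in `training_set` changes the resulting model without touching the code in `train`, which is the distinction between training and coding drawn above.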

Chapter 6

While bias and variance are usually discussed by data scientists and ML experts, understanding them requires no technical skills and is critical for anyone working with data-driven products. After all, these are the data modeling bugs that will hurt our users’ experience and our product’s competitiveness. It is time to gain a deeper intuition for these concepts; no worries, you will understand them without a single equation involved.