Data vs Information
and how to reach knowledge
Post published on 04/11/2020 by Donata Petrelli and released under the CC BY-NC-ND 3.0 IT license (Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Italy)
If I say 150 … what are you thinking about? Probably nothing (no, that’s not the total of the shoes in my collection 🙂 ).
Here is a hint: I add the unit of measurement, centimeters. If I say 150 cm, what do you think of? A length. Now I add that 150 cm is the side of a square empty space in a corner of my house.
If I multiply 150 cm x 150 cm, I get the area to furnish, and with it I can decide whether to buy that lovely table I saw yesterday at Ikea, which measures 55 cm x 55 cm.
Since 3025 cm² < 22500 cm², then yes, I’ll take the Ikea table!
Replace this value with one that matters to you, such as a cost, a deviation in turnover or an increase in newsletter subscribers, and you immediately see how important information built on reliable data is for making decisions.
From this simple example where:
- 150 is data
- 150 cm length is information
- 22500 cm² area is knowledge
- 3025 cm² < 22500 cm² represents the decision-making process
you understand that everything starts with data. But how do you reach a final decision?
If you are interested in knowing the path … I invite you to continue reading this article 🙂
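The four steps of the opening example can be written out as a small Python sketch. The variable names are illustrative assumptions; the numbers are the ones used in the example above.

```python
# Sketch of the opening example: data -> information -> knowledge -> decision.
# All names are illustrative; the numbers come from the example in the text.

data = 150                  # raw value, no meaning on its own
unit = "cm"                 # context that turns the value into information

side_cm = data                          # information: a length of 150 cm
space_area_cm2 = side_cm * side_cm      # knowledge: the free area in cm2

table_side_cm = 55                                  # the table is 55 cm x 55 cm
table_area_cm2 = table_side_cm * table_side_cm      # area the table needs

# Decision: compare the two areas, as the text does.
buy_table = table_area_cm2 < space_area_cm2

print(f"{data} {unit} side -> {space_area_cm2} cm2 available")
print(f"table needs {table_area_cm2} cm2 -> buy: {buy_table}")
```

The comparison of areas mirrors the reasoning in the text; a stricter check could also compare the two side lengths directly.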
DATA and BIG DATA
The word “Data” originates from the Latin “datum”, which literally means “something given”, that is, a fact. Data is the objective representation of a fact. It is a raw value that takes on meaning once it is placed in the context of the reality it represents.
150 is the data in our initial example: a value that, measured in centimeters, takes on the meaning of a length, from which we can obtain information about a space to be furnished.
Data comes in different types and is classified accordingly. A first basic distinction is between:
- Simple data: a word, a number, a sign;
- Complex data: a logical aggregation of different types of data (e.g. records and tables)
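As a rough illustration of this distinction, the sketch below shows simple values next to a record and a table built from them. The `Measurement` type and its field names are hypothetical, chosen only to echo the opening example.

```python
from dataclasses import dataclass

# Simple data: a single word, number or sign.
word = "table"
number = 150
sign = "<"

# Complex data: a record that aggregates values of different types.
@dataclass
class Measurement:          # hypothetical record type, for illustration only
    label: str
    value: int
    unit: str

record = Measurement(label="side of the free space", value=150, unit="cm")

# A table is a collection of records that share the same fields.
table = [
    record,
    Measurement(label="side of the Ikea table", value=55, unit="cm"),
]

for row in table:
    print(f"{row.label}: {row.value} {row.unit}")
```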
In addition, depending on their nature, we can distinguish them into:
- Digital data: quantities that assume values within a discrete set; an example is the bit, which can take the binary value “0” or “1”;
- Analog data: quantities that assume values in a continuous set, such as electrical signals produced by sound waves.
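To make the difference concrete, here is a minimal sketch that treats a sine wave as the analog quantity and turns it into digital data by sampling and quantizing it. The sample rate, the 3-bit resolution and the function names are assumptions made for the example.

```python
import math

def analog_signal(t: float) -> float:
    """Continuous quantity: a 1 Hz sine wave, defined for any real time t."""
    return math.sin(2 * math.pi * t)

def digitize(value: float, bits: int = 3) -> int:
    """Map a value in [-1, 1] to one of 2**bits discrete levels."""
    levels = 2 ** bits
    scaled = (value + 1) / 2 * (levels - 1)   # rescale to 0 .. levels - 1
    return round(scaled)

sample_rate = 8   # samples per second (assumed)

# Sampling: read the continuous signal at a finite number of instants.
samples = [analog_signal(n / sample_rate) for n in range(sample_rate)]

# Quantization: each sample becomes one of 8 integer codes, i.e. 3 bits.
codes = [digitize(s) for s in samples]
bit_strings = [format(c, "03b") for c in codes]

for s, b in zip(samples, bit_strings):
    print(f"analog {s:+.4f} -> digital {b}")
```

The analog function can be evaluated at any instant, while the digital version only ever holds one of eight possible codes per sample.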
With the spread of Big Data, we increasingly have to deal with unstructured data.
The term “Big Data” refers to a very large set of data, for which no specific size can be taken as a reference threshold. It is also complex data because it is heterogeneous: it can be of different types (text, numbers, images, links, etc.), and it can be both structured and unstructured.
Beyond the technicalities, Big Data is important because it is the basis without which no economic or technological development would be possible today.
For companies, using Big Data brings many advantages:
- cost reduction
- time reduction
- product innovation
- optimization of the current offer
and, of course, the ability to make optimal decisions. In other words, to make progress.
On the technological side, the use of Big Data has driven the growth of the entire Predictive Analytics sector.
The word “Information” also comes from Latin and means “to shape the mind”. It is the meaning we attribute to data once it has been processed and interpreted. Going back to our example, the information is the length of the side of the square space to be furnished with the Ikea table.
The information production process is divided into three phases:
- data acquisition
- data processing
- information output
Each phase is essential and, at the same time, complex, and it often requires specialized Data Science expertise.
Knowing where to find data, which sources are reliable and how to extract it is the first necessary step.
Once obtained, the raw data is not immediately usable: it must be properly processed to become meaningful and understandable information. This calls for processing techniques that require knowledge on several fronts, mathematical and statistical first of all, but also the technical skills to apply them.
Finally, we need communication principles for presenting the final results of data processing, and knowledge of the tools for visualizing the information obtained.
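The three phases described above can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the CSV content is made up, an in-memory string stands in for a real data source, and the function names `acquire`, `process` and `report` are hypothetical.

```python
import csv
import io
import statistics

# Phase 1: data acquisition. A real source would be a file, an API or a
# database; here an in-memory CSV with made-up values stands in for it.
RAW_CSV = """month,subscribers
Jan,120
Feb,135
Mar,
Apr,160
"""

def acquire(source: str) -> list[dict]:
    return list(csv.DictReader(io.StringIO(source)))

# Phase 2: data processing. Drop incomplete rows and convert text to numbers.
def process(rows: list[dict]) -> list[tuple[str, int]]:
    return [(r["month"], int(r["subscribers"])) for r in rows if r["subscribers"]]

# Phase 3: information output. Summarize the cleaned data in readable form.
def report(clean: list[tuple[str, int]]) -> str:
    values = [v for _, v in clean]
    growth = values[-1] - values[0]
    return (f"{len(clean)} valid months, mean {statistics.mean(values):.1f} "
            f"subscribers, change {growth:+d}")

rows = acquire(RAW_CSV)
clean = process(rows)
summary = report(clean)
print(summary)
```

In practice the last phase would usually produce a chart or a dashboard rather than a line of text, but the division of work stays the same.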
The ultimate goal of obtaining information is to be able to make decisions and take subsequent actions. The decision-making process is called knowledge.
Knowledge can be extracted in two different ways:
- Passive. These are the classical statistical analysis methods and the query and reporting systems, including multidimensional analysis (OLAP).
- Active. These are inferential statistics, Machine Learning and Data Mining techniques.
Thanks also to Big Data the second direction is the predominant one today.
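The contrast between the two approaches can be hinted at with a small sketch. The turnover figures are made up, and a plain least-squares line stands in for the much richer Machine Learning and Data Mining techniques mentioned above; `fit_line` is a hypothetical helper.

```python
import statistics

# Made-up monthly turnover figures, for illustration only.
months = [1, 2, 3, 4, 5, 6]
turnover = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

# Passive: describe what has already happened (a reporting-style summary).
report = {
    "total": sum(turnover),
    "mean": statistics.mean(turnover),
    "best_month": months[turnover.index(max(turnover))],
}

# Active: fit a model and use it to estimate a value not yet observed.
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

slope, intercept = fit_line(months, turnover)
forecast_month_7 = slope * 7 + intercept

print(report)
print(f"estimated turnover for month 7: {forecast_month_7:.1f}")
```

The summary only looks backwards, while the fitted line produces an estimate for a month that is not in the data.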
The path to knowledge is long and often complicated, but it is essential for making the decisions that can determine whether our business succeeds or fails.
Everything starts with data. How it is processed depends on the type and size of the data, but also on the goals we want to achieve and the resources we have available.
In the end, if I’ve got the data 42 … what information can we get from it? 🙂