Data handling:
Ensures that research data is stored, archived or disposed of in a safe and secure
manner during and after the conclusion of a research project.
Includes the development of policies and procedures to manage data handled
electronically as well as through non‐electronic means.
The main data concern in recent years –
Big Data
Due to the heavy traffic generated by IoT devices
Due to the huge amount of data generated by deployed sensors
What is Big Data?
“Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling the high-velocity capture, discovery, and/or analysis.”
Types of Data:
Structured data
Data that can be easily organized.
Usually stored in relational databases.
Structured Query Language (SQL) manages structured data in databases.
It accounts for only about 20% of the total data available in the world today.
Unstructured data
Information that does not possess any pre‐defined model.
Traditional RDBMSs are unable to process unstructured data.
Analyzing it enhances the ability to gain better insight into huge datasets.
It accounts for about 80% of the total data available in the world today.
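To make the distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `readings`, its columns, the sample values, and the free-text note are all hypothetical, chosen only for illustration. Structured rows can be queried directly with SQL, whereas free text has no pre-defined model and has to be parsed before it can be queried the same way.

```python
import re
import sqlite3

# Structured data: every row follows the same pre-defined schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("s1", 21.5), ("s2", 23.0), ("s1", 22.5)],
)

# SQL can aggregate structured data directly.
avg_s1 = conn.execute(
    "SELECT AVG(temperature) FROM readings WHERE sensor_id = ?", ("s1",)
).fetchone()[0]

# Unstructured data: free text with no pre-defined model (hypothetical note).
note = "Sensor s1 felt warm today, maybe around 22 degrees near noon."

# A value must first be extracted (here with a simple regular expression)
# before it can be stored or queried like structured data.
match = re.search(r"(\d+(?:\.\d+)?) degrees", note)
extracted = float(match.group(1)) if match else None

print(avg_s1)
print(extracted)
```

The regular expression only works for this one phrasing, which is the point: without a schema, each new kind of input needs its own extraction logic.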
Characteristics of Big Data
Big Data is characterized by 7 Vs –
Volume
Velocity
Variety
Variability
Veracity
Visualization
Value
Volume
Quantity of data that is generated
Sources of data are added continuously
Example of volume ‐
30 TB of images will be generated every night from the Large Synoptic Survey Telescope (LSST)
72 hours of video are uploaded to YouTube every minute
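A short back-of-the-envelope calculation shows how quickly these figures accumulate. It uses only the two numbers quoted above; the assumption that the telescope observes on every night of a 365-day year is made purely for illustration.

```python
# Figures quoted in the examples above.
YOUTUBE_HOURS_PER_MINUTE = 72
LSST_TB_PER_NIGHT = 30

MINUTES_PER_DAY = 24 * 60
# Assumption: the telescope observes every night of a 365-day year.
NIGHTS_PER_YEAR = 365

youtube_hours_per_day = YOUTUBE_HOURS_PER_MINUTE * MINUTES_PER_DAY
lsst_tb_per_year = LSST_TB_PER_NIGHT * NIGHTS_PER_YEAR
lsst_pb_per_year = lsst_tb_per_year / 1000  # decimal units: 1 PB = 1000 TB

print(f"YouTube: {youtube_hours_per_day:,} hours of video per day")
print(f"LSST: {lsst_tb_per_year:,} TB (~{lsst_pb_per_year:.2f} PB) per year")
```

Under these assumptions, that is 103,680 hours of video per day and roughly 10.95 PB of images per year.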
Velocity
Refers to the speed of generation of data
Data processing time is decreasing day by day in order to provide real‐time services
Older batch processing technology is unable to handle the high velocity of data
Example of velocity –
140 million tweets per day on average (according to a survey conducted in 2011)
New York Stock Exchange captures 1 TB of trade information during each trading session
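The contrast between batch and real-time processing can be sketched as follows. The code first converts the tweet figure above into an average per-second rate, then compares an incremental (streaming) mean, which is up to date after every event, with a batch mean, which needs the whole dataset first. The `RunningMean` class and the sample prices are hypothetical and not taken from any particular streaming framework.

```python
from statistics import mean

# Average rate implied by the tweet figure quoted above.
TWEETS_PER_DAY = 140_000_000
SECONDS_PER_DAY = 24 * 60 * 60
tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY  # roughly 1,620


class RunningMean:
    """Keeps a mean up to date one event at a time, without storing events."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        # Incremental update: move the mean toward the new value by 1/count.
        self.mean += (value - self.mean) / self.count
        return self.mean


# Hypothetical trade prices arriving as a stream.
prices = [101.0, 102.0, 100.0, 105.0]

stream = RunningMean()
for price in prices:
    stream.update(price)  # a result is available after every event

# Batch processing needs the full dataset before it can answer.
batch_result = mean(prices)

print(round(tweets_per_second))
print(stream.mean, batch_result)
```

Both approaches reach the same mean, but the streaming version uses constant memory and never waits for the data to stop arriving.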
Variety
Refers to the category to which the data belongs
No restriction over the input data formats
Data mostly unstructured or semi‐structured
Example of variety –
Pure text, images, audio, video, web, GPS data, sensor data, SMS, documents, PDFs, Flash, etc.
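Handling variety usually means accepting several input formats and mapping them to a common representation. The sketch below does this for three formats using only the standard library. The `normalize` function, the format labels, the CSV column order, and the sample payloads are all assumptions made for illustration.

```python
import csv
import io
import json


def normalize(kind, payload):
    """Convert one raw input into a common dict; `kind` labels the format."""
    if kind == "json":
        return json.loads(payload)
    if kind == "csv":
        # Assumed column order for this sketch: sensor_id, value
        sensor_id, value = next(csv.reader(io.StringIO(payload)))
        return {"sensor_id": sensor_id, "value": float(value)}
    if kind == "text":
        # Free text has no fields to extract, so it is kept as-is.
        return {"text": payload}
    raise ValueError(f"unsupported format: {kind}")


# Hypothetical inputs in three different formats.
raw_inputs = [
    ("json", '{"sensor_id": "s1", "value": 21.5}'),
    ("csv", "s2,23.0"),
    ("text", "Traffic is heavy near the bridge"),
]

records = [normalize(kind, payload) for kind, payload in raw_inputs]
for record in records:
    print(record)
```

The semi-structured inputs (JSON, CSV) map cleanly onto named fields, while the free-text input can only be carried along whole until some further analysis gives it structure.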
Variability
Refers to data whose meaning is constantly changing.
Meaning of the data depends on the context.
Data appear as an indecipherable mass without structure
Example:
Language processing, Hashtags, Geo‐spatial data, Multimedia, Sensor events
Veracity
Veracity refers to the biases, noise and abnormality in data.
It is important in programs that involve automated decision‐making, or that feed the data
into an unsupervised machine learning algorithm.
Veracity isn’t just about data quality; it’s also about data understandability.
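One simple way to screen for abnormal values before they reach an automated decision step is a robust outlier check. The sketch below uses the median absolute deviation (MAD) with a modified z-score. This technique is chosen here for illustration and is not prescribed by the text above. The `flag_outliers` function and the sensor readings are hypothetical, and the threshold of 3.5 is a commonly used convention rather than a fixed rule.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values equal the median; this sketch flags nothing.
        return []
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical temperature readings with one faulty spike.
readings = [21.4, 21.6, 21.5, 21.7, 85.0, 21.3, 21.5]

outliers = flag_outliers(readings)
print(outliers)
```

The median is used instead of the mean because a single extreme reading would otherwise distort the very statistic used to detect it.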
Visualization
Presentation of data in a pictorial or graphical format
Enables decision makers to see analytics presented visually
Helps identify new patterns
Value
It means extracting useful business information from scattered data.
Includes a large volume and variety of data
The data should be easy to access and should deliver quality analytics that enable informed decisions.