Tuesday, November 12, 2019

Data Handling and Analytics: Types of Data and Characteristics of Big Data

Data handling:
 Ensures that research data is stored, archived, or disposed of in a safe and secure manner during and after the conclusion of a research project.
 Includes the development of policies and procedures to manage data handled electronically as well as through non-electronic means.

 Today, most data-handling concerns involve Big Data, because of:
        Heavy traffic generated by IoT devices
        Huge amounts of data generated by deployed sensors

What is Big Data
“Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling the high-velocity capture, discovery, and/or analysis.”

Types of Data:

Structured data
 Data that follows a pre-defined model and can be easily organized.
 Usually stored in relational databases.
 Managed in databases with Structured Query Language (SQL).
 Accounts for only about 20% of the data available in the world today.
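
As a minimal sketch of the points above, the following Python snippet stores a few structured records in a relational table and queries them with SQL. It uses the standard-library `sqlite3` module with an in-memory database. The table name `sensor_readings`, its columns, and the sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3


def average_temperature_by_device(rows):
    """Load (device_id, temperature) rows into a table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        # The fixed schema (named, typed columns) is what makes this data "structured".
        conn.execute(
            "CREATE TABLE sensor_readings ("
            "device_id TEXT NOT NULL, temperature REAL NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO sensor_readings (device_id, temperature) VALUES (?, ?)",
            rows,
        )
        cursor = conn.execute(
            "SELECT device_id, AVG(temperature) FROM sensor_readings "
            "GROUP BY device_id ORDER BY device_id"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample data, for illustration only.
sample_rows = [("dev-1", 20.0), ("dev-1", 22.0), ("dev-2", 30.0)]
averages = average_temperature_by_device(sample_rows)
print(averages)
```

Because every row has the same columns, a single declarative query is enough to group and aggregate the data.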

Unstructured data
 Information that does not possess any pre-defined model.
 Traditional RDBMSs are not designed to process unstructured data.
 Analyzing it can provide better insight into huge datasets.
 Accounts for about 80% of the data available in the world today.

Characteristics of Big Data
  Big Data is characterized by 7 Vs –
 Volume
 Velocity
 Variety
 Variability
 Veracity
 Visualization
 Value

 Volume
 Quantity of data that is generated
 Sources of data are added continuously
 Examples of volume:
        30 TB of images will be generated every night by the Large Synoptic Survey Telescope (LSST)
        72 hours of video are uploaded to YouTube every minute
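
To get a feel for these rates, the short Python sketch below scales the two figures quoted above to longer periods. The function names are hypothetical, and the yearly figure assumes 1 PB = 1000 TB and data collection on every night of the year, which is a simplification.

```python
MINUTES_PER_DAY = 60 * 24

HOURS_UPLOADED_PER_MINUTE = 72  # YouTube figure quoted in the text
TB_PER_NIGHT = 30               # LSST figure quoted in the text


def video_hours_per_day(hours_per_minute):
    """Hours of video uploaded in one day at a constant per-minute rate."""
    return hours_per_minute * MINUTES_PER_DAY


def petabytes_per_year(tb_per_night, nights_per_year=365):
    """Yearly volume in PB, assuming 1 PB = 1000 TB and data on every night."""
    return tb_per_night * nights_per_year / 1000


daily_video_hours = video_hours_per_day(HOURS_UPLOADED_PER_MINUTE)
yearly_petabytes = petabytes_per_year(TB_PER_NIGHT)
print(daily_video_hours, "hours of video per day")
print(yearly_petabytes, "PB of images per year")
```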

 Velocity
 Refers to the speed at which data is generated
 Data processing time is decreasing day by day in order to provide real-time services
 Older batch-processing technology cannot keep up with high-velocity data
 Examples of velocity:
        140 million tweets per day on average (according to a survey conducted in 2011)
        The New York Stock Exchange captures 1 TB of trade information during each trading session
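
The contrast between batch and real-time processing can be sketched in a few lines of Python. The batch function needs the whole dataset before it can answer, while the streaming class updates its result once per event and keeps only constant state. `RunningMean`, `batch_mean`, and the sample values are hypothetical names and data used only for illustration; this is not how any particular streaming platform is implemented.

```python
class RunningMean:
    """Incrementally maintained mean: constant memory, updated once per event."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        # Incremental update: earlier events never need to be re-read.
        self.mean += (value - self.mean) / self.count
        return self.mean


def batch_mean(values):
    """Batch style: requires the complete collection up front."""
    values = list(values)
    return sum(values) / len(values)


# Average event rate implied by the tweet figure quoted in the text.
TWEETS_PER_SECOND = 140_000_000 / (24 * 60 * 60)

# Hypothetical event values arriving one at a time.
trade_sizes = [100, 250, 50, 400]
stream = RunningMean()
for size in trade_sizes:
    latest_mean = stream.update(size)  # an up-to-date answer after every event

print(round(TWEETS_PER_SECOND), "tweets per second on average")
print(latest_mean, batch_mean(trade_sizes))
```

Both approaches agree on the final value; the difference is that the streaming version has an answer available after each event instead of only at the end.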

 Variety
 Refers to the category to which the data belongs
 No restriction on input data formats
 Data is mostly unstructured or semi-structured
 Examples of variety:
        Plain text, images, audio, video, web, GPS data, sensor data, SMS, documents, PDFs, Flash, etc.
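
One way to cope with mixed input formats is to convert each input into a common record shape before analysis. The sketch below does this for three formats using only the Python standard library. The `normalize` function, the format labels, and the sample payloads are assumptions made for illustration.

```python
import csv
import io
import json


def normalize(payload, fmt):
    """Convert one input string in a known format into a list of dict records."""
    if fmt == "csv":
        # Structured: every row shares the columns named in the header line.
        return [dict(row) for row in csv.DictReader(io.StringIO(payload))]
    if fmt == "json":
        # Semi-structured: fields may be nested or vary between records.
        data = json.loads(payload)
        return data if isinstance(data, list) else [data]
    if fmt == "text":
        # Unstructured: keep each non-empty line as raw text.
        return [{"text": line} for line in payload.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {fmt}")


# Hypothetical sample inputs, for illustration only.
csv_records = normalize("device,temp\ndev-1,20.5\n", "csv")
json_records = normalize('{"device": "dev-2", "gps": {"lat": 1.5, "lon": 2.5}}', "json")
text_records = normalize("sensor offline\n\nrebooted", "text")
print(csv_records, json_records, text_records)
```

Note that the CSV values stay as strings; converting them to numbers would be a separate, format-specific step.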

 Variability
 Refers to data whose meaning is constantly changing
 The meaning of the data depends on the context
 Data can appear as an indecipherable mass without structure
 Examples:
        Language processing, hashtags, geospatial data, multimedia, sensor events

 Veracity
 Veracity refers to the biases, noise, and abnormalities in data.
 It is important in programs that involve automated decision-making or that feed data into an unsupervised machine learning algorithm.
 Veracity isn’t just about data quality; it’s also about data understandability.
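
As one hedged illustration of checking for noise and abnormality before data reaches an automated pipeline, the sketch below flags suspicious readings with a modified z-score based on the median absolute deviation (MAD). This technique is not taken from the text above; it is a common robust-statistics heuristic, and the 3.5 threshold, the `flag_outliers` name, and the sample readings are assumptions.

```python
from statistics import median


def flag_outliers(readings, threshold=3.5):
    """Return the readings whose modified z-score exceeds the threshold."""
    med = median(readings)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(x - med) for x in readings])
    if mad == 0:
        # At least half the readings equal the median; no usable scale estimate.
        return []
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores.
    return [x for x in readings if abs(0.6745 * (x - med) / mad) > threshold]


# Hypothetical temperature readings with one obviously faulty value.
sensor_readings = [20.1, 19.8, 20.3, 20.0, 85.0, 19.9]
suspect_values = flag_outliers(sensor_readings)
print(suspect_values)
```

A flagged value is only a candidate for review: whether it is noise or a genuine event depends on context, which is the understandability side of veracity.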

Visualization
 Presentation of data in a pictorial or graphical format
 Enables decision makers to see analytics presented visually
 Helps identify new patterns

Value
 It means extracting useful business information from scattered data.
 Draws on a large volume and variety of data.
 Valuable data is easy to access and delivers quality analytics that enable informed decisions.
