# Data at the Heart of Modern Life, Part I
In this section, we will explore the 5 Vs of data, the growth of IoT and sensing devices, the impact of mobile and cloud computing on data generation, and the importance of data in a pandemic.
As our world becomes increasingly digitized, we are generating and collecting vast amounts of data from a variety of sources. From the sensing devices placed throughout our physical environment to the data we create through our daily interactions and activities, the amount of data we produce is enormous and ever-growing.
But data is more than just a collection of numbers and information. It is a precious resource, one that will last long after the systems and technologies that generate it have come and gone.

> "Data is a precious thing and will last longer than the systems themselves." — Tim Berners-Lee
In the future, we will see data play an even greater role in our lives as technologies like blockchain allow us to exchange value peer-to-peer using digital infrastructure. Autonomous vehicles, energy grids, entertainment, and transportation will all be transformed by the power of data. Data is a valuable resource that will shape our world for generations to come.
As more people and more things become connected, more data is generated, consumed, analyzed, stored, destroyed, and valued.
Fun fact: before blockchain, the only peer-to-peer infrastructure we had for exchanging value was cash.
The future motivates me, personally. I love technology and love thinking about all the ways we humans will interact with and experience it. AR, VR, xVR, whatever the reality, it sounds cool and I look forward to it!
# The 5 Vs of Data Analytics
Data plays a central role in modern life, and organizations of all types and sizes are facing challenges in managing the volume, variety, veracity, velocity, and value of the data they generate and collect. From states to universities, counties to tribes and pueblos, school districts to utilities, academic medical centers to businesses, the challenges of data management are widespread and complex. In this section, we will delve into each of the 5 Vs and explore the key considerations for an effective data strategy.
## 1. Volume
As organizations continue to digitize, the volume of data generated on a daily basis has become overwhelming; by some estimates, the average digital organization creates terabytes of data each day. Because volume has such a significant impact, here is a list of important questions to consider:
- How much data does your digital organization create every day?
- How much of the data the organization creates needs to be collected?
- How much of the data is clean and usable?
- How much of the collected data needs to be analyzed or processed?
- How much of the collected data needs to be retained?
- How will the volume of data impact the organization's governance program?
- How much of the digital organization's data is not being collected?
- Volume can be estimated from the number of users, the number of endpoints, and other factors; a rough sizing sketch follows this list.
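To make the sizing question concrete, here is a back-of-the-envelope sketch in Python. Every per-user and per-device rate below is a hypothetical placeholder, not a benchmark; substitute measured figures from your own environment.

```python
# Back-of-the-envelope estimate of daily data volume.
# All rates are hypothetical placeholders -- replace them with
# measured figures from your own environment.

ENDPOINT_LOG_MB_PER_DAY = 50      # assumed avg log output per endpoint
USER_ACTIVITY_MB_PER_DAY = 20     # assumed avg app/audit data per user
NETWORK_DEVICE_MB_PER_DAY = 500   # assumed avg log output per firewall/switch

def estimate_daily_volume_gb(users: int, endpoints: int, network_devices: int) -> float:
    """Return a rough estimate of daily data volume in gigabytes."""
    total_mb = (
        users * USER_ACTIVITY_MB_PER_DAY
        + endpoints * ENDPOINT_LOG_MB_PER_DAY
        + network_devices * NETWORK_DEVICE_MB_PER_DAY
    )
    return total_mb / 1024

# Example: 5,000 users, 6,000 endpoints, 40 network devices
print(f"{estimate_daily_volume_gb(5000, 6000, 40):.1f} GB/day")
```

Even a crude model like this reframes the collect, analyze, and retain questions above in gigabytes per day rather than abstractions.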
## 2. Variety
There is a growing movement within the cybersecurity industry to establish a common standard for sharing threat intelligence and communicating threat tactics and techniques. For example, AWS recently announced Amazon Security Lake, built with partners including Splunk on the Open Cybersecurity Schema Framework (OCSF), which aims to provide a centralized platform for storing, analyzing, and sharing security data. However, until an industry-wide standard for data sharing is widely adopted, organizations will continue to deal with a variety of data sources.
Each firewall vendor has a different log format, and endpoint telemetry may contain more information than what is reported by the firewall. While these data sources can complement each other, the differences in format can make it difficult for organizations to effectively analyze and utilize the data. Splunk provides tools that can help normalize this variety of data using a Common Information Model (CIM). By using CIM, organizations can more easily integrate and analyze data from multiple sources, enabling them to make more informed security decisions.
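To illustrate what normalization looks like in practice, here is a minimal Python sketch that maps two hypothetical vendors' firewall log fields onto one shared schema. The target field names loosely follow CIM conventions such as src, dest, and action, but this is an illustration of the idea, not Splunk's actual CIM implementation.

```python
# Normalize two hypothetical firewall log formats into a common schema.
# Target field names loosely follow Splunk CIM conventions (src, dest,
# action); the vendor formats and mappings are invented for illustration.

FIELD_MAPS = {
    "vendor_a": {"source_address": "src", "destination_address": "dest", "disposition": "action"},
    "vendor_b": {"srcip": "src", "dstip": "dest", "act": "action"},
}

def normalize(event: dict, vendor: str) -> dict:
    """Rename vendor-specific fields to the common schema, keeping extras."""
    mapping = FIELD_MAPS[vendor]
    return {mapping.get(key, key): value for key, value in event.items()}

a = {"source_address": "10.0.0.5", "destination_address": "8.8.8.8", "disposition": "blocked"}
b = {"srcip": "10.0.0.7", "dstip": "1.1.1.1", "act": "allowed", "bytes": 512}

print(normalize(a, "vendor_a"))  # {'src': '10.0.0.5', 'dest': '8.8.8.8', 'action': 'blocked'}
print(normalize(b, "vendor_b"))  # {'src': '10.0.0.7', 'dest': '1.1.1.1', 'action': 'allowed', 'bytes': 512}
```

Once both feeds share field names, a single query or dashboard can span every vendor, which is exactly the payoff a common information model promises.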
The following five items are key components of a successful data analytics program:

- **Data integration:** the process of combining data from multiple sources, such as databases, systems, and applications, in order to gain a more comprehensive understanding of the data.
- **Data quality:** ensuring the quality of data is critical for accurate analysis and decision-making. This includes verifying the accuracy, completeness, and consistency of the data.
- **Data governance:** the set of policies, procedures, and standards that govern the use, management, and protection of data within an organization. It helps to ensure that data is used ethically and responsibly.
- **Data security:** protecting data from unauthorized access and ensuring the confidentiality, integrity, and availability of data is essential for any organization.
- **Data analysis:** the process of examining, transforming, and modeling data in order to discover useful insights and inform decision-making. It can involve a variety of techniques, such as statistical analysis, machine learning, and data visualization.
## 3. Veracity
Ensuring the veracity of data is essential for a successful data analytics platform. Veracity covers several related dimensions of data quality, each of which contributes to accurate analysis and informed decision-making (a few of these checks are sketched in code after the list):

- **Accuracy:** the truthfulness and correctness of data. Data that is not accurate or reliable can lead to incorrect insights and decisions, which can have serious consequences for an organization.
- **Completeness:** data that is incomplete or missing key elements can undermine the accuracy of analytics and insights, as well as the usefulness of the data for decision-making.
- **Timeliness:** data that is not up to date can be less useful for decision-making, as it may not reflect the current state of the organization or industry.
- **Consistency:** data that does not follow established standards and conventions can be difficult to use and integrate, and may lead to incorrect insights and decisions.
- **Integrity:** data that has been compromised or altered in any way can undermine the accuracy and reliability of analytics, as well as the trustworthiness of the data for decision-making.

Ensuring the veracity of data is critical for the success of any data analytics and decision-making effort.
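As a concrete, deliberately simple illustration, the sketch below applies completeness, timeliness, and consistency checks to incoming records. The record layout, field names, and thresholds are all assumptions made for the example, not any standard.

```python
from datetime import datetime, timedelta, timezone

# Minimal veracity checks on incoming records. The record layout
# ('id', 'value', 'timestamp') and the thresholds are assumptions.

REQUIRED_FIELDS = {"id", "value", "timestamp"}
MAX_AGE = timedelta(hours=24)  # assumed freshness threshold

def check_record(record: dict) -> list[str]:
    """Return a list of veracity problems found in one record."""
    problems = []
    # Completeness: every required field must be present and non-empty.
    present = {k for k, v in record.items() if v not in (None, "")}
    missing = REQUIRED_FIELDS - present
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    # Timeliness: the record must be recent enough to be useful.
    ts = record.get("timestamp")
    if ts and datetime.now(timezone.utc) - ts > MAX_AGE:
        problems.append("stale record")
    # Consistency: 'value' must follow the assumed convention (non-negative number).
    v = record.get("value")
    if v is not None and (not isinstance(v, (int, float)) or v < 0):
        problems.append("value out of expected range")
    return problems

rec = {"id": "a1", "value": -3, "timestamp": datetime.now(timezone.utc) - timedelta(days=2)}
print(check_record(rec))  # ['stale record', 'value out of expected range']
```

Integrity and accuracy checks usually need outside help, such as checksums or a trusted reference source, which is why they are harder to bolt on after the fact.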
## 4. Velocity
Data velocity is a measure of how fast data is being generated and processed: an amount of data per unit of time. Speed, frequency, volume, and variety all influence how quickly data arrives and how quickly it can be handled.
Imagine trying to measure the flow of electricity in a circuit. Just like electricity, data can flow at different speeds and in different quantities. To accurately measure the flow of electricity, you would need to consider factors such as the amount of current (similar to volume in data), the frequency of the current (similar to frequency in data), and the type of load on the circuit (similar to variety in data). Similarly, to measure data velocity, you need to consider the speed at which data is generated, the frequency of data generation, the volume of data being generated, and the variety of data being generated.
Understanding data velocity is important because it can help organizations make informed decisions about how to process and store data. For example, if an organization is dealing with high velocity data, it may need to invest in more powerful and efficient processing and storage systems in order to keep up with the volume and speed of data being generated. On the other hand, if an organization is dealing with low velocity data, it may be able to use less expensive and less powerful systems to process and store the data. By understanding data velocity, organizations can ensure that they have the right systems and infrastructure in place to support their data analytics efforts.
Breaking down speed, frequency, volume, and variety (a small rate-measurement sketch follows this list):

- **Speed:** how quickly data is generated and processed. With the increasing amount of data being generated, organizations need the ability to process and analyze data in real time or near real time to keep up with the pace of change.
- **Frequency:** how often data is generated and updated. Data that is generated and updated frequently may be more useful for decision-making, as it can provide a more current and accurate picture of the organization or industry.
- **Volume:** the amount of data being generated. Organizations that generate a high volume of data may need to invest in more advanced technologies and processes to handle both the volume and the velocity of data effectively.
- **Variety:** the mix of sources and formats. Data that comes from many sources and formats may be more difficult to process and analyze quickly, which can reduce effective velocity.
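Here is a minimal sketch of measuring velocity as events per second over a sliding window. The event stream is simulated, and the window size is an arbitrary choice for the example.

```python
from collections import deque
import time

# Measure data velocity as events per second over a sliding window.
# The event source below is simulated for the example.

class RateMeter:
    def __init__(self, window_seconds: float = 10.0):
        self.window = window_seconds
        self.events = deque()  # timestamps of recent events

    def record(self, ts: float) -> None:
        """Record one event and drop timestamps that fell out of the window."""
        self.events.append(ts)
        cutoff = ts - self.window
        while self.events and self.events[0] < cutoff:
            self.events.popleft()

    def rate(self) -> float:
        """Events per second over the current window."""
        return len(self.events) / self.window

meter = RateMeter(window_seconds=5.0)
now = time.time()
for i in range(100):              # simulate 100 events over ~2 seconds
    meter.record(now + i * 0.02)
print(f"{meter.rate():.1f} events/sec")  # 100 events / 5 s window = 20.0
```

A number like this is what turns the sizing discussion from guesswork into an engineering decision: if the sustained rate outruns what your pipeline can ingest, you need faster infrastructure or less data.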
## 5. Value
- What is the value of a seashell?
- What is the value of a goat or a chicken?
- What is the value of 1 BTC?
- How about a Peso?
- The value of a piece of art is determined by a number of factors.
- The value of music, a concert ticket, or an autographed poster is a complex equation.
- What is the value of an hour of your time?
The value of something is determined by how much it is worth to an individual or group. This value can be subjective and can vary depending on a variety of factors. For example, the value of a seashell may be different for one person compared to another, depending on the individual's personal preferences and circumstances. Similarly, the value of data and data analytics can vary depending on the context and purpose for which they are being used.
Data itself can have value, as it can be used to inform decisions and drive business outcomes. Data analytics is the process of examining, transforming, and modeling data in order to discover useful insights and inform decision-making. The value of data analytics lies in its ability to provide valuable insights and help organizations make informed decisions.
Insights are the information or understanding gained from data analytics. They can provide valuable perspective and help organizations make better decisions. The value of insights depends on their relevance, timeliness, and usefulness to the organization. By understanding the value of data, data analytics, and insights, organizations can maximize their value and achieve better outcomes.
# The Growth of IoT and Sensing Devices
The Internet of Things (IoT) refers to the growing network of interconnected sensors, appliances, and other electronic devices that are able to collect, transmit, and exchange data. These devices are becoming increasingly prevalent in many areas of our lives, including homes, cities, transportation systems, and businesses. The rise of IoT and sensing devices is driving the proliferation of data, as these devices can generate, collect, and transmit vast amounts of data in real time.
The data generated by IoT and sensing devices can be used for a variety of purposes, including real-time decision making, trend analysis, and predictive modeling. The ability to collect and analyze this data in real time allows organizations to make more informed decisions and respond more quickly to changing circumstances. As the number of IoT and sensing devices continues to grow, the amount of data being generated and analyzed will also increase, leading to even greater insights and opportunities for organizations.
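As a toy example of real-time decision making on sensor data, the Python sketch below raises an alert when the rolling average of simulated temperature readings drifts past a threshold. The readings, window size, and threshold are all invented; a real pipeline would consume readings from a device gateway or message bus.

```python
import random
import statistics

# Toy real-time decision on a simulated IoT temperature stream:
# alert when the rolling average drifts above a threshold.
# Window size and threshold are assumptions for the example.

WINDOW = 10          # readings per rolling window
THRESHOLD_C = 30.0   # alert threshold in degrees Celsius

random.seed(42)
window: list[float] = []

for minute in range(60):
    reading = 25.0 + minute * 0.15 + random.uniform(-1, 1)  # slow upward drift
    window.append(reading)
    if len(window) > WINDOW:
        window.pop(0)
    if len(window) == WINDOW:
        avg = statistics.mean(window)
        if avg > THRESHOLD_C:
            print(f"minute {minute}: rolling avg {avg:.1f} C exceeds {THRESHOLD_C} C")
            break
```

The same shape of loop, with the simulation swapped for a real subscription, underlies much of the real-time decision making described above.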
# The Impact of Mobile and Cloud Computing on Data Generation
Mobile and cloud computing are driving the rapid growth of data generation. The proliferation of mobile devices and the increasing use of cloud-based services have made it easier for people to access and use these technologies, leading to a significant increase in the amount of data being produced.
Analytics users, meanwhile, expect an exceptional experience when interacting with data, and we want them to spend more time engaging with insights. This means we need to focus on delivering a user-friendly interface and ensuring that data is easily accessible.
The pandemic has also contributed to the increase in data generation, as more people turned to online platforms for services such as grocery delivery. This trend is likely to continue as the use of mobile and cloud technologies becomes more widespread.
As the amount of data being generated continues to grow, organizations need to carefully consider how much data to collect and analyze, and how long to retain it. By identifying the systems, users, and assets that are generating the most relevant data, we can more effectively scope problems and take action to address them. Baselines and trending are also important for understanding how IT services are performing and responding to user needs; a small baselining sketch follows.
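Here is a minimal sketch of what baselining can look like: compute a baseline and spread from historical values of a metric, then flag new values that deviate too far. The metric history and the three-sigma threshold are assumptions for the example.

```python
import statistics

# Baseline a service metric (e.g., daily request counts) and flag
# deviations. The history and threshold below are hypothetical.

history = [1020, 980, 1100, 995, 1050, 1010, 990, 1075, 1005, 1030]

baseline = statistics.mean(history)
spread = statistics.stdev(history)

def is_anomalous(value: float, k: float = 3.0) -> bool:
    """Flag values more than k standard deviations from the baseline."""
    return abs(value - baseline) > k * spread

print(f"baseline: {baseline:.0f} +/- {spread:.0f}")
print(is_anomalous(1040))  # False -- within the normal range
print(is_anomalous(2500))  # True  -- well above the baseline
```

Trending works the same way over longer horizons: recompute the baseline on a rolling basis so that gradual, legitimate growth does not register as an anomaly.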
# The Importance of Data in a Pandemic
The pandemic brought attention to the crucial role that data plays in decision making. From tracking case numbers and infection rates to determining the capacity of hospitals, data became a driving force in understanding and responding to the crisis. This emphasized the need for real-time insights and quick response times in order to keep critical services running and secure. The pandemic also highlighted the importance of defining what is essential, both in terms of services and data. Ensuring the availability of these essential services and data requires a strong and efficient IT infrastructure, which can be achieved through the use of analytics and machine-speed response.