
8 Best Practices for Approaching Master Data Governance in the Cloud

Expanding your footprint to the cloud has gotten easier in recent years, and as a result, a growing community of data consumers is now motivated and energized to collect, capture, store, and analyze data for insights and business decision-making. While more enterprise organizations are looking to go all-in on cloud, CIOs will learn that the nature of their data may require a hybrid approach to truly modernize their IT infrastructure. In fact, Gartner analysts believe hybrid IT will be the standard in 2019.
Despite traction, information management stakeholders still have concerns about the potential risk of managing their data, both in the cloud and on premises, for various reasons.
Enterprises have consistently cited security and governance as barriers to cloud adoption, looking for increased control where the cloud infrastructure is shared with a third party. There’s also a growing set of regulations that mandate stricter governance of personal data, such as the California Consumer Privacy Act (CCPA), the European Union’s General Data Protection Regulation (GDPR), and more. Data democratization and data awareness add to the challenge. Moving data into any kind of data lake (cloud-based or on-premises) runs the risk of losing track of which data assets have been moved, the characteristics of their content, and details about their metadata, so discoverability becomes very important.
These factors highlight the need for increased data classification, cataloging, access control, data quality management, and monitoring of the flow of data through the organization (data lineage) as core data governance competencies that every organization should look for in a cloud provider. Addressing these risks without abandoning cloud computing’s many benefits makes it more important than ever to understand data governance in the cloud, and to know which parts of it are most critical.
Simply put, data governance encompasses the ways that people, processes, and technology can work together to enable business efficiency, along with defined and agreed-upon data policies that ensure compliance is met. As a company generates more data and moves it to the cloud, it needs to consider a comprehensive approach to data governance. Why? Because good data governance can inspire customer trust and allow the company to extract more value from its data, which in turn leads to more competitive offerings and improvements in areas such as customer experience.
There are well-known frameworks that can be great resources for how to approach data governance (e.g., the Gartner framework and the Forrester framework). But the steps they describe aren’t always linear or the same for every organization.
With that in mind, here are eight best practices for any company to consider as they approach data governance in the cloud:
1. Data Discovery and Assessment
Cloud-based environments offer an economical option for creating and managing data lakes, but the risk of ungoverned migration of data assets remains. You can mitigate this with improved data discovery and assessment processes. By identifying data assets within a primary, hybrid, or multi-cloud environment, you can trace and record each data asset by origin, lineage, and object metadata, and discover what transformations have been applied to the data. Better data discovery means that users can find the data they need, when they need it, which leads to efficiency and better, data-driven decision-making.
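As a rough illustration of recording an asset by origin, metadata, and lineage, here is a minimal Python sketch. The `DataAsset` class, its fields, and the sample asset names are all hypothetical and are not taken from any cloud provider's API; a real discovery tool would populate such records automatically.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """Hypothetical record of one discovered data asset."""
    name: str
    origin: str                                   # system the data came from
    metadata: dict = field(default_factory=dict)  # object metadata (format, owner, ...)
    lineage: list = field(default_factory=list)   # ordered transformation steps

    def record_transformation(self, step: str, source: str) -> None:
        # Append a lineage entry so the asset's history stays traceable.
        self.lineage.append({"step": step, "source": source})


def trace(asset: DataAsset) -> str:
    """Return a readable path from the asset's origin through each transformation."""
    steps = [asset.origin] + [entry["step"] for entry in asset.lineage]
    return " -> ".join(steps)


# Illustrative data only.
orders = DataAsset(name="orders_clean", origin="crm_export.csv",
                   metadata={"format": "parquet"})
orders.record_transformation("deduplicate", source="crm_export.csv")
orders.record_transformation("mask_emails", source="deduplicate")
print(trace(orders))
```

Keeping lineage as an ordered list on the asset itself is only one possible design; larger environments typically store lineage as a graph so that one transformation can feed many downstream assets.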
2. Data Classification and Organization
Properly evaluating a data asset and scanning the content of its different attributes can help categorize the data asset for subsequent organization. This process can also infer whether the object contains sensitive data and, if so, classify each data asset in terms of the different levels of data sensitivity, including personal and private data, confidential data, and intellectual property. To implement data governance in the cloud, you’ll need to profile and classify sensitive data in order to determine which governance policies and procedures apply to the data.
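To make the idea of scanning attribute content concrete, the sketch below checks column values against a couple of regular expressions and maps the findings to a sensitivity tier. The patterns, the tier names (`restricted`, `personal`, `public`), and the function names are assumptions chosen for illustration; production classifiers rely on much broader detection than two regexes.

```python
import re

# Illustrative patterns only; real classifiers need far broader coverage.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify_column(values):
    """Return the set of sensitive-data types detected in a column's values."""
    found = set()
    for value in values:
        for label, pattern in PATTERNS.items():
            if pattern.search(str(value)):
                found.add(label)
    return found


def sensitivity_level(found):
    """Map detected types to a hypothetical sensitivity tier."""
    if "us_ssn" in found:
        return "restricted"
    if found:
        return "personal"
    return "public"


detected = classify_column(["alice@example.com", "n/a"])
print(detected, sensitivity_level(detected))
```

The resulting tier is what would then drive which governance policies apply to the asset.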
3. Data Cataloging and Metadata Management
Once your data assets are assessed and classified, it is crucial that you document your learnings so that your communities of data consumers have visibility into your organization’s data landscape. You need to maintain a data catalog that contains structural metadata, data object metadata, and the assessment of levels of sensitivity in relation to the governance directives (such as compliance with one or more data privacy regulations). The data catalog not only allows data consumers to view this information, but it can also serve as part of a reverse index for search and discovery, both by phrase and (given the right ontologies) by concept. It is also important to understand the format of structured and semi-structured data objects, and allow your systems to handle these data types differently, as necessary.
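The search-and-discovery role of a catalog can be sketched with a small in-memory index that maps each word to the assets mentioning it (what the text calls a reverse index, more commonly known as an inverted index). The `DataCatalog` class and the sample entries are hypothetical; this covers phrase search only, not concept search via ontologies.

```python
from collections import defaultdict


class DataCatalog:
    """Hypothetical in-memory catalog with a word-to-asset index."""

    def __init__(self):
        self.entries = {}               # asset name -> catalog metadata
        self.index = defaultdict(set)   # token -> names of assets containing it

    def register(self, name, description, sensitivity):
        self.entries[name] = {"description": description,
                              "sensitivity": sensitivity}
        text = (name + " " + description).lower().replace("_", " ")
        for token in text.split():
            self.index[token].add(name)

    def search(self, phrase):
        """Return the assets whose name or description contains every word."""
        tokens = phrase.lower().split()
        if not tokens:
            return set()
        results = set(self.index.get(tokens[0], set()))
        for token in tokens[1:]:
            results &= self.index.get(token, set())
        return results


catalog = DataCatalog()
catalog.register("customer_orders", "Daily order records per customer", "personal")
catalog.register("web_logs", "Raw clickstream records", "public")
print(catalog.search("customer records"))
```

Storing the sensitivity assessment alongside each entry lets consumers see, at search time, which governance directives apply to what they found.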
4. Data Quality Management
Different data consumers may have different data quality requirements, so it’s important to provide a means to document data quality expectations, as well as techniques and tools for supporting the data validation and monitoring process. These management processes include creating controls for validation, enabling quality monitoring and reporting, supporting the triage process for assessing the level of incident severity, enabling root cause analysis and recommendation of remedies to data issues, and data incident tracking. The right processes for data quality management will provide measurably trustworthy data for analysis.
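One way to picture documented quality expectations, validation controls, and incident triage is the sketch below. The rule names, severity labels, and sample rows are invented for illustration; the point is only that each expectation becomes a named, checkable rule whose failures are recorded as incidents.

```python
def validate(rows, rules):
    """Apply each named rule to each row and return a list of incidents."""
    incidents = []
    for i, row in enumerate(rows):
        for name, (check, severity) in rules.items():
            if not check(row):
                incidents.append({"row": i, "rule": name, "severity": severity})
    return incidents


def triage(incidents, severity):
    """Select the incidents at a given severity level for follow-up."""
    return [incident for incident in incidents if incident["severity"] == severity]


# Hypothetical expectations: each rule is (check function, severity if it fails).
RULES = {
    "amount_non_negative": (lambda r: r.get("amount", 0) >= 0, "high"),
    "country_present": (lambda r: bool(r.get("country")), "low"),
}

rows = [
    {"amount": 10, "country": "US"},
    {"amount": -5, "country": ""},
]
incidents = validate(rows, RULES)
print(triage(incidents, "high"))
```

Because different consumers may have different requirements, each could supply its own `RULES` dictionary against the same data.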
5. Data Access Management
There are two aspects of governance for data access. The first is the provisioning of access to available assets. It’s important to provide data services that allow data consumers to access their data, and fortunately, most cloud platform providers offer methods for developing data services. The second is the prevention of improper or unauthorized access. It’s important to define identities, groups, and roles, and to assign access rights to establish a level of managed access. This best practice involves managing access services as well as interoperating with the cloud provider’s identity and access management (IAM) services: defining roles, specifying access rights, and managing and allocating access keys so that only authorized and authenticated individuals and systems can access data assets according to defined rules.
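The relationship between identities, roles, and access rights can be sketched as a simple role-based check. The users, roles, and permissions here are made up, and the code does not call any cloud provider's IAM service; in practice these mappings would live in the provider's IAM system rather than in application dictionaries.

```python
# Hypothetical role definitions: role -> set of permitted actions.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "steward": {"read", "write"},
}

# Hypothetical identity-to-role assignments.
USER_ROLES = {
    "dana": {"analyst"},
    "lee": {"steward"},
}


def is_allowed(user, action):
    """Return True if any of the user's roles grants the requested action."""
    roles = USER_ROLES.get(user, set())
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)


print(is_allowed("dana", "read"))    # analyst role grants read
print(is_allowed("dana", "write"))   # analyst role does not grant write
```

Unknown users fall through to an empty role set, so the check denies by default.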
6. Auditing
Organizations must be able to assess their systems to make sure that they are working as designed. Monitoring, auditing, and tracking (who did what, when, and with what information) help security teams gather data, identify threats, and act on them before they result in business damage or loss. It’s important to perform regular audits that check the effectiveness of controls, so that you can quickly mitigate threats and evaluate overall security health.
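The "who did what and when" record can be illustrated with an append-only event list and one simple review query. The event fields and the `denied_attempts` helper are assumptions made for this sketch; managed cloud audit logs capture far more detail and are stored outside the application.

```python
from datetime import datetime, timezone


def record_event(log, user, action, asset, allowed):
    """Append one audit entry: who did what, to which asset, and when."""
    log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "asset": asset,
        "allowed": allowed,
    })


def denied_attempts(log):
    """Count denied access attempts per user, a basic signal worth reviewing."""
    counts = {}
    for event in log:
        if not event["allowed"]:
            counts[event["user"]] = counts.get(event["user"], 0) + 1
    return counts


# Illustrative events only.
audit_log = []
record_event(audit_log, "dana", "read", "customer_orders", allowed=True)
record_event(audit_log, "sam", "write", "customer_orders", allowed=False)
record_event(audit_log, "sam", "read", "payroll", allowed=False)
print(denied_attempts(audit_log))
```

A regular audit would run queries like this over a time window and compare the results against the access rules that are supposed to be in force.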
7. Data Protection
Perimeter security is not, and never has been, sufficient for protecting sensitive data. Attempts to prevent someone from breaking into your system will have only limited success, and at some point your data may become exposed. It’s important to institute additional methods of data protection to ensure that exposed data cannot be read, including encryption at rest, encryption in transit, data masking, and permanent deletion.
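Of the protection methods listed, data masking is the easiest to show in a few lines. The sketch below masks an email address for display and replaces an identifier with a keyed hash (pseudonymization, a related technique not named in the text) using Python's standard `hmac` and `hashlib` modules. The function names and the sample key are hypothetical; this is not encryption, and a real deployment would keep the key in a managed secret store.

```python
import hashlib
import hmac


def mask_email(email):
    """Hide all but the first character of the local part of an email address."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        # Not a recognizable email address: mask everything.
        return "*" * len(email)
    return local[0] + "*" * (len(local) - 1) + "@" + domain


def pseudonymize(value, key):
    """Replace a value with a keyed SHA-256 digest (same input, same token)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


SAMPLE_KEY = b"example-key-not-for-production"  # placeholder key for the demo

print(mask_email("alice@example.com"))
print(pseudonymize("customer-1001", SAMPLE_KEY))
```

The keyed hash yields the same token for the same input, so masked datasets can still be joined on that field without exposing the original identifier.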
8. Data Literacy
An organization’s success with data governance depends critically on education, training, and a true understanding of what can and can’t be done with its data. Technology on its own isn’t enough. It takes people, processes, and policies to drive organizational change and enable users to see and protect the value of their data as a business asset.
Overall, organizations will likely reap immense benefits as they promote a data-driven culture, a large part of which involves a strong understanding of data governance. By putting these recommendations into practice, information management professionals will be one step closer to that culture.
About the author: Evren Eryurek, PhD, leads the Data Analytics and Data Management portfolio at Google Cloud. As director of product management, Evren’s responsibilities cover Streaming Analytics, Dataflow, Beam, Messaging (Pub/Sub & Confluent Kafka), Data Governance, Data Catalog & Discovery, and Data Marketplace. Prior to joining Google, he was the SVP & Software Chief Technology Officer for GE Healthcare, a nearly $20 billion segment of GE. A graduate of the University of Tennessee, Evren holds a master’s and a doctorate degree in Nuclear Engineering and over 60 U.S. patents.