
What to Expect in AI Data Governance: 2025 Predictions

This guest blog was contributed by Krishna Subramanian, co-founder of Komprise.

In 2025, preventing risks from both cyber criminals and AI use will be top mandates for most CIOs. Ransomware in particular continues to vex enterprises, and unstructured data is a vast, largely unprotected asset. AI solutions have moved from experimental to mainstream, with all the major tech companies and cloud providers making significant investments in building turnkey GenAI and AI solutions for enterprise customers. CXOs want to leverage AI, but not at the cost of damaging customer relationships, reputation and market share through irresponsible use. IT professionals responsible for data and infrastructure will need to be prepared as employees start sending company data to AI. The following predictions focus on the urgency of getting AI data governance right, from systems and policies to IT skills.


Systematic data ingestion for AI will be the first data storage mandate


AI mania is overwhelming, but so far, enterprise participation has been largely led by employees who are using GenAI tools to assist with daily tasks such as writing, research and basic analysis. AI model training has been primarily the responsibility of specialists, and storage IT has not been involved with AI. But this will change swiftly in the coming year. Business leaders know that if they get left behind in the AI gold rush, they may lose market share, customers and relevance. Corporate data will be used with AI for retrieval-augmented generation (RAG) and inferencing, which will constitute 90% of AI investment over time. Everyone touching data and infrastructure will need to step up to the plate as everyday employees start sending company data to AI. Storage IT will need to create systematic ways for users to search across corporate data stores, curate the right data, check for sensitive data and move data to AI with audit reporting. Storage managers will need to get clear on the requirements to support their business and IT counterparts.


Unstructured data governance processes for AI will mature

 

Protecting corporate data from leakage and misuse and preventing unwanted, erroneous results of AI are top of mind for executives today. A lack of agreed-upon standards, guidelines and regulations in North America is making the task more difficult. IT leaders can get started by using data management technology to get visibility into all their unstructured data across storage. This visibility is the starting point for understanding this growing volume of data better so that it can be governed and managed properly for AI. Data classification is another key step in AI data governance, and it involves enriching file metadata with tags to identify sensitive data that cannot be used in AI programs. Metadata enrichment also helps researchers and data scientists who need to quickly curate data sets for their projects by searching on keywords that identify file contents. With automated processes for data classification, IT can create workflows to continually send protected data sets to secure locations and, separately, send AI-ready data sets to object storage where they can be ingested by AI tools. Automated data workflow orchestration tools will be important for efficiently managing these tasks across petabyte-scale data estates. AI-ready unstructured data management solutions will also deliver a means to monitor workflows in progress and audit outcomes for risk.
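
To make the classify-then-route idea concrete, here is a minimal Python sketch. It is an illustration only and does not represent any vendor's product: the regex patterns, the `FileRecord` type and the `protected` and `ai_ready` destination names are all assumptions made up for this example, and a real classifier would need far richer detection than two regular expressions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detection rules, for illustration only.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


@dataclass
class FileRecord:
    """A file plus the metadata tags added during classification."""
    path: str
    text: str
    tags: set = field(default_factory=set)


def classify(record):
    """Enrich the record's metadata with a tag for each pattern found."""
    for tag, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(record.text):
            record.tags.add(tag)
    return record


def route(records):
    """Send tagged files to 'protected' and the rest to 'ai_ready'.

    Returns the routing result and an audit log with one entry per file.
    """
    routed = {"protected": [], "ai_ready": []}
    audit_log = []
    for record in records:
        classify(record)
        destination = "protected" if record.tags else "ai_ready"
        routed[destination].append(record.path)
        audit_log.append((record.path, destination, sorted(record.tags)))
    return routed, audit_log
```

In this sketch, any file that picks up at least one sensitivity tag is held back from AI, and every routing decision lands in the audit log so it can be reviewed later.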


Role of storage administrator evolves to embrace security and AI data governance


Pressing demands on both the data security and AI fronts are changing the roles of storage IT professionals. The job of managing storage has evolved, with technologies now more automated and self-healing, cloud-based and easier to manage. At the same time, there is increasing overlap and interdependency between cybersecurity, data privacy, storage and AI. Storage pros will need to make data easily accessible and classified for AI, while working across functions to create data governance programs that combat ransomware and prevent the misuse of corporate data in AI. Storage teams will need to know where sensitive data lurks and have tools to develop auditable data workflows that prevent sensitive data leakage.


Ransomware defense of unstructured data becomes more urgent


Traditionally, data protection has focused on mission-critical data because this is the data that needs faster restores. Yet the landscape has changed with unstructured data growing to encompass 90% of all data generated in the last 10 years. The large surface area of petabytes of unstructured data, coupled with its widespread use and rapid growth, makes it highly vulnerable to ransomware attacks. Cyber criminals can use unstructured data as a Trojan horse to infect the enterprise. Cost-effectively protecting unstructured data from ransomware will become a critical defense tactic, starting with moving cold, inactive data to immutable object storage where it cannot be modified.
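
As a rough illustration of the first step, finding cold data, the Python sketch below walks a directory tree and lists files that have not been read or modified for a given number of days. The function name `find_cold_files` and the 365-day default are assumptions chosen for this example, not a recommended policy. Access times can also be unreliable on file systems mounted with options that suppress access-time updates, so treat the result as a starting point. Moving the files to immutable object storage is left out because it depends on the storage platform in use.

```python
import os
import time


def find_cold_files(root, min_idle_days=365, now=None):
    """Return sorted paths under `root` untouched for `min_idle_days` days.

    A file counts as cold when both its last access time and its last
    modification time are older than the cutoff.
    """
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400  # 86400 seconds per day
    cold = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                stat = os.stat(path)
            except OSError:
                # Skip files that vanish or cannot be read mid-scan.
                continue
            last_used = max(stat.st_atime, stat.st_mtime)
            if last_used < cutoff:
                cold.append(path)
    return sorted(cold)
```

The list this returns could then feed a separate, platform-specific job that copies the files to an immutable tier and records the move for audit.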


Unstructured data management solutions broaden to serve AI data governance and monitoring needs


The Komprise 2024 State of Unstructured Data Management report found that IT leaders are prioritizing AI data governance and security as the top future capability for solutions. AI data governance covers protecting data from breaches or misuse, maintaining compliance with industry regulations, managing biases in data, and ensuring that AI does not lead to false, misleading or libelous results. Monitoring and alerting for capacity issues or anomalies, last year’s top pick, remains a high priority, along with analytics and reporting. IT and storage directors will look for unstructured data management solutions that offer automated capabilities to protect, segment and audit sensitive and internal data use in AI, a use case that is bound to expand as AI matures.
