
UCE AI Team Article: Model Context Protocol as the Future of Data Engineering

Integrating the Model Context Protocol (MCP) into enterprise organizations creates new demand for Data Engineers who combine Data and Generative AI skills. The claim may sound familiar: companies have spent many years trying to connect Data to GenAI through concepts like ‘chat-to-your-data’. MCP streamlines this process, enabling plug-and-play capabilities both for cutting-edge companies and for companies with a low level of AI adoption. However, an MCP server connected to the data not only provides an elegant way to enable cross-company access to data, it also brings new challenges, as outlined below.

Generic MCP Implementation

The diagram below shows a generic MCP server connected to a Data Platform.

This implementation offers a straightforward way to connect chatbots and other MCP clients to the data and its infrastructure. However, such generic MCP servers expose their capabilities without domain-specific knowledge. For example, a basic MCP server easily handles "Show all database tables with customer information," but cannot effectively interpret "What's our customer churn risk based on recent support tickets?"

This lack of domain knowledge is a problem for queries that need domain-specific context to translate business terminology into appropriate data queries and meaningful insights. Just as connecting a BI reporting tool to a database through JDBC requires specialized knowledge to build business-oriented dashboards, an MCP server requires extra domain knowledge to integrate with the tools that business users rely on.

Business-Oriented MCP Implementation (MCP as a Product)

The real power of MCP and its potential return on investment emerge when we empower MCP with business knowledge. The appeal of this approach is that the business can now interact with data through natural, complex queries that reflect actual decision-making processes.

A generic MCP server cannot truly serve a query like this: “Show me the latest actual company revenue and compare it with forecasted numbers for the next six months. Connect it with the current status of the project and produce a structured summary highlighting discrepancies.”

To serve such queries, the diagram below shows a domain-focused MCP implementation with two essential elements: a refined data set under data governance, and a business-oriented MCP server whose tools are tailored to actual business needs rather than generic functionality.

The Challenge: Beyond Traditional Data Engineering

While the concept of a business-oriented MCP server may seem daunting, seasoned Data Engineers have long addressed similar problems through refined BI reporting layers (Data Marts, Aggregations, Analytical Layers, etc.). Conversational AI now introduces new challenges:

MCP Client limitations

Unlike traditional BI tools with predefined reports and dashboards, an AI-powered conversational interface may have context and prompt interpretation issues. Data Engineers can't simply provide a subset of data marts and expect the model to always correctly interpret and respond to every query variation. The system must adapt to the full spectrum of potential questions and their nuances, and it must provide audit capabilities and lineage from the query back to the returned data.

Example: 
•    User Query A: What were our top 5 products in Q4 2024?
•    User Query B: Which products had the most revenue in the last quarter?

Result:
•    Query A might hit a sales_summary_qtrly table using a fixed product ranking.
•    Query B might trigger a revenue_by_product_daily scan with recent data — leading to different outputs.
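One way to reduce this kind of drift is to expose a single governed tool for the metric, so that both phrasings resolve to the same table and the same measure, and only the arguments differ. The sketch below is a minimal illustration under stated assumptions: the function name, the QueryPlan type, and the column names (product_id, revenue, fiscal_year, fiscal_quarter) are hypothetical, and only the sales_summary_qtrly table name comes from the example above. Returning the source table alongside the SQL also gives a simple lineage record for auditing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryPlan:
    """What the tool intends to run, plus lineage for auditing."""
    sql: str
    source_table: str
    metric: str


def plan_top_products(year: int, quarter: int, limit: int = 5) -> QueryPlan:
    """Build the one governed query for 'top products by revenue'.

    Table and column names are illustrative assumptions.
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be between 1 and 4")
    if limit < 1:
        raise ValueError("limit must be positive")

    # Every phrasing of the question is served from the same table.
    source_table = "sales_summary_qtrly"
    sql = (
        "SELECT product_id, SUM(revenue) AS revenue "
        f"FROM {source_table} "
        f"WHERE fiscal_year = {int(year)} AND fiscal_quarter = {int(quarter)} "
        "GROUP BY product_id ORDER BY revenue DESC "
        f"LIMIT {int(limit)}"
    )
    return QueryPlan(sql=sql, source_table=source_table, metric="revenue")


# Query A and Query B differ only in arguments, not in data source.
plan_a = plan_top_products(2024, 4, limit=5)
plan_b = plan_top_products(2024, 4, limit=10)
```

With this design, the model chooses arguments rather than tables, which keeps the answer to both questions traceable to one definition of revenue.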

Data Governance and Business Context

MCP systems require robust data governance and descriptive metadata layers. Without expert-level understanding of data semantics, relationships, and business context, even the most advanced AI models cannot correctly identify, interpret, or present the right information.

Example:
•    AI is asked: “Show churn rate per region.”
•    The customer_status column uses internal codes like "INAC", "DEL", "SUS" — not easily interpretable by AI.
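A descriptive metadata layer can carry the meaning of such codes so that the model never has to guess. The sketch below is only an illustration: the meanings assigned to "INAC", "DEL", and "SUS", the decision about which of them count as churn, and the extra "ACT" code are all assumptions made for the example, not definitions from any real system.

```python
from collections import defaultdict

# Hypothetical metadata for the customer_status column.
# Labels and churn flags are assumptions for illustration only.
STATUS_METADATA = {
    "ACT": {"label": "active", "churned": False},
    "INAC": {"label": "inactive", "churned": True},
    "DEL": {"label": "deleted", "churned": True},
    "SUS": {"label": "suspended", "churned": False},
}


def churn_rate_by_region(rows):
    """Compute churn rate per region from (region, customer_status) pairs.

    Undocumented codes raise an error instead of being silently guessed.
    """
    totals = defaultdict(int)
    churned = defaultdict(int)
    for region, status in rows:
        if status not in STATUS_METADATA:
            raise KeyError(f"undocumented customer_status code: {status!r}")
        totals[region] += 1
        if STATUS_METADATA[status]["churned"]:
            churned[region] += 1
    return {region: churned[region] / totals[region] for region in totals}


sample_rows = [
    ("EMEA", "ACT"),
    ("EMEA", "DEL"),
    ("APAC", "SUS"),
    ("APAC", "ACT"),
]
rates = churn_rate_by_region(sample_rows)
```

Failing loudly on an unknown code is a deliberate choice here: a governed answer with a gap is easier to audit than a confident answer built on a guessed meaning.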

Department-Specific Adaptations

Just like BI dashboards vary by department, MCP will require different views and interpretations of the same data. MCP will need to provide not only data context but also the business context in which the organization operates. 

Example:
•    Finance asks: “Show cost per unit.” and expects full overhead and amortization to be included.
•    Operations asks: “Show cost per unit.” and expects raw material + labor only.
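The two expectations above can be made explicit as department-specific metric definitions instead of being left to the model's interpretation. The following is a minimal sketch: the component names, the COST_COMPONENTS mapping, and the cost_per_unit function are hypothetical and simply mirror the Finance and Operations examples.

```python
# Hypothetical mapping from department to the cost components it expects.
COST_COMPONENTS = {
    "finance": ("raw_material", "labor", "overhead", "amortization"),
    "operations": ("raw_material", "labor"),
}


def cost_per_unit(department: str, costs: dict, units: int) -> float:
    """Return cost per unit using the requesting department's definition."""
    if department not in COST_COMPONENTS:
        raise KeyError(f"no cost definition for department: {department!r}")
    if units <= 0:
        raise ValueError("units must be positive")
    total = sum(costs[component] for component in COST_COMPONENTS[department])
    return total / units


# Same underlying data, two department-specific answers.
batch_costs = {
    "raw_material": 40.0,
    "labor": 20.0,
    "overhead": 30.0,
    "amortization": 10.0,
}
finance_view = cost_per_unit("finance", batch_costs, units=10)
operations_view = cost_per_unit("operations", batch_costs, units=10)
```

In a real deployment the department would likely come from the caller's identity or role rather than from the prompt, which is an additional assumption beyond what this sketch shows.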

Model Context Limitations 

This applies especially to Big Data datasets: both the datasets and the MCP server must be prepared to return results that fit into the model's context window and remain meaningful.

Example:
•    A user asks: “What are common error patterns in device logs from last year?”
•    The device logs span 6 TB and over 12 billion rows. The raw data far exceeds the model's context window, leading to misinterpretation and repeated requests that burn both tokens and infrastructure resources.
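A common mitigation is for the MCP tool to return a compact aggregate rather than raw rows, and to cap the size of what it returns. The sketch below illustrates the idea under several assumptions: the log line layout ("<timestamp> <device_id> <error_code> <message>"), the function name, and the use of a character budget as a rough stand-in for a token budget are all hypothetical. At the scale described above, the counting itself would be pushed down to the data platform; plain Python is used here only to keep the example self-contained.

```python
import json
from collections import Counter


def summarize_error_patterns(log_lines, top_n=10, max_chars=2000):
    """Reduce raw log lines to a small, bounded summary for the model.

    Assumes each line looks like:
    "<timestamp> <device_id> <error_code> <message>" (hypothetical layout).
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split(maxsplit=3)
        if len(parts) >= 3:
            counts[parts[2]] += 1

    patterns = [
        {"error_code": code, "count": count}
        for code, count in counts.most_common(top_n)
    ]
    # Drop the least frequent patterns until the payload fits the budget.
    while patterns and len(json.dumps(patterns)) > max_chars:
        patterns.pop()

    return {
        "parsed_lines": sum(counts.values()),
        "patterns": patterns,
        # True when some error codes were left out of the response.
        "truncated": len(patterns) < len(counts),
    }


sample_logs = [
    "2024-01-01T00:00:00 dev1 E42 overheating",
    "2024-01-01T00:00:01 dev2 E42 overheating",
    "2024-01-01T00:00:02 dev3 E7 timeout",
]
summary = summarize_error_patterns(sample_logs, top_n=1)
```

The explicit truncated flag tells the client that the answer is partial, so the model can say so instead of issuing repeated requests for data that will never fit.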

Data Engineering and Generative AI – Fusion or Collision?

The successful implementation of an MCP product creates new demand for engineers at the intersection of Generative AI and Data. The best path forward is to bridge traditional data engineering with generative AI capabilities. Semantic layers and data design integrated into prompt engineering, and data quality merged with evaluation frameworks, are just a few examples of areas that require deep expertise.

Conclusion

The future of data engineering is not only about managing data – it's about making data accessible, context-ready, and conversational for business users. MCP adoption at the company level will significantly accelerate this process and will require data platforms to support cross-functional teams and engineers prepared for this shift. This is not a threat – it’s an opportunity for Data Engineers to evolve from pipeline engineering into data and conversational architecture, and for companies to evolve into data-driven AI organizations.