Data Engineering · 2025 · 5 min read

Grupo Amoble

A case study on implementing a Data Warehouse architecture that unified reporting across heterogeneous systems while keeping extraction costs under control.

Cross-DB Reporting

Primary Goal: Unified Reports

Architecture: Hybrid ELT

Sync Strategy: Incremental

Constraint: Cost Control

Grupo Amoble needed consistent analytics across multiple operational databases, but its existing setup made reporting fragmented, slow, and hard to trust. A Metabase instance was already in place for BI, yet queries stayed siloed per source, so cross-database analysis remained out of reach. The objective became centralizing data in a warehouse model that could support business reporting without adding unnecessary load to production systems.

The Business Problem

Grupo Amoble had valuable operational data spread across multiple systems, but reporting required jumping between disconnected sources. This created delays, inconsistent numbers, and extra effort for teams that needed fast decisions.

A Metabase instance was already deployed for Business Intelligence, but in the existing architecture it could not support reliable cross-database querying. That limitation made the root problem clear: BI tooling alone was not enough without a unified analytical data layer.

The project goal was to consolidate those data flows into a Data Warehouse layer where stakeholders could visualize performance and create reports across databases with confidence.

The Technical Constraint

Airbyte was selected as the core ingestion platform because it offered a practical way to move data from common sources into a central model. However, one key source ran on SAP HANA, and no open-source connector existed for it.

The only available connector required an enterprise upgrade that represented a US$30,000 jump in cost. That pricing ruled out a platform-only approach, so the architecture had to be adapted.

Constraint-driven architecture: no open-source connector for SAP HANA and an enterprise-only alternative, so the ingestion strategy had to combine platform tooling and custom development.

The Implementation

For MySQL and PostgreSQL, we implemented incremental synchronization based on the MySQL binary log and the PostgreSQL write-ahead log (WAL). Because each sync reads only the changes recorded since the previous run, it avoids re-extracting unchanged records and spends less compute and database capacity on the source systems.
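The idea behind log-based incremental sync can be illustrated with a small simulation. This is a minimal sketch, not the project's code and not how Airbyte is implemented internally: the `ChangeEvent` type, the integer `position` (a stand-in for a binlog offset or WAL LSN), and `apply_changes` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One entry of a simulated change log (hypothetical structure)."""
    position: int               # increasing log position, like a binlog offset or WAL LSN
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for deletes


def apply_changes(replica: dict, log: list, checkpoint: int) -> int:
    """Apply only the events recorded after `checkpoint`; return the new checkpoint."""
    for event in log:
        if event.position <= checkpoint:
            continue  # already synced in a previous run, so skip it
        if event.op == "delete":
            replica.pop(event.key, None)
        else:
            replica[event.key] = event.row  # insert and update are both upserts
        checkpoint = event.position
    return checkpoint


# Simulated source log: two inserts, then an update and a delete.
log = [
    ChangeEvent(1, "insert", 10, {"id": 10, "total": 100}),
    ChangeEvent(2, "insert", 11, {"id": 11, "total": 250}),
    ChangeEvent(3, "update", 10, {"id": 10, "total": 120}),
    ChangeEvent(4, "delete", 11),
]

replica = {}
checkpoint = apply_changes(replica, log[:2], checkpoint=0)  # first sync: events 1-2
checkpoint = apply_changes(replica, log, checkpoint)        # second sync: only events 3-4
```

The second run touches only the two new events, because the stored checkpoint tells it where the previous run stopped. That is the property that keeps unchanged rows from being read again.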

For SAP HANA, we built a custom Python pipeline that connected via SSH to the SAP server, authenticated against HANA, extracted only the required datasets, and pushed the data into the warehouse flow. This created a stable bridge for a source not covered in the open-source stack.
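A pipeline of this shape could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual code: it assumes the third-party `sshtunnel` and `hdbcli` packages, and every host, credential key, schema, table, and column name is a placeholder. The helper names `build_extract_query` and `extract_rows` are hypothetical.

```python
import re
from typing import Iterator, Sequence

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_extract_query(schema: str, table: str, columns: Sequence[str]) -> str:
    """Build a SELECT restricted to the listed columns (never SELECT *)."""
    if not columns:
        raise ValueError("at least one column is required")
    for name in (schema, table, *columns):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # Quoted identifiers are case-sensitive, so names must match the catalog exactly.
    cols = ", ".join(f'"{c}"' for c in columns)
    return f'SELECT {cols} FROM "{schema}"."{table}"'


def extract_rows(cfg: dict, query: str, batch_size: int = 5000) -> Iterator[list]:
    """Open an SSH tunnel, query HANA through it, and yield rows in batches."""
    # Imported lazily so build_extract_query works without these packages installed.
    from sshtunnel import SSHTunnelForwarder  # assumption: sshtunnel package
    from hdbcli import dbapi                  # assumption: SAP's hdbcli driver

    with SSHTunnelForwarder(
        (cfg["ssh_host"], 22),
        ssh_username=cfg["ssh_user"],
        ssh_pkey=cfg["ssh_key_path"],
        remote_bind_address=("127.0.0.1", cfg["hana_port"]),
    ) as tunnel:
        conn = dbapi.connect(
            address="127.0.0.1",
            port=tunnel.local_bind_port,  # local end of the tunnel
            user=cfg["hana_user"],
            password=cfg["hana_password"],
        )
        try:
            cursor = conn.cursor()
            cursor.execute(query)
            while True:
                batch = cursor.fetchmany(batch_size)
                if not batch:
                    break
                yield list(batch)
        finally:
            conn.close()


# Placeholder schema, table, and columns for illustration only.
query = build_extract_query("SALES", "ORDERS", ["ORDER_ID", "TOTAL"])
```

Selecting an explicit column list keeps the extraction limited to the required datasets, and fetching in batches avoids holding a full table in memory before the rows are handed to the warehouse loader.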

Incremental sync strategy: MySQL binary log + PostgreSQL WAL to reduce overfetching and protect upstream system resources.

Outcome

The final architecture gave Grupo Amoble centralized reporting without forcing an expensive licensing jump. Teams gained a clearer view of business data across systems, and the ingestion layer remained efficient thanks to incremental sync patterns.

From an engineering standpoint, the project demonstrates a pragmatic integration strategy: use managed tooling where it is strongest, and introduce focused custom scripting only where platform limitations would otherwise block delivery.

Services

Data Warehouse · Airbyte · MySQL Binlog · Postgres WAL · SAP HANA · Python ETL · SSH Tunneling · Incremental Sync

Need a data stack that works with real constraints?

We help teams design pragmatic data platforms that improve reporting quality without forcing unnecessary licensing costs.