I’m seeing many companies’ in-house data and analytics teams wanting external data, not dashboards, from their vendors or partners to train AI/ML models, conduct analysis, augment data, etc. These companies want data on their terms, and they want it delivered directly to a place where they can work on it (e.g., a data warehouse or platform such as Snowflake, Redshift, BigQuery, or Databricks) instead of having to download CSVs, scrape APIs, hire many data engineers to convert data, and so on.
The ETL/data integration market is already filled with dozens (perhaps hundreds) of companies that all effectively solve the same challenges with different levels of pain for the users. The challenge here isn't the tooling, which is available, but overcoming the organizational inertia that keeps these tools in tactical, one-off use. A strategic approach is needed, but such approaches are difficult because they take executive buy-in and resources to implement.
Another side of this is cost. Companies already have petabytes of data spread over hundreds of S3 buckets. Moving it into Snowflake or, worse, BigQuery, is enormously expensive and has questionable benefits. What's needed are new approaches that leverage data where it resides today, not where it might be tomorrow. Solving the data quality and governance challenges of data stored in object storage should be a rich area for investment. Instead, it's the same founders and same companies trying to stick a square peg into a data-warehouse-shaped hole.