I’m seeing many companies’ in-house data and analytics teams wanting external data, not dashboards, from their vendors or partners to train AI/ML models, conduct analysis, augment data, etc.
These companies want data on their terms. And they want it delivered directly to a place where they can work on it (e.g., a data warehouse or platform such as Snowflake, Redshift, BigQuery, or Databricks) instead of having to download CSVs, scrape APIs, hire many data engineers to convert data, and so on.
Converting data to a usable format is a lot of work. 80%+ of what data folks do is building pipes and integrations, and it’s really boring work.
Big companies like Salesforce are starting to deliver data directly into their customers’/partners’ data warehouses, eliminating the need for those customers/partners to convert all this data themselves. Everyone else should follow suit. Companies like Amplify Data are helping with this transition.
This big middle layer of extract, transform, and load (ETL) exists purely to convert data into a usable form. But there is a big opportunity here for data originators or suppliers to increase margins, improve customer stickiness, and reduce infra spend by delivering the data directly to the place where their customers need it most.
Reducing the time and effort it takes to ingest/digest/use data will go a long way in the age of AI and co-pilots.
The ETL/data integration market is already filled with dozens (perhaps hundreds) of companies that all effectively solve the same challenges with different levels of pain for the users. The challenge here isn't the tooling, which is available, but overcoming the organizational inertia that keeps these tools in purely tactical use. A strategic approach is needed, but such approaches are difficult because they take executive buy-in and resources to implement.
Another side of this is cost. Companies already have petabytes of data spread over hundreds of S3 buckets. Moving it into Snowflake or, worse, BigQuery, is enormously expensive and has questionable benefits. What's needed are new approaches to leverage data where it resides today, not where it might be tomorrow. Overcoming the data quality and governance challenges of data stored in object storage should be a rich area for investment. Instead, it's the same founders and same companies trying to fit a square peg into a data-warehouse-shaped hole.