Managing Cloud Data. Datasy, a Tale of Automation

  • Data Lake/Data Warehouse where all data will be stored. Customers usually want to keep control over their data, and with recent regulations like GDPR the bar for data protection has risen for everyone. This is why SaaS solutions for storing data are usually not very popular: they keep the data outside the customer-owned environment.
  • ETL cluster to manage all data pipelines and schedules. Everyone sees the growth of data generation in the world and realizes how important scalability is if their data platform is to keep up with rising data volumes, even over the next 5 years. A cluster is also needed to take on the heavy calculations and processing for training Machine Learning models.
  • Analytics cluster to pull data from the data lake and data warehouse and provide customers with rich, valuable visualizations of the data, so they can make more accurate and informed decisions. In our experience, the major analytics tools are great, but they come at a price: either real money, or being tied to a given company’s technical stack.
  • Automation and Metadata: no customer wants to deal with writing code for data pipelines, creating infrastructure or migrating data. In most cases we were asked to automate these tasks too. Automation means customers do not depend on experts they cannot find when their business needs them. For example, they need just a few AWS DevOps engineers to manage the accounts, networking and IAM, rather than the whole data platform with all its complexities.
  • Last but not least, all customers wanted to reduce costs. They looked at many options, such as changing the analytics tool so they don’t need to pay per user, or using open source software for ETL and scheduling. Here we are lucky for the times we live in, as we have a huge arsenal of advanced open source products at our disposal, like Docker, HashiCorp’s Terraform, Apache Airflow, Apache Superset, TensorFlow, scikit-learn and so many more.
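
To make the "Automation and Metadata" point a little more concrete, here is a minimal sketch of what driving pipelines from metadata instead of hand-written code could look like. It is an illustration under assumptions, not how any particular product works: the `PipelineSpec` record, the `plan_runs` helper, and the bucket and table names are all hypothetical, and only the Python standard library is used.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class PipelineSpec:
    """Hypothetical metadata record describing one data pipeline."""
    name: str
    source: str
    target: str
    depends_on: tuple = ()


def plan_runs(specs):
    """Return pipeline names in an order that respects depends_on."""
    # Map each pipeline to the set of pipelines that must finish first.
    graph = {spec.name: set(spec.depends_on) for spec in specs}
    return list(TopologicalSorter(graph).static_order())


# Illustrative metadata only; the paths and tables are made up.
SPECS = [
    PipelineSpec("load_orders", "s3://example-lake/raw/orders", "warehouse.orders"),
    PipelineSpec("load_customers", "s3://example-lake/raw/customers", "warehouse.customers"),
    PipelineSpec(
        "build_sales_report",
        "warehouse.orders",
        "warehouse.sales_report",
        depends_on=("load_orders", "load_customers"),
    ),
]

if __name__ == "__main__":
    for name in plan_runs(SPECS):
        print(name)
```

In a setup like this, adding a pipeline means adding a metadata record rather than writing new code, and a scheduler such as Apache Airflow could be fed the same ordering.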





Passionate about Data Processing, Data Science and Machine learning

Boyan Stoyanov
