Data Engineering
Data Engineering is a set of operations aimed at creating interfaces and mechanisms for the flow and access of information. Data engineers set up and operate the organization’s data infrastructure, preparing it for further analysis by data analysts and scientists.
There are awesome and freely available resources, roadmaps and courses.
Data Science is about finding patterns in data in order to make predictions.
What is the difference between Data Science and Data Engineering?
What is the difference between Data Analytics and Data Engineering?
Google Analytics 4 is an analytics service that enables you to measure traffic and engagement across your websites and apps.
https://developers.google.com/analytics/devguides/collection/ga4
https://github.com/GoogleCloudPlatformTraining/training-data-analyst
https://medium.com/p/26516b5d28e4
https://cloud.google.com/blog/products/data-analytics/building-the-data-analyst-driven-organization
https://cloud.google.com/blog/products/data-analytics/google-cloud-next-rollup-for-data-analytics
https://cloud.google.com/blog/products/data-analytics/unlocking-opportunities-data-transformation
One of Google Cloud Platform's competitive advantages is its strong ecosystem of managed databases.
Choosing the right database for your workloads can be confusing. Comparing the different GCP database services helps you make the best decision for each use case.
Besides the GCP database services, there are many other databases.
https://github.com/andkret/Cookbook
Cloud SQL is a fully-managed relational database service on Google Cloud Platform.
You can use Cloud SQL with MySQL, PostgreSQL, or SQL Server.
SQL stands for Structured Query Language. SQL is used to communicate with a database.
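As a small illustration of using SQL to communicate with a relational database, the sketch below runs a few statements from Python. It uses the standard library's in-memory SQLite engine as a stand-in so that it runs anywhere; it does not connect to Cloud SQL, and the `events` table and its rows are invented for the example.

```python
import sqlite3

# An in-memory SQLite database stands in for a managed relational instance.
conn = sqlite3.connect(":memory:")

# Define a table (hypothetical schema, for illustration only).
conn.execute("CREATE TABLE events (user_id TEXT, action TEXT)")

# Insert a few rows using parameterized statements.
conn.executemany(
    "INSERT INTO events (user_id, action) VALUES (?, ?)",
    [("u1", "login"), ("u1", "purchase"), ("u2", "login")],
)

# Ask a question of the data: how many events did each user generate?
rows = conn.execute(
    "SELECT user_id, COUNT(*) FROM events GROUP BY user_id ORDER BY user_id"
).fetchall()

print(rows)
conn.close()
```

The same `CREATE TABLE`, `INSERT`, and `SELECT ... GROUP BY` statements are standard SQL, so they should carry over to MySQL, PostgreSQL, or SQL Server with little or no change, although connection setup and data types differ per engine.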
Bigtable is ideal for storing very large amounts of data in a key-value store. Bigtable supports high read and write throughput at low latency.
Dataproc is a fully managed and highly scalable service for running Apache Hadoop, Apache Spark, Apache Flink, Presto, and 30+ open source tools and frameworks.
Pub/Sub works as messaging middleware for traditional service integration or as a simple communication medium for modern microservices.
BigQuery is a serverless, cost-effective, multi-cloud data warehouse designed to help you turn big data into valuable business insights.
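The publish/subscribe pattern behind such messaging middleware can be sketched in a few lines of plain Python. This is a toy, in-process model only: the `InMemoryBroker` class and the `orders` topic are hypothetical names, and none of this is the Pub/Sub client library. The real service delivers messages asynchronously and durably, with acknowledgments and retries, none of which is modeled here.

```python
from collections import defaultdict
from typing import Callable


class InMemoryBroker:
    """Toy broker: publishers and subscribers only share a topic name."""

    def __init__(self) -> None:
        # Maps a topic name to the callbacks subscribed to it.
        self._subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        """Register a callback to be invoked for every message on the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to each subscriber; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = InMemoryBroker()
received: list[str] = []

# Two independent subscribers on the same topic.
broker.subscribe("orders", received.append)
broker.subscribe("orders", lambda m: received.append(m.upper()))

delivered = broker.publish("orders", "order-1 created")
unheard = broker.publish("invoices", "invoice-1 created")  # no subscribers

print(delivered, unheard, received)
```

The point of the pattern is decoupling: the publisher never references a subscriber directly, so services can be added or removed without changing the code that emits messages.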
Cloud Spanner is a distributed SQL database management and storage service that is scalable, multi-version, globally-distributed, and synchronously-replicated.
Cloud Data Fusion is a fully managed, cloud-native, enterprise data integration service for quickly building and managing data pipelines.
Cloud Data Fusion is powered by the open source project CDAP.
https://cloud.google.com/data-fusion/docs
Dataflow is a fully managed streaming analytics service that minimizes latency, processing time, and cost through autoscaling and batch processing.
Dataprep is an interactive web application in which users define the data preparation rules by interacting with a sample of their data.
Cloud Composer is a fully managed data workflow orchestration service that empowers you to author, schedule, and monitor pipelines.
Data Studio is a free tool that turns your data into informative, easy to read, easy to share, and fully customizable dashboards and reports.
https://support.google.com/datastudio#topic=6267740
Demonstration of Data Studio.
https://www.youtube.com/watch?v=NhGLOVkyKjg
Cloud Datalab can be used to easily explore, visualize, analyze, and transform data using familiar languages, such as Python and SQL, interactively.
Looker is a business intelligence and big data analytics platform that helps you explore, analyze, and share real-time business analytics easily.
Data Catalog is a fully managed and scalable metadata management service that empowers organizations to quickly discover, understand, and manage all of their data.
https://cloud.google.com/data-catalog/docs
Cloud Life Sciences is a suite of services and tools for managing, processing, and transforming life sciences data. It also enables advanced insights and operational workflows using highly scalable and compliant infrastructure.
Cloud Firestore is a cloud-hosted, NoSQL database that your iOS, Android, and web apps can access directly via native SDKs.
Datastore is a schema-less database, which allows you to worry less about making changes to your underlying data structure as your application evolves.
Firestore in Datastore mode is a NoSQL document database built for automatic scaling, high performance, and ease of application development.
https://cloud.google.com/datastore/docs
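To make the schema-less idea described above concrete, here is a minimal sketch that models entities as plain Python dictionaries keyed by kind and identifier. The `put` and `query` helpers, the `Task` kind, and its properties are all hypothetical; this is not the Datastore or Firestore client API, only an in-memory illustration of entities of the same kind carrying different properties.

```python
from typing import Any

# In-memory stand-in for a schema-less store: (kind, key) -> properties.
entities: dict[tuple[str, str], dict[str, Any]] = {}


def put(kind: str, key: str, **properties: Any) -> None:
    """Store an entity; no schema is declared or enforced."""
    entities[(kind, key)] = dict(properties)


def query(kind: str, prop: str, value: Any) -> list[str]:
    """Return sorted keys of entities of a kind whose property equals value."""
    return sorted(
        key
        for (entity_kind, key), props in entities.items()
        if entity_kind == kind and props.get(prop) == value
    )


# The first entity has two properties.
put("Task", "t1", description="Load CSV", done=False)

# A later entity adds a 'priority' property without any migration step.
put("Task", "t2", description="Build report", done=False, priority=4)

open_tasks = query("Task", "done", False)
urgent_tasks = query("Task", "priority", 4)

print(open_tasks, urgent_tasks)
```

Entities that lack a property are simply skipped by a filter on that property, which is why new fields can be introduced as the application evolves without rewriting existing records.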
Memorystore automates complex tasks for open source Redis and Memcached like enabling high availability, failover, patching, and monitoring.
Firebase is Google's mobile platform that helps you quickly develop high-quality apps and grow your business.
There are various Data Transfer options in GCP.
https://cloud.google.com/blog/products/data-analytics/data-ingestion-planning-principles
https://cloud.google.com/blog/products/data-analytics/open-data-lakehouse-on-google-cloud
https://martinfowler.com/articles/data-mesh-principles.html
https://medium.com/google-cloud/10-reasons-why-you-should-not-adopt-data-mesh-7a0b045ea40f
https://medium.com/data-monzo/an-introduction-to-monzos-data-stack-827ae531bc99