
Databricks on AWS
This is Dataware Consulting's specialty
Databricks is a comprehensive product, built on open source technologies such as Apache Spark and Delta Lake, for developing your data platform. The Databricks Lakehouse model enables organizations familiar with data warehousing to incorporate a data lake into a "single source of truth". AWS works well with Databricks as a cloud storage platform and offers its own vast array of tools. Dataware Consulting has maintained a focus on this pairing so that we can expertly guide your Databricks on AWS implementation:
- Create workspaces in customer-managed VPC
- Unity Catalog configuration
- Hive Metastore to UC conversion
- Catalog & schema design across workspaces
- Data access administration
- File management in volumes

- Data pipeline development
- Python, Spark SQL, JSON, YAML, Databricks API/CLI
- Spark Streaming in Delta Live Tables
- Catalog-aware notebooks using environment variables
- Notebook & DLT orchestration with jobs

- Code deployment development
- Source control with Databricks Repos & GitHub
- CI/CD with GitHub Actions & Databricks CLI
- Databricks Asset Bundles for Job/DLT deployment

- Cluster setup & compute policies
- DBU cost tracking with tags
- AWS configuration for Databricks
- Customer-managed VPC setup for Databricks
- Subnet & Network ACL configuration
- Roles & Policies for S3 bucket access
- DMS migration tasks (CDC source data to S3)
AWS Data Lake
As a Databricks alternative or complementary toolset
AWS is not just a cloud storage platform. It has all the tools to build a robust data lake: Glue for ETL workflows and cataloging, SQS/Lambda for triggering, Step Functions for orchestration and Athena for querying. DynamoDB, RDS and Redshift cover the spectrum of databases from NoSQL to MPP data warehousing. Dataware Consulting has AWS data lake experience, both as a standalone implementation and operating in parallel with Databricks:
- AWS data lake development (Python)
- Glue workflows & jobs
- Lambda, Step Functions & SQS
- Glue catalogs & Athena queries
- CDK/SDK & CodePipeline deployment

- AWS database development (SQL)
- DynamoDB, RDS, Redshift

- AWS data lake as source for Databricks external tables
- DMS migration tasks (CDC source data to Aurora)
- Daily load partitions in medallion S3 buckets
- S3 bucket versioning, scripted Glacier/delete restores
- DynamoDB for pipeline auditing & SQL config (JSON)
- SNS email notifications
- Athena automated test scripts (Python/SQL)
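One way to picture "daily load partitions in medallion S3 buckets" is the small Python sketch below. It is an illustration under stated assumptions, not a description of a particular engagement: the bucket naming pattern `example-lake-<layer>`, the partition key `load_date`, and the function name `daily_partition_prefix` are all hypothetical. The `key=value` folder naming follows the Hive-style partition layout that Glue and Athena can recognize.

```python
from datetime import date

# Medallion layers, one hypothetical bucket per layer.
LAYERS = ("bronze", "silver", "gold")


def daily_partition_prefix(layer: str, dataset: str, load_date: date) -> str:
    """Return the S3 prefix for one dataset's daily load partition."""
    if layer not in LAYERS:
        raise ValueError(f"unknown medallion layer: {layer!r}")
    # Hive-style partition folder: load_date=YYYY-MM-DD
    return f"s3://example-lake-{layer}/{dataset}/load_date={load_date.isoformat()}/"
```

Keeping the prefix logic in one function means each pipeline stage writes and reads the same partition path for a given day.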
Data Design
Understanding the options, designing to the goal
Dataware Consulting provides data design services for your data platform. For some of our clients this starts with the end in mind, modeling tables in the analytical layer / data warehouse. For others, it starts with a careful examination of raw sources and their inherent "schema-on-read" designs. Will the schemas evolve automatically? Does the load method dictate design changes to the targets? What kinds of staging tables are needed for transforming data through the pipeline? How should the bronze/silver/gold "hops" be designed for data governance? Big Data hasn't so much revolutionized data design as it has expanded the options at our disposal. Dataware Consulting understands how to put these strategies to work in a cohesive data platform design:
- Legacy to Lakehouse conversion
- Medallion architectures for Lakehouse
- Schema-on-read & schema-on-write
- Append only & merge schemas
- Streaming schema evolution
- Data transformation staging tables
- Star schema modeling in data marts
- Conformed dimensions
- SCD1 & SCD2
- Table optimization
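For readers less familiar with the "SCD1 & SCD2" item, the difference can be sketched in plain Python. This is a simplified illustration using in-memory structures rather than warehouse tables; the function names and the `valid_from` / `valid_to` column names are hypothetical choices for this sketch. SCD1 overwrites an attribute and keeps no history, while SCD2 closes out the current row and appends a new version.

```python
from datetime import date


def apply_scd1(dim: dict, key: str, attrs: dict) -> None:
    """SCD1: overwrite the attributes for a key; no history is kept."""
    dim[key] = dict(attrs)


def apply_scd2(dim: list, key: str, attrs: dict, effective: date) -> None:
    """SCD2: expire the current row for a key and append a new version."""
    for row in dim:
        if row["key"] == key and row["valid_to"] is None:
            if row["attrs"] == attrs:
                return  # nothing changed, keep the current version
            row["valid_to"] = effective  # close out the old version
    dim.append(
        {"key": key, "attrs": dict(attrs), "valid_from": effective, "valid_to": None}
    )
```

With SCD2, a query for the current state filters on `valid_to is None`, while a point-in-time query compares a date against the `valid_from` / `valid_to` range.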
Tech History & Leadership
Dataware Consulting has been there, done that
For 15 years, Dataware Consulting has been a trusted leader in data platform design and development. In that time, we've leveraged many technologies to build out our clients' data platforms. Along the way we've become proficient in:
- Data warehouse architecture & modeling (ERwin)
- Database management (SQL Server, Oracle)
- Performance tuning (indexes, table partitions)
- ETL design (SSIS, Informatica)
- Database development (T-SQL, PL/SQL)
- Reporting (SSRS, Business Objects)
- Analytical cubes (SSAS MD & Tabular)
- Dashboarding (Power BI, MicroStrategy)
- Deployment strategies
- Source control & branching
Technical knowledge is key to building data platforms, but it is leadership that sets Dataware Consulting apart. You will have an experienced partner who can provide crucial project support:
- Project planning & budgeting
- Business liaison & requirements definition
- Vendor liaison & product adoption
- Agile story writing & sprint planning
- Implementation team management
- Best practices & consistency of implementation
- Training & mentoring
- Wiki-style documentation & project hand-off