Technologies | Dataware Consulting

Databricks on AWS

This is Dataware Consulting's specialty

Databricks is a comprehensive data and analytics platform built on open-source technologies such as Apache Spark and Delta Lake. The Databricks Lakehouse model lets organizations familiar with data warehousing incorporate a data lake into a "single source of truth". AWS pairs well with Databricks as its cloud storage platform and offers a vast array of tools of its own. Dataware Consulting has maintained a focus on this pairing so that we can expertly guide your Databricks on AWS implementation:

Workspace creation in a customer-managed VPC
Unity Catalog configuration
- Hive Metastore to UC conversion
- Catalog & schema design across workspaces
- Data access administration
- File management in volumes
Data pipeline development
- Python, Spark SQL, JSON, YAML, Databricks API/CLI
- Spark Structured Streaming in Delta Live Tables
- Catalog-aware notebooks using environment variables
- Notebook & DLT orchestration with jobs
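A catalog-aware notebook typically resolves its Unity Catalog three-level name (catalog.schema.table) at run time instead of hard-coding it. The sketch below assumes a hypothetical `DEPLOY_ENV` environment variable and a hypothetical dev/test/prod catalog naming convention; your workspace's variable names and catalogs will differ.

```python
import os

# Hypothetical mapping from deployment environment to Unity Catalog catalog name.
CATALOG_BY_ENV = {"dev": "dev_catalog", "test": "test_catalog", "prod": "prod_catalog"}


def resolve_table(schema: str, table: str, env_var: str = "DEPLOY_ENV") -> str:
    """Build a three-level Unity Catalog name (catalog.schema.table) for the current environment.

    The environment is read from a hypothetical environment variable and defaults to "dev".
    """
    env = os.environ.get(env_var, "dev").lower()
    if env not in CATALOG_BY_ENV:
        raise ValueError(f"unknown environment {env!r}; expected one of {sorted(CATALOG_BY_ENV)}")
    return f"{CATALOG_BY_ENV[env]}.{schema}.{table}"


if __name__ == "__main__":
    os.environ["DEPLOY_ENV"] = "prod"
    print(resolve_table("sales", "orders"))
```

In a notebook, the returned string could be passed to `spark.table(...)` or interpolated into Spark SQL, so the same code runs unchanged against each environment's catalog.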

Code deployment
- Source control with Databricks Repos & GitHub
- CI/CD with GitHub Actions & Databricks CLI
- Databricks Asset Bundles - Job/DLT deployment
Cluster setup & compute policies
DBU cost tracking with tags
AWS configuration for Databricks
- Customer-managed VPC setup for Databricks
- Subnet & Network ACL configuration
- Roles & Policies for S3 bucket access
- DMS migration tasks replicating CDC source data to S3
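Tagging clusters is what makes DBU costs attributable: Databricks propagates cluster custom tags to its usage data (and to the underlying EC2 instances), so spend can be grouped by project or cost center. As a minimal sketch, the function below builds a cluster specification and refuses to return one without the required tags. The tag keys, runtime version and instance type are hypothetical placeholders, not recommendations.

```python
import json

# Hypothetical tag keys an organization might require on every cluster.
REQUIRED_TAGS = ("project", "cost_center", "env")


def cluster_spec(name: str, tags: dict[str, str], num_workers: int = 2) -> dict:
    """Return a cluster specification dict, rejecting it if any required cost tag is missing or empty."""
    missing = [key for key in REQUIRED_TAGS if not tags.get(key)]
    if missing:
        raise ValueError(f"missing required cost tags: {missing}")
    return {
        "cluster_name": name,
        "spark_version": "15.4.x-scala2.12",  # placeholder runtime version
        "node_type_id": "i3.xlarge",  # placeholder AWS instance type
        "num_workers": num_workers,
        "custom_tags": dict(tags),  # copied so later edits to `tags` don't leak in
    }


if __name__ == "__main__":
    spec = cluster_spec("nightly-etl", {"project": "sales", "cost_center": "cc-100", "env": "prod"})
    print(json.dumps(spec, indent=2))
```

The resulting dictionary has the shape of a Clusters API create request body. A compute policy can enforce the same tags centrally, which is usually preferable to relying on each caller to remember them.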

AWS Data Lake

As a Databricks alternative or complementary toolset

AWS is not just a cloud storage platform. It has all the tools to build a robust data lake: Glue for ETL workflows and cataloging, SQS/Lambda for triggering, Step Functions for orchestration and Athena for querying. DynamoDB, RDS and Redshift cover the spectrum of databases from NoSQL to MPP data warehousing. Dataware Consulting has AWS data lake experience, both implementing standalone data lakes and operating them in parallel with Databricks:

AWS data lake development (Python)
- Glue workflows & jobs
- Lambda, Step Functions & SQS
- Glue catalogs & Athena queries
- CDK/SDK & CodePipeline deployment
AWS database development (SQL)
- DynamoDB, RDS, Redshift

AWS data lake as source for Databricks external tables
DMS migration tasks replicating CDC source data to Aurora
Daily load partitions in medallion S3 buckets
S3 bucket versioning, scripted Glacier/delete restores
DynamoDB for pipeline auditing & SQL config (JSON)
SNS email notifications
Athena automated test scripts (Python/SQL)
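Daily load partitions are easiest to query when the S3 keys follow the Hive-style `column=value` convention, which Glue crawlers and Athena recognize as partitions. The helper below is a sketch that assumes one bucket per medallion layer with a hypothetical `example-datalake-<layer>` naming pattern and a hypothetical `load_date` partition column.

```python
from datetime import date

MEDALLION_LAYERS = ("bronze", "silver", "gold")


def partition_prefix(layer: str, table: str, load_date: date,
                     bucket_prefix: str = "example-datalake") -> str:
    """Return the S3 URI prefix for one daily partition of a table in a medallion-layer bucket.

    Bucket naming and the `load_date` column name are illustrative assumptions.
    """
    if layer not in MEDALLION_LAYERS:
        raise ValueError(f"unknown layer {layer!r}; expected one of {MEDALLION_LAYERS}")
    # Hive-style key=value segment so query engines can prune by load date.
    return f"s3://{bucket_prefix}-{layer}/{table}/load_date={load_date.isoformat()}/"


if __name__ == "__main__":
    print(partition_prefix("bronze", "orders", date(2024, 1, 31)))
```

Writing each day's load under its own prefix also means a failed load can be re-run by replacing a single partition rather than the whole table.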

Data Design

Understanding the options, designing to the goal

Dataware Consulting provides data design services for your data platform. For some of our clients this starts with the end in mind: modeling tables in the analytical layer or data warehouse. For others, it starts with a careful examination of raw sources and their inherent "schema-on-read" designs. Will the schemas evolve automatically? Does the load method dictate design changes to the targets? What kinds of staging tables are needed for transforming data through the pipeline? How should the bronze/silver/gold "hops" be designed for data governance? Big Data hasn't so much revolutionized data design as it has expanded the options at our disposal. Dataware Consulting understands how to put these strategies to work in a cohesive data platform design:

Legacy to Lakehouse conversion
Medallion architectures for Lakehouse
Schema-on-read & schema-on-write
Append-only & merge schemas
Streaming schema evolution
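Additive schema evolution, which Delta Lake exposes through its `mergeSchema` write option, can be pictured as a merge of column maps: new columns are appended and existing columns keep their types. The plain-Python sketch below models only that rule, with schemas as hypothetical `{column: type}` dictionaries; it does not reproduce Delta's actual type-widening or casting behavior.

```python
def merge_schemas(existing: dict[str, str], incoming: dict[str, str]) -> dict[str, str]:
    """Additive schema evolution: keep existing columns in order, append new ones, reject type changes."""
    merged = dict(existing)
    for column, dtype in incoming.items():
        if column not in merged:
            merged[column] = dtype  # new column appended at the end
        elif merged[column] != dtype:
            # A type change is not additive; it needs an explicit migration.
            raise TypeError(f"column {column!r}: {merged[column]} -> {dtype} is not an additive change")
    return merged


if __name__ == "__main__":
    print(merge_schemas({"id": "bigint", "name": "string"}, {"id": "bigint", "email": "string"}))
```

Columns missing from the incoming batch are simply kept, which matches the append-only pattern where older rows already carry them.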

Data transformation staging tables
Star schema modeling in data marts
Conformed dimensions
SCD1 & SCD2
Table optimization
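The difference between SCD1 and SCD2 is whether history survives a change. The sketch below models a dimension as plain Python rows with hypothetical `valid_from`/`valid_to` columns: SCD1 overwrites attributes in place, while SCD2 closes the current row and opens a new version. In a Lakehouse this logic would normally be expressed as a Delta `MERGE`; the code only illustrates the rule.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    key: str                         # business (natural) key
    attrs: tuple                     # tracked attribute values
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version

    @property
    def is_current(self) -> bool:
        return self.valid_to is None


def apply_scd1(dim: dict[str, tuple], incoming: dict[str, tuple]) -> dict[str, tuple]:
    """Type 1: overwrite attributes in place; no history is kept."""
    return {**dim, **incoming}


def apply_scd2(dim: list[DimRow], incoming: dict[str, tuple], as_of: date) -> list[DimRow]:
    """Type 2: expire changed current rows and add new versions; history rows are preserved."""
    out = [row for row in dim if not row.is_current]  # existing history is untouched
    current = {row.key: row for row in dim if row.is_current}
    for key, row in current.items():
        new_attrs = incoming.get(key)
        if new_attrs is not None and new_attrs != row.attrs:
            out.append(replace(row, valid_to=as_of))  # close the old version
            out.append(DimRow(key, new_attrs, as_of))  # open the new version
        else:
            out.append(row)  # unchanged, or absent from this feed (no deletes modeled)
    for key, attrs in incoming.items():
        if key not in current:
            out.append(DimRow(key, attrs, as_of))  # brand-new key
    return out
```

Keeping `valid_to` empty for the current row lets queries select the latest version with a simple filter, while point-in-time queries compare a date against the `valid_from`/`valid_to` range.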

Tech History & Leadership

Dataware Consulting has been there, done that

For 15 years, Dataware Consulting has been a trusted leader in data platform design and development. In that time, we've leveraged many technologies to build out our clients' data platforms. Along the way we've become proficient in:

Data warehouse architecture & modeling (ERwin)
Database management (SQL Server, Oracle)
Performance tuning (indexes, table partitions)
ETL design (SSIS, Informatica)
Database development (T-SQL, PL/SQL)

Reporting (SSRS, Business Objects)
Analytical cubes (SSAS MD & Tabular)
Dashboarding (Power BI, MicroStrategy)
Deployment strategies
Source control & branching

Technical knowledge is key to building data platforms, but it's the leadership that sets Dataware Consulting apart. Know that you will have an experienced partner that can provide crucial project support:

Project planning & budgeting
Business liaison & requirements definition
Vendor liaison & product adoption
Agile story writing & sprint planning

Manage implementation teams
Best practices & consistency of implementation
Training & mentoring
Wiki-style documentation & project hand-off