Prepare data pipelines without code or extra tools

DataFactory is everything you need for data integration and building efficient data pipelines

Transform raw data into information and insights in ⅓ the time of other tools

DataFactory is made for:

  • Self-service data ingestion
  • Real-time replication
  • Machine learning modeling

Build Pipelines Visually

The days of writing pages of code to move and transform data are over. Drag data operations from a tool palette onto your pipeline canvas to build even the most complex pipelines.

  • Palette of data transformations you can drag to a pipeline canvas
  • Build pipelines that would take hours in code in minutes
  • Automate and operationalize using built-in approval and version control mechanisms

Data Wrangling & Transformations

It used to be that data wrangling was one tool, pipeline building another, and machine learning yet a third. DataFactory brings these functions together, where they belong.

  • Perform operations easily using drag-and-drop transformations
  • Wrangle datasets to prepare for advanced analytics
  • Add & operationalize ML functions like segmentation and categorization without code

Complete Pipeline Orchestration

With DataFactory, you’ve got complete control over your pipelines — when they ingest, when they load targets — everything. It’s like your personal data pipeline robot doing your bidding.

  • Perform all ETL, ELT, and lakehouse operations visually
  • Schedule pipeline operations & receive notifications and alerts

Click into Details

DataFactory doesn’t sacrifice power for ease of use — open the nodes you’ve added to fine-tune operations exactly how you want them to perform. Use code or tune nodes using the built-in options.

  • Click on any pipeline node to see the details of the transformation
  • Adjust parameters with built-in controls or directly using SQL
  • Analyze and gain insights into your data using visualizations and dashboards

Model and maintain an easily accessible cloud Data Lake

If you’re worried about standing up a data lake just for analytics, worry no more. DataFactory includes its own data lake to save you time and money. No need to buy yet another tool just for analytic data storage.

  • Use for cold and warm data reporting & analytics
  • Save costs vs buying a separate data lake platform

Hundreds of Connectors

Batch data? Buy a tool. Streaming data? Buy another tool. That’s so 2018. DataFactory connects to any data, anywhere, without having to buy YAT (yet another tool).

  • Legacy databases
  • Modern Cloud databases
  • Cloud ERPs
  • REST API sources (Streaming / Batch)
  • Object Storage Locations
  • Flat Files

Whatever you need for your data engineering,
DataFactory is there for you

Capability Matrix


Use cases

What can you do with DataFactory? 

Quick Insight on Datasets

Use DB Explorer to query data for rapid insights using the power of the Spark SQL engine

Save results as datasets to use as sources for building pipelines, data wrangling, and ML modeling

Easy Data Preparation

Use transformation nodes included in the tool palette to analyze the grain of the data and distribution of attributes

Use null and empty-record counts, value statistics, and length statistics to profile source data for efficient join and union operations
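The profiling step above can be sketched in plain Python. This is an illustrative example, not DataFactory's implementation; the column name and sample values are made up.

```python
# Hedged sketch: profile a column's nulls, distinct values, and string
# lengths before a join, in the spirit of DataFactory's profiling nodes.
from collections import Counter

def profile_column(values):
    """Return null count, distinct-value stats, and length statistics."""
    non_null = [v for v in values if v is not None and v != ""]
    lengths = [len(str(v)) for v in non_null]
    return {
        "nulls": len(values) - len(non_null),       # nulls and empty records
        "distinct": len(set(non_null)),             # value statistics
        "top_values": Counter(non_null).most_common(3),
        "min_len": min(lengths) if lengths else 0,  # length statistics
        "max_len": max(lengths) if lengths else 0,
    }

# Illustrative join-key column with one NULL and one empty string
customer_ids = ["C001", "C002", None, "C002", ""]
stats = profile_column(customer_ids)
```

Knowing the null count and key lengths up front tells you whether a join key needs cleansing before a join or union.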

Query-based CDC

Identify and consume changed data from source databases into downstream staging and integration layers in Delta Lake

Minimize impact on source systems via incremental changes identified using variable-enabled, timestamp-based SQL queries

Incorporate only the latest changes into your data warehouse for near real-time analytics and cost savings
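A timestamp-based incremental query of this kind can be sketched as follows. The table, column names, and sqlite3 source are illustrative stand-ins; DataFactory's nodes generate equivalent queries against your actual sources.

```python
# Hedged sketch of query-based CDC: pull only rows changed since the last
# high-water mark using a timestamp predicate, then advance the watermark.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-05"), (3, "2024-01-09")],
)

# Watermark value persisted from the previous pipeline run (the "variable")
last_watermark = "2024-01-03"

# Only rows changed after the watermark are read, minimizing source load
changed = conn.execute(
    "SELECT id, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
    (last_watermark,),
).fetchall()

# Advance the watermark to the newest change for the next incremental run
new_watermark = max(row[1] for row in changed)
```

Each run reads only the delta since the stored watermark, which is what keeps the warehouse near real time without full reloads.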

Log-based CDC

Deliver real-time data streaming by reading database logs to identify the continuous changes happening to source data

Optimize efficiency by using background processes to scan database logs and capture changed data without impacting transactions or adding load on the source

Easily configure and schedule using built-in log-based CDC tools

Anomaly Detection

Quickly perform data pre-processing and cleansing to give the learning algorithm a meaningful training dataset


Cap anomalies using the built-in rule sets and set guardrails to raise the accuracy and quality of your data
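Capping anomalies against a rule set can be sketched like this. The rule values, column name, and readings are purely illustrative assumptions, not DataFactory's built-in rules.

```python
# Hedged sketch: cap anomalous values to rule-set guardrails so a
# downstream model trains on a meaningful dataset.
RULES = {"temperature_c": (-30.0, 60.0)}  # illustrative min/max guardrails

def cap_anomalies(column, values, rules=RULES):
    """Clamp every value in `values` to the guardrails defined for `column`."""
    lo, hi = rules[column]
    return [min(max(v, lo), hi) for v in values]

# Two sensor glitches: 180.0 is capped down to 60.0, -75.0 up to -30.0
readings = [21.5, 19.0, 180.0, -75.0, 22.3]
capped = cap_anomalies("temperature_c", readings)
```

Clamping rather than dropping keeps row counts stable, which matters when the capped column feeds joins or time-series models downstream.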

Push-down Optimization

Execute push-down optimization via push-down-enabled transformation nodes so that transformation logic runs in the source or target database

Easily configure push-down operations within transformation nodes
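The idea behind push-down can be shown with a small sketch. The sqlite3 source and table are illustrative assumptions; the point is that the filter and aggregate execute inside the database rather than in the pipeline engine.

```python
# Hedged sketch of push-down optimization: rather than fetching every row
# and filtering/aggregating in the pipeline engine, the WHERE clause and
# SUM are pushed into the source database's SQL, so only the small result
# travels over the wire.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0)],
)

# Pushed-down form: filter + aggregate run entirely in the database
total_east = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = ?", ("east",)
).fetchone()[0]
```

The non-pushed-down equivalent would be `SELECT * FROM sales` followed by a Python-side filter and sum, moving every row out of the database first.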

The team was so excited that we were able to do it in a fraction of the time and so effectively.

$940K saved annually by automating data quality across nine data sources.

14 FTEs saved through automation. 60% reduction in time needed to test data.