Garbage In, Garbage Out

To obtain accurate data insights, analysts often spend 80% of their effort on data preprocessing.

DataSpring is an ETL tool built on a streaming architecture. It uses log-based Change Data Capture (CDC), supports rich, automatic, and accurate semantic mapping between heterogeneous data sources, and handles both real-time and batch data processing. It supports incremental synchronization and conversion of data from mainstream databases such as Oracle, MySQL, SQL Server, and PostgreSQL, as well as from APIs. It is simple to operate and can be deployed privately.

System Architecture Diagram

Advantages of New Architecture

In a traditional architecture, the application has to read from and write to a remote transactional database. In an event-driven application, data and computation are not separated: the application reads and updates its data through local access only, which gives higher throughput and lower latency.
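The difference can be sketched in a few lines of Python. This is an illustration only, not DataSpring code: the class name `LocalOrderTotals` and the event fields `customer` and `amount` are assumptions made for the example. The state lives in the same process that handles the event, so no network round trip to a database is needed per event.

```python
from collections import defaultdict


class LocalOrderTotals:
    """Hypothetical event-driven operator that keeps its state in local memory."""

    def __init__(self):
        # Per-customer running totals, held locally instead of in a remote database.
        self.totals = defaultdict(float)

    def on_event(self, event: dict) -> float:
        # Update local state and return the new total for this customer.
        self.totals[event["customer"]] += event["amount"]
        return self.totals[event["customer"]]
```

A production stream processor would also checkpoint this state so it survives restarts; that part is omitted here.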

Feature Highlights

Data Access

Supports data access from common relational databases as well as from APIs

Batch Processing Task

Batch tasks run as scheduled (timed) jobs

Stream Processing Task

Real-time streaming data access based on CDC
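To make the idea of CDC-based access concrete, the sketch below replays change events onto a target table held as a Python dictionary. The event layout (`op`, `key`, `after`) is a hypothetical format chosen for the example; the actual event format used by DataSpring is not described here.

```python
def apply_change(table: dict, event: dict) -> None:
    """Apply one change event to an in-memory replica keyed by primary key."""
    op = event["op"]
    key = event["key"]
    if op in ("insert", "update"):
        # The 'after' image carries the full row as of this change.
        table[key] = event["after"]
    elif op == "delete":
        # Deleting a missing key is tolerated so replays stay idempotent.
        table.pop(key, None)
    else:
        raise ValueError(f"unknown op: {op}")


def replay(events: list) -> dict:
    """Build the replica by applying events in log order."""
    table = {}
    for event in events:
        apply_change(table, event)
    return table
```

Because only the changes are shipped, the target stays in sync incrementally without re-reading the whole source table.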

Formula Conversion

Convert data with preset formulas that work like Excel functions
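One way to picture formula conversion is a lookup table of named formulas applied to a column. The formula names (`UPPER`, `TRIM`, `LEN`) and the helper `apply_formula` below are assumptions made for illustration; they are not DataSpring's actual formula set or API.

```python
# Hypothetical preset formulas, named after their Excel counterparts.
FORMULAS = {
    "UPPER": lambda s: s.upper(),
    "TRIM": lambda s: s.strip(),
    "LEN": lambda s: len(s),
}


def apply_formula(rows: list, column: str, formula: str, target: str) -> list:
    """Return new rows with `target` set to formula(row[column]); input rows are not modified."""
    fn = FORMULAS[formula]
    return [{**row, target: fn(row[column])} for row in rows]
```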

Custom UDF Operator

For complex data processing logic, custom UDF operators written in Python are supported
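As an illustration only, a Python UDF of this kind might look like the sketch below. DataSpring's actual UDF interface is not described here, so the function name `normalize_phone`, the row-as-dictionary signature, and the `phone` field are all assumptions.

```python
import re


def normalize_phone(row: dict) -> dict:
    """Hypothetical UDF: strip every non-digit character from row['phone']."""
    raw = row.get("phone") or ""
    digits = re.sub(r"\D", "", raw)
    out = dict(row)  # copy so the input row is left untouched
    # Rows with no usable digits get None rather than an empty string.
    out["phone"] = digits if digits else None
    return out
```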

Scheduled Tasks

Configure scheduled task flows: run at a fixed interval, at a specified time, or on a recurring cycle
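The scheduling modes come down to computing the next run time. The sketch below shows two of them using only the Python standard library; the function names are hypothetical, and how DataSpring itself stores or evaluates schedules is not covered here.

```python
from datetime import datetime, time, timedelta


def next_interval_run(last_run: datetime, every: timedelta) -> datetime:
    """Fixed-interval schedule: the next run is one interval after the last run."""
    return last_run + every


def next_daily_run(now: datetime, at: time) -> datetime:
    """Specified-time schedule repeated daily: today at `at`, or tomorrow if that has passed."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```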

Log & User Management

The ETL management interface provides common modules such as operation log query and user management

Seamless Integration With DataFocus

As a member of the DFC product series, DataSpring supports single sign-on through the DFC member center, and deploying it together with DFC gives a seamless product experience

Three Application Scenarios

Real-Time Computing

Ingest live-stream, sensor, and Black Friday event data in real time to drive a real-time monitoring dashboard

Real-Time Data Extraction & Cleaning

Extract, clean, and transform data from business systems, then load it into the data warehouse

Event-Driven Application

Extract the CPU, MEM, and LOAD information from the messages reported by servers for analysis, then trigger alarms based on custom rules
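A minimal sketch of this scenario follows. The message field names (`cpu`, `mem`, `load`), the `Rule` class, and the thresholds are all assumptions chosen for the example; real server messages and rule definitions will differ.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    metric: str
    predicate: Callable[[float], bool]


def extract_metrics(message: dict) -> dict:
    """Keep only the CPU, MEM, and LOAD fields (assumed keys) and coerce them to float."""
    return {k: float(message[k]) for k in ("cpu", "mem", "load") if k in message}


def evaluate(message: dict, rules: list) -> list:
    """Return the names of all rules whose predicate fires for this message."""
    metrics = extract_metrics(message)
    return [
        r.name
        for r in rules
        if r.metric in metrics and r.predicate(metrics[r.metric])
    ]


# Hypothetical custom rules; the thresholds are illustrative only.
RULES = [
    Rule("cpu_high", "cpu", lambda v: v > 90.0),
    Rule("mem_high", "mem", lambda v: v > 85.0),
    Rule("load_high", "load", lambda v: v > 8.0),
]
```

Each incoming message is evaluated as it arrives, so an alarm can be raised without first writing the message to a database and polling it.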