Flows
Clarista Flows provides the flexibility to create advanced insights using built-in no-code transformers for business analysts, a SQL work-bench for data analysts, and a Python work-bench for Data Scientists and Data Engineers. Unlike other data transformation (ETL/ELT) software, Clarista does not require knowledge of the underlying technical systems and data models. Instead, Clarista uses its business data catalog as the single source of data for all advanced insights. Besides providing business definitions for all data attributes, this approach ensures that the access controls defined in the Data Catalog carry over to the analytics capabilities.
The drag-and-drop data orchestration layer is easy to use for anyone interested in getting more out of their data. At each step of a data transformation, users can see the results, which makes the process interactive and transparent. Once a workflow is complete, users who want to automate it can use Clarista Scheduler to run it on a time-based frequency or in response to events.
Key Features:
Clarista Flows provides the following key capabilities to drive advanced insights:
Visual designer and orchestrator to configure multi-step data processes
No-code transformers for joins, filters, aggregations and schema changes
Code work-bench for SQL, Python and PySpark for advanced logic
Integration with Databricks and JupyterHub for coding and compute engines
Cluster configuration depending on code/model complexity
Clarista Navigator as the unified source for all data with definitions
Access controls inherited from Clarista Catalog to Flows
Step-by-step execution and result analysis
Integration with the Alert engine to monitor model metrics or data exceptions
Log of all changes for auditing
Ability to post the results to any connected data system and share them through Navigator
Integration with Scheduler to automate Flows based on events or time
Integration with Data Lineage to track data dependencies
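As a rough illustration of what a multi-step flow does, the sketch below chains a join, a filter and an aggregation in plain Python and keeps each step's result so it can be inspected on its own. It does not use any Clarista API. The datasets, column names and helper functions (`join_on`, `filter_rows`, `sum_by`) are hypothetical, made up for this example.

```python
from collections import defaultdict

# Hypothetical sample data standing in for two catalog datasets.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 120.0},
    {"order_id": 2, "customer_id": "c2", "amount": 35.0},
    {"order_id": 3, "customer_id": "c1", "amount": 80.0},
    {"order_id": 4, "customer_id": "c3", "amount": 15.0},
]
customers = [
    {"customer_id": "c1", "region": "EMEA"},
    {"customer_id": "c2", "region": "APAC"},
    {"customer_id": "c3", "region": "EMEA"},
]


def join_on(left, right, key):
    """Inner join two lists of dicts on a shared key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def sum_by(rows, group_key, value_key):
    """Sum value_key per distinct value of group_key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)


# Each step's output is kept in its own variable so it can be reviewed.
joined = join_on(orders, customers, "customer_id")
large = filter_rows(joined, lambda row: row["amount"] >= 30.0)
revenue_by_region = sum_by(large, "region", "amount")

for name, result in [("join", joined), ("filter", large), ("aggregate", revenue_by_region)]:
    print(name, "->", result)
```

Keeping every intermediate result, rather than only the final one, is what allows a reviewer to check each transformation in isolation.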
How is it different from other self-service analytics tools?
Choices, Interoperability, Transparency and Performance - Existing end-user analytics tools for transforming data or running data science models first have to copy the data locally and then use their proprietary engines for computing. Most of these engines struggle with large data volumes and can only scale vertically (a single server or desktop with more CPU). Clarista Lab was designed to maximize customer choice through interoperability and to deliver superior performance by scaling horizontally (distributing the workload among many smaller machines).
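The idea behind horizontal scaling can be sketched in a few lines: split the workload into partitions, process each partition with a separate worker, and merge the partial results. The example below uses a thread pool from Python's standard `concurrent.futures` module on a single machine, purely as a stand-in for a cluster. The function names and data are hypothetical and do not reflect how Clarista distributes work.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into n_parts roughly equal chunks (round-robin)."""
    return [rows[i::n_parts] for i in range(n_parts)]


def partial_sum(chunk):
    """Work done independently on one partition."""
    return sum(chunk)


def distributed_sum(rows, n_workers=4):
    """Sum rows by combining per-partition results from several workers."""
    chunks = partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)


values = list(range(1, 1001))
total = distributed_sum(values, n_workers=4)
print(total)
```

Because each partition is processed independently, adding workers (or machines) increases capacity without requiring a larger single server, which is the contrast with vertical scaling drawn above.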
Since Clarista Lab uses Clarista Navigator as the unified source for all data, Data Scientists and Data Engineers do not have to struggle with finding, sourcing and profiling data from multiple data sources. This can save up to 50% of the time they currently spend on such activities.
Clarista Flows supports all major data scripting languages, such as SQL, Python and PySpark, giving Engineers and Scientists the flexibility to use the coding language they know best.
Flows' visual orchestrator provides much-needed transparency for technical and non-technical users alike, helping them understand the data dependencies in advanced data processes. If a data process fails, data teams can audit every step of the process to find the cause of the failure in minutes.
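A minimal sketch of why per-step records make a failure easy to locate: each step of a pipeline runs in order, its outcome is logged, and execution stops at the first step that raises. The `run_flow` helper, the step names and the log format are hypothetical, chosen for illustration rather than taken from Clarista.

```python
def run_flow(steps, data):
    """Run (name, function) steps in order and record one audit entry per step."""
    audit_log = []
    for name, func in steps:
        try:
            data = func(data)
            audit_log.append({"step": name, "status": "ok", "rows": len(data)})
        except Exception as exc:
            audit_log.append({"step": name, "status": "failed", "error": repr(exc)})
            break  # later steps depend on this one, so stop here
    return data, audit_log


def drop_missing(rows):
    """Remove rows that have no amount."""
    return [r for r in rows if r.get("amount") is not None]


def to_cents(rows):
    """Add a 'cents' field; raises ValueError if an amount is not numeric."""
    return [{**r, "cents": round(float(r["amount"]) * 100)} for r in rows]


def keep_large(rows):
    """Keep rows worth at least 1000 cents."""
    return [r for r in rows if r["cents"] >= 1000]


# The third row carries a bad value that only surfaces in the second step.
rows = [{"amount": 12.5}, {"amount": None}, {"amount": "n/a"}]
steps = [("drop_missing", drop_missing), ("to_cents", to_cents), ("keep_large", keep_large)]

_, log = run_flow(steps, rows)
for entry in log:
    print(entry)
```

Reading the log from the top, the first entry with a `failed` status names the step to investigate, and the row counts of the preceding steps show what data reached it.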
References:
Alerts
Scheduler
Connections
Lineage