We present ST2, an end-to-end solution to analyze distributed dataflows in an online setting. It is powered by Timely Dataflow, a low-latency, distributed data-parallel dataflow computational framework.
ST2 connects to a running Timely computation, creates the program activity graph representation, and runs multiple analyses, including aggregate metrics, progress and temporal invariant checking, and graph pattern matching on top of it. Through a command-line interface and an interactive dashboard, users are able to interact with and visualize ST2’s analytics. In our performance evaluations, we are able to show that ST2 is able to comfortably keep up with common streaming computations in offline and online settings. We argue that ST2 is an extendable system that paves the way for users to debug, monitor, and optimize online distributed dataflows.
Modern, stateful web applications are increasingly dependent on high-frequency updates from multiple database systems. Each application maintains its own dynamic view of shared and private data. What if querying and synchronizing this state consistently was as simple as writing a Datalog query?
Using a reactive query engine powered by differential computation built in Rust, we explore this question. We present a novel architecture that selectively and incrementally replicates state between server and application and thus allows you to develop as if you were sitting right inside of the database.
A central challenge of current stream processors is navigating the trade-off between performance and consistency. Timely Dataflow changes that. By rethinking how time should be represented in a distributed system, it achieves lower latencies and strong consistency, while allowing its users to express even complex cyclic computations.
In this talk, we give an introduction to the Timely stack, revealing what makes it special, and how we use it in our professional work. We will guide you through Timely's underlying dataflow model, its unique approach to progress tracking, and give intuition for why it is able to outperform even specialized systems in the wild. Along the way, we highlight some of the more advanced aspects of the Timely ecosystem, and how your organization can use it to supercharge its data-driven architectures.
We rigorously generalize critical path analysis (CPA) to long-running and streaming computations and present SnailTrail, a system built on Timely Dataflow, which applies our analysis to a range of popular distributed dataflow engines. Our technique uses the novel metric of critical participation, computed on time-based snapshots of execution traces, that provides immediate insights into specific parts of the computation. This allows SnailTrail to work online in real-time, rather than requiring complete offline traces as with traditional CPA. It is thus applicable to scenarios like model training in machine learning, and sensor stream processing.
SnailTrail assumes only a highly general model of dataflow computation (which we define) and we show it can be applied to systems as diverse as Spark, Flink, TensorFlow, and Timely Dataflow itself. We further show with examples from all four of these systems that SnailTrail is fast and scalable, and that critical participation can deliver performance analysis and insights not available using prior techniques.
Eine zentrale Herausforderung von IoT-Anwendungen ist die Auswertung von hochdynamischen Data Streams in Real-Time. Vor dem Hintergrund klassischer Data-Pipelines stellen wir eine Dataflow-Architektur vor, mit der Data Streams korrekt, effizient und schnell verarbeitet werden können. Unsere Architektur erlaubt es, komplexe, aufeinander aufbauende high-level Queries über heterogene Datenquellen zu stellen, die mit dem Eintreffen neuer Daten inkrementell aktualisiert werden und den Anfragesteller reaktiv über neue Ergebnisse informieren. Ereignisse können dabei bis auf die Nanosekunde genau aggregiert werden.
The functional approach to state management in the frontend was first enabled by React, pioneered by the likes of om.next, Redux, and Elm, and has ushered in a golden era in web development. It is captured by the two signatures
view :: DB -> HTMLand
mutate :: DB -> Tx -> DB. What might happen when we start working with more than one world, i.e. when we replace the notion of mutate with that of
solve :: DB -> Tx -> [DB]?
Using Clojure and Rust, we explore these ideas in the context of 3DF, a stream processing system based on differential dataflows.
3DF is a stream processing system which feeds off of Datomic’s transaction log and provides clients with the ability to register arbitrary Datalog queries, for which they will then continuously receive any changes to the result set. It does this efficiently by compiling Datalog queries to differential dataflows.
Using 3DF on top of Datomic provides a powerful, reactive interface to Datomic, making it an even more attractive choice for the real-time web. It also opens up Datomic to non-JVM runtimes and processes without a peer cache, without sacrificing performance. Finally, it hints at the possibility of significantly speeding up functional UI frameworks like D3 and React, because it allows these systems to skip their own change detection.
This talk will explore the “why?“, “how?“, and “what now?“ of working with a reactive database.
Imagine turning the way that web applications interact with the database on its head: instead of polling for changes, clients register their information interests and then continuously receive updates as new data enters the system. Can we do this while maintaining the power and flexibility of Datomic?
This talk introduces 3DF, a stream processing system based on differential dataflows, which aims to do just that. Feeding off of Datomic’s transaction log, 3DF provides a reactive interface to your favourite database, making it an even more attractive choice for the next generation of web applications.