A central challenge for current stream processors is navigating the trade-off between performance and consistency. Timely Dataflow changes that. By rethinking how time is represented in a distributed system, it achieves both low latency and strong consistency, while still allowing its users to express even complex cyclic computations.
In this talk, we give an introduction to the Timely stack, revealing what makes it special, and how we use it in our professional work. We will guide you through Timely's underlying dataflow model, its unique approach to progress tracking, and give intuition for why it is able to outperform even specialized systems in the wild. Along the way, we highlight some of the more advanced aspects of the Timely ecosystem, and how your organization can use it to supercharge its data-driven architectures.
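The intuition behind Timely's progress tracking can be sketched in a few lines. This is a toy model with totally ordered integer timestamps, invented for illustration; Timely itself works with partially ordered timestamps and a distributed coordination protocol:

```rust
// Toy model of progress tracking (illustrative only; Timely itself
// uses partially ordered timestamps and a distributed protocol).
type Time = u64;

/// The frontier is the set of minimal timestamps still in flight.
/// With totally ordered times, this is every copy of the minimum.
fn frontier(outstanding: &[Time]) -> Vec<Time> {
    outstanding
        .iter()
        .copied()
        .filter(|&t| outstanding.iter().all(|&u| t <= u))
        .collect()
}

/// A time `t` is complete once no in-flight work can still affect
/// it, i.e. every outstanding timestamp is strictly greater than `t`.
fn closed(outstanding: &[Time], t: Time) -> bool {
    outstanding.iter().all(|&u| u > t)
}

fn main() {
    assert_eq!(frontier(&[3, 4]), vec![3]);
    assert!(closed(&[3, 4], 2));  // nothing at time 2 remains
    assert!(!closed(&[3, 4], 3)); // time 3 is still in flight
    println!("ok");
}
```

Once a timestamp is closed, an operator may emit final results for it. This is the mechanism that lets Timely combine low latency with strong consistency.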
We rigorously generalize critical path analysis (CPA) to long-running and streaming computations and present SnailTrail, a system built on Timely Dataflow which applies our analysis to a range of popular distributed dataflow engines. Our technique uses a novel metric, critical participation, computed on time-based snapshots of execution traces, which provides immediate insight into specific parts of the computation. This allows SnailTrail to work online in real time, rather than requiring complete offline traces as traditional CPA does. It is thus applicable to scenarios such as model training in machine learning and sensor stream processing.
SnailTrail assumes only a highly general model of dataflow computation (which we define) and we show it can be applied to systems as diverse as Spark, Flink, TensorFlow, and Timely Dataflow itself. We further show with examples from all four of these systems that SnailTrail is fast and scalable, and that critical participation can deliver performance analysis and insights not available using prior techniques.
A central challenge for IoT applications is evaluating highly dynamic data streams in real time. Against the backdrop of classical data pipelines, we present a dataflow architecture with which data streams can be processed correctly, efficiently, and quickly. Our architecture makes it possible to pose complex, mutually dependent high-level queries over heterogeneous data sources, which are updated incrementally as new data arrives and which reactively notify the querying client of new results. Events can be aggregated with precision down to the nanosecond.
The functional approach to state management in the frontend was first enabled by React, pioneered by the likes of om.next, Redux, and Elm, and has ushered in a golden era in web development. It is captured by the two signatures

    view :: DB -> HTML
    mutate :: DB -> Tx -> DB

What might happen when we start working with more than one world, i.e. when we replace the notion of mutate with that of

    solve :: DB -> Tx -> [DB]?
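As a toy illustration of how these three signatures might be inhabited (every type and name here is a stand-in invented for this sketch: a DB as a vector of facts, a Tx as assertions that may each admit several alternatives, "HTML" as a plain string):

```rust
// Stand-in types, invented for illustration: a DB is a vector of
// facts, a Tx lists assertions where each entry may admit several
// alternative facts, and "HTML" is just a rendered string.
type Fact = (&'static str, i64);
type Db = Vec<Fact>;
type Tx = Vec<Vec<Fact>>; // each entry: alternatives, pick one

// view :: DB -> HTML
fn view(db: &Db) -> String {
    db.iter().map(|(k, v)| format!("{k} = {v}\n")).collect()
}

// mutate :: DB -> Tx -> DB — one world in, one world out:
// commit the first alternative of every assertion.
fn mutate(db: &Db, tx: &Tx) -> Db {
    let mut next: Db = tx.iter().filter_map(|alts| alts.first().copied()).collect();
    next.extend(db.iter().copied());
    next
}

// solve :: DB -> Tx -> [DB] — one world in, every consistent
// next world out: the cartesian product of the alternatives.
fn solve(db: &Db, tx: &Tx) -> Vec<Db> {
    let mut worlds: Vec<Db> = vec![db.clone()];
    for alts in tx {
        worlds = worlds
            .into_iter()
            .flat_map(|w| {
                alts.iter().map(move |&fact| {
                    let mut next = vec![fact];
                    next.extend(w.iter().copied());
                    next
                })
            })
            .collect();
    }
    worlds
}

fn main() {
    let db: Db = vec![];
    let tx: Tx = vec![vec![("a", 1)], vec![("b", 2), ("b", 3)]];
    assert_eq!(mutate(&db, &tx), vec![("a", 1), ("b", 2)]);
    assert_eq!(solve(&db, &tx).len(), 2); // two possible worlds
    println!("{}", view(&mutate(&db, &tx)));
}
```

Where mutate picks a single successor state, solve enumerates all of them, which is exactly the shift from state transition to constraint solving that the question above gestures at.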
Using Clojure and Rust, we explore these ideas in the context of 3DF, a stream processing system based on differential dataflows.
3DF is a stream processing system which feeds off of Datomic’s transaction log and provides clients with the ability to register arbitrary Datalog queries, for which they will then continuously receive any changes to the result set. It does this efficiently by compiling Datalog queries to differential dataflows.
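The payoff of the differential approach can be sketched with a toy example (a hand-rolled counter with invented names, not 3DF's actual API or compilation pipeline): each change to the input produces a change to the maintained result, with no recomputation from scratch.

```rust
// A change pairs a fact with a diff (+1 insert, -1 retract),
// echoing differential dataflow's (data, diff) updates. All names
// here are invented for illustration, not 3DF's actual API.
type Fact = (&'static str, i64);
type Change = (Fact, i64);

/// The registered "query": how many facts carry a value above 10?
fn matches(fact: &Fact) -> bool {
    fact.1 > 10
}

/// One incremental step: consume only the change, never the whole DB.
fn step(count: i64, change: &Change) -> i64 {
    let (fact, diff) = change;
    if matches(fact) { count + diff } else { count }
}

/// Folding the transaction log maintains the live result.
fn run(log: &[Change]) -> i64 {
    log.iter().fold(0, |n, c| step(n, c))
}

fn main() {
    // Two inserts match, one does not.
    assert_eq!(run(&[(("a", 12), 1), (("b", 5), 1), (("c", 20), 1)]), 2);
    // A retraction undoes an earlier insert.
    assert_eq!(run(&[(("a", 12), 1), (("a", 12), -1)]), 0);
    println!("ok");
}
```

A client that registered such a query would be pushed the successive values of the result as the log grows, rather than polling and re-running the query.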
Using 3DF on top of Datomic provides a powerful, reactive interface to Datomic, making it an even more attractive choice for the real-time web. It also opens up Datomic to non-JVM runtimes and processes without a peer cache, without sacrificing performance. Finally, it hints at the possibility of significantly speeding up functional UI frameworks like D3 and React, because it allows these systems to skip their own change detection.
This talk will explore the “why?”, “how?”, and “what now?” of working with a reactive database.
Imagine turning the way that web applications interact with the database on its head: instead of polling for changes, clients register their information interests and then continuously receive updates as new data enters the system. Can we do this while maintaining the power and flexibility of Datomic?
This talk introduces 3DF, a stream processing system based on differential dataflows, which aims to do just that. Feeding off of Datomic’s transaction log, 3DF provides a reactive interface to your favourite database, making it an even more attractive choice for the next generation of web applications.