Data has emerged as one of the world’s greatest resources, powering everything from video recommendation engines and digital banking to the burgeoning AI revolution. But in a world where data is becoming increasingly scattered across locations, from databases to data warehouses to data lakes and beyond, combining all of this into a compatible format for use in real-time scenarios can be a mammoth undertaking.
For context, applications that don’t require direct, real-time data access can easily combine and process data in batches at fixed intervals. This so-called “batch data processing” can be useful for things like processing monthly sales data. But often a company needs access to data the moment it is created, which can be critical for, say, customer support software that relies on up-to-date information about every single sale. Elsewhere, ride-hail apps have to process all sorts of data points to connect a rider to a driver, and that isn’t something that can wait a few days. These kinds of scenarios require what’s known as “stream data processing,” where data is collected and combined for real-time access, something that’s much more complex to configure.
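To make the distinction concrete, here is a minimal Python sketch of the two styles applied to the same sales data. The event fields, the `batch_totals` function and the `StreamTotals` class are all hypothetical names invented for illustration; they are not taken from any product mentioned in this article.

```python
from collections import defaultdict

# Hypothetical sale events; the field names are illustrative only.
sales = [
    {"customer": "alice", "amount": 30.0},
    {"customer": "bob", "amount": 12.5},
    {"customer": "alice", "amount": 7.5},
]


def batch_totals(events):
    """Batch style: wait until the whole interval's data is in, then process it once."""
    totals = defaultdict(float)
    for event in events:
        totals[event["customer"]] += event["amount"]
    return dict(totals)


class StreamTotals:
    """Stream style: update the result as each event arrives, so reads stay current."""

    def __init__(self):
        self._totals = defaultdict(float)

    def on_event(self, event):
        self._totals[event["customer"]] += event["amount"]

    def total_for(self, customer):
        return self._totals.get(customer, 0.0)


# Stream: a support agent can read alice's running total after every sale.
stream = StreamTotals()
seen_by_support = []
for sale in sales:
    stream.on_event(sale)
    seen_by_support.append(stream.total_for("alice"))

# Batch: the same totals only become available once the interval closes.
end_of_month = batch_totals(sales)
```

Both paths arrive at the same totals. The difference is when the answer becomes readable: after every event in the streaming case, or only after the full batch has run.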
That complexity is what Dozer plans to address by powering fast, read-only APIs directly from any source through a plug-and-play data infrastructure backend.
Dozer is the handiwork of Vivek Gudapuri and Matteo Pelati, who founded the company from their base in Singapore almost a year ago. The duo has built a 10-person distributed team across Asia and Eastern Europe as they prepare to move beyond the product’s current source-available (i.e. not completely open source) incarnation and into a fully monetizable product.
Dozer has tested its product with a handful of unnamed design partners, and today it emerges from stealth so that any developer can access it. The company also revealed that it has raised $3 million in seed funding from Sequoia Capital India’s Surge, Google’s Gradient Ventures, and January Capital.
Numerous tools designed to transform, integrate, and leverage distributed data are already available, including streaming databases and extract, transform, load (ETL) tools such as Apache Flink, Airbyte and Fivetran; cache layers for temporary data storage such as Redis; and instant APIs powered by the likes of Hasura or Supabase to funnel data between systems.
Dozer, for its part, works across all of these different categories, taking what it considers the best parts and removing the friction associated with building the infrastructure and piping that underpins real-time data apps.
Users plug Dozer into their existing data stack, which can include databases, data warehouses, and data lakes, and Dozer takes care of real-time data extraction, caching, and indexing, serving the data up via low-latency APIs. So while something like Airbyte or Fivetran helps get data into a data warehouse, Dozer focuses on the other side: “making this data accessible in the most efficient way possible,” Gudapuri explained to gotechbusiness.com.
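The general pattern described here (extract changes from a source, keep a cached and indexed copy, and serve reads from that copy) can be sketched in a few lines of Python. This is only an illustration of the idea under simple assumptions, not Dozer’s actual API or implementation: the `ReadOnlyCache` class, its methods, and the ride records are all hypothetical.

```python
from collections import defaultdict


class ReadOnlyCache:
    """Hypothetical in-memory cache kept fresh by change events from a source system."""

    def __init__(self, index_field):
        self._rows = {}                 # primary key -> latest version of the row
        self._index = defaultdict(set)  # indexed value -> set of primary keys
        self._index_field = index_field

    def apply_change(self, op, key, row=None):
        """Ingestion side: apply one insert, update or delete event from the source."""
        old = self._rows.pop(key, None)
        if old is not None:
            # Drop the stale index entry before storing the new version.
            self._index[old[self._index_field]].discard(key)
        if op in ("insert", "update"):
            self._rows[key] = row
            self._index[row[self._index_field]].add(key)

    def get(self, key):
        """Read side: point lookup by primary key."""
        return self._rows.get(key)

    def find(self, value):
        """Read side: lookup through the secondary index, without touching the source."""
        return [self._rows[k] for k in sorted(self._index.get(value, ()))]


# Simulated change events, as they might arrive from a source database.
cache = ReadOnlyCache(index_field="city")
cache.apply_change("insert", 1, {"id": 1, "city": "Singapore", "status": "waiting"})
cache.apply_change("insert", 2, {"id": 2, "city": "Jakarta", "status": "waiting"})
cache.apply_change("update", 1, {"id": 1, "city": "Singapore", "status": "matched"})
cache.apply_change("delete", 2)

singapore_rides = cache.find("Singapore")
```

In a real deployment the `get` and `find` methods would sit behind an API endpoint, and the change events would come from a database log or a message queue rather than hand-written calls. The point of the sketch is that reads never query the source system directly.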
Gudapuri said Dozer “takes a quirky approach,” one that addresses very specific issues and no more. For example, established streaming databases solve many problems well beyond what Dozer offers, which is all about providing real-time data updates and APIs in a single product.
“We fix just the right amount of issues in each of these categories to provide a fast build experience for developers, as well as out-of-the-box performance,” said Gudapuri. “Developers (currently) need to integrate multiple tools to achieve the same thing.”
And for this reason, Pelati says, Dozer can promise better data query latency.
“Because of these design choices, Dozer provides much better query latency needed for customer-facing applications,” said Pelati. “A single developer can launch entire data apps in minutes, which would normally take months. A team doesn’t have to build and maintain multiple integrations, saving time and money.”
The (not quite) open source factor
While Dozer is touted as an “open source” platform, a close look at its license on GitHub reveals that it uses the Elastic License 2.0 (ELv2), the same license that search company Elastic adopted two years ago as part of its transition away from true open source. The Elastic License is not recognized as open source because it prevents third parties from taking the software and offering it themselves as a hosted or managed service.
More precisely, ELv2 can be called a “source available” license, which essentially means that it offers many of the benefits of a more permissive open source license like MIT, including codebase transparency and the ability to extend Dozer’s capabilities, refine features, and fix bugs. This alone will likely be enough to win the hearts and minds of businesses of all sizes, as long as the business in question is not AWS or some other cloud giant looking to monetize directly on top of Dozer.
It’s worth noting that some companies have created their own in-house tooling to solve a problem similar to the one Dozer is tackling, including Netflix, which built Bulldozer some years back. Notably, one of the main makers behind Bulldozer, Ioannis Papapanagiotou, now works as an advisor to Dozer.
It’s still early days for Dozer, but with $3 million in the bank from a host of high-profile backers, the company is fairly well-funded as it moves toward commercialization, including the release of a hosted SaaS version packed with a bunch of extra features. Gudapuri said he expects this to go live in the coming months.
“The hosted service provides auto-scaling, instant deployments, security, compliance, rate throttling, and some additional features,” said Gudapuri.