Data Quality - DiscoPosse Podcast Network

Podcast

Ep 227 Elliot Shmukler of Anomalo on Data Quality, Growth Strategy, and Finding Product Market Fit

May 28, 2022
Tagged as: Anomalo, CEO, Data Quality, Elliot Shmukler, Founder, startup

Sponsored by our friends at Veeam Software! Make sure to click here and get the latest and greatest data protection platform for everything from containers to your cloud!

Sponsored by the Shift Group – Shift Group is turning athletes into sales professionals. Is your company looking to hire driven, competitive former athletes? Shift Group not only offers a large pool of diverse sales candidates from entry level to leadership – they help early stage companies in developing their hiring strategy, interview process and build strong sales cultures that attract the best talent for early stage companies.

Sponsored by the 4-Step Guide to Delivering Extraordinary Software Demos that Win Deals. Because we had such a good response, we have opened it up to make the eBook and Audiobook more accessible by offering it all for only $5.

Sponsored by Diabolical Coffee. Devilishly good coffee and diabolically awesome clothing


Elliot Shmukler is Co-founder and CEO of Anomalo. Based in the San Francisco Bay Area, Elliot’s been leading a small but growing team since Anomalo was founded in 2018. He’s had previous roles as a Product and Growth leader at tech companies like Instacart, LinkedIn, and Wealthfront. This is a great chat packed with lessons on startup growth, finding quality in your data, and much more.

Thank you Elliot for a great discussion!

Check out Anomalo at https://anomalo.com

Connect with Elliot here: https://www.linkedin.com/in/eshmu/

Transcript powered by Happy Scribe

Welcome back. This is Eric Wright, the host of the DiscoPosse podcast. And you are listening to another fantastic conversation with the one and only Elliot Shmukler. Elliot is the CEO and co-founder of Anomalo, and they’re doing really fantastic stuff around understanding data cleanliness and data issues. This is the data quality platform. Super cool stuff. Elliot’s got a really fantastic background in what he did in early days with LinkedIn and then much more around the rest of his career. But I really dig his approach, which reminds me to go back and listen again to a couple of spots, because there are standout lessons here on how you can help to build teams and think about product-market fit. This is like another classic example of super startup lessons. All right, speaking of other startup lessons, learn some lessons without learning them the hard way by making sure you go to the amazing partners that make this podcast happen. Of course, like the fine folks over at Veeam Software, everything you need for your data protection needs, wherever you got it, whether it’s on premises, in the cloud, cloud native, even SaaS stuff like Office 365, Teams, SharePoint.

Yeah, you can hit the delete button. Bad things happen. So, yeah, hit the go to Veeam button, vee.am/discoposse. Let them know old disco sent you. And on top of that, this is a fantastic platform. So go check it out. All right. Now next up, of course, this episode is brought to you by the folks at the Shift Group who are turning athletes into sales professionals. So if you’re looking to hire super cool, driven, competitive former athletes, or maybe you just want to build your own go to market strategy efficiently and effectively, the Shift Group team has an incredibly diverse pool of candidates, whether it’s from entry-level all the way up to leadership. Plus, JR and the team are helping early stage groups just build that strategy. Start with culture, start with success. Take the drive of an athlete, bring that into your organization. Fantastic folks. Go back and check out JR’s episode; he was recently on the podcast. So head on over to shiftgroup.io or just drop an email right to JR. He’s JR@ShiftGroup.io. Yeah, he’s really cool. Oh, by the way, if you like coffee, go to Diabolicalcoffee.com. Did I say that too fast? Go to diabolicalcoffee.com. There you go. That’s better. All right, let’s get to the show. Here we go.

I am Elliot Shmukler, co-founder and CEO of Anomalo. And you are listening to the DiscoPosse podcast.

All right. I feel like that’s always my moment where I tell people that’s like the on air light just turns on. All right, we are live. Although we’re not live, it’s live to tape or, right, live to MP4. I’m an older fellow, so I still say live to tape. Elliot, thank you very much for joining today. I was really excited when I saw you come up as a guest, first because you’re doing exciting stuff with the team at Anomalo. And secondly because you’re a friend of Amber Rowland. And when I take great problems, complex problems being solved with platforms, and then see somebody who’s standing by the story, it is a great pairing. So I’m excited to chat. So if you don’t mind, Elliot, for folks that are new to you, if you want to give a quick background and bio on yourself and we’ll get into the Anomalo story.

Absolutely. Thank you so much for having me, Eric. Really a pleasure to be here. In terms of myself, I’m a long time Silicon Valley executive. I’ve worked at some companies that hopefully your listeners know about: LinkedIn, Wealthfront, recently in the news, and Instacart, also recently in the news with the pandemic. So I have been a product and growth leader at a bunch of companies like that for a while before founding Anomalo.

It’s amazing how many LinkedIn alumni I’ve found recently. And it’s funny. Some have come from different phases in the company, and I always want to feel like, hey, do you know Patrick Baines? He uses LinkedIn, too, but that’s like saying, oh, you’re from Canada, you must know Pete. He’s from Halifax. There’s a lot of people that work there. But you definitely have a storied history, a proven history at that, in the industry. And then it comes to today, which is where we’re going to talk about Anomalo. You’ve got some really great stuff. Obviously, announcements are live and you’ve had some work that’s happening. So let’s talk about the problem that you’re solving and then the how, which is actually super exciting.

Yeah, absolutely, Eric. And I’m glad you’re running into a lot of LinkedIn folks because it was a pretty special time when I was there. And it really actually began a lot of my journey toward Anomalo, where LinkedIn was one of the first places in my career where I got exposure to having a lot of data and trying to use that data to make decisions and in my case, to make the LinkedIn product better, to make it grow faster and ran into a lot of the issues back in the day. This is ten plus years ago now. So even more issues than there are today. But ran into a lot of issues with being data driven and trying to use data. And over the subsequent years, a lot of those issues got solved. For example, LinkedIn at one point had 150 people managing our data warehouse.

Wow.

Right. And today, you just don’t have to do that. Right. Today you can spin up a great Snowflake data warehouse in a few minutes, or Databricks in a few minutes, and off you go. You have a world class place to store your data, query your data, analyze your data. But the issue that I’ve seen, despite these amazing improvements in the data stack, is that the more powerful your tool, the more powerful your data warehouse, the more data you’re pulling in, the more use cases you’re building on top of data, the more cost you bear if your data is wrong one day or incomplete or missing or inconsistent with what you expected it to be. Right. And so that’s the problem that Anomalo is solving: how do we give teams that are working with data, that are trying to make use of their corporate data, their enterprise data, trying to make decisions, trying to get insights, something that helps to make sure that their data is actually right, that they don’t have issues with their data, or if they do, that they can detect them and resolve them quickly before it impacts decisions or other work?

Yeah, it’s amazing. We get so wrapped into the buzzwordy lifestyle of talking about being data driven, and everybody’s got data lakes and data warehouses and data puddles and data, whatever you want to call them. There’s all this talk about how data is the new oil. I’d say data is the new crude oil. And in fact, there’s a lot that needs to be done to make that data really enriched information and gather signal from the noise, because data in and of itself is not valuable. It’s the cleanliness of the data and the sort of trueness to the signal you need to find in order to then gather insights and info. All these folks are focusing on the automation side. But if you do not trust the data that’s going into the machine, it’s GIGO. Right. Garbage in, garbage out.

Exactly right, Eric. Exactly right. And it’s actually much worse today than it’s been. Think about using machine learning. Right. That’s another buzzword. Everyone talks about it. Everyone’s trying to deploy their machine learning model, do great things, use those technologies in a way that a Google or an Amazon or Netflix might to improve their product. Guess what happens if you’ve trained a machine learning model on a particular set of data, with particular characteristics, and suddenly today the input data that it receives is wildly different. Right. That model doesn’t produce great results for you. You’re essentially getting random output from that model because you’re exposing it to data that it wasn’t trained on. It doesn’t know what to do with it. It has no constraints on what outputs it gives you. So it’s even worse when you have machine learning deployed and you’re expecting to feed it the data that’s coming in and expecting to have great results.

One of the things that stood out when I look at your platform story, and I don’t mean to pick on one phrase right now, we’ll go into the entirety of how the platform works, is automated root cause analysis.

Right.

And this is one of the things that’s near and dear to me. I’ve been doing this as a business for a decade, and it’s one of the most difficult problems to solve because of the speed at which the data is moving. The ability to do real time and automated root cause analysis is almost an intractable problem, especially when it gets into anything that’s around system design. The old classic thing is that by the time you figure out what the real root cause of the problem was, you could have just rebooted the system. Right. But when it comes to data, there’s no reboot the system option. It means you have to understand the forbidden fruit from which the data was gathered, and then be able to go back, and there’s data reconciliation. So there’s a fantastic problem in the bigness of what it is that you are able to solve. So when I saw that, I was like, okay, we’re going to dig in hard on this one, but let’s actually just talk about the platform in general and how it was put together to solve the problem of data.

Absolutely, Eric. And automated root cause analysis is something we’re very proud of and something that is very unique to what we do in our approach. But to step back and give you a sense of how it works fundamentally, what you do with Anomalo is you connect it to your data warehouse. We’re taking advantage of the fact that companies these days are putting up these big data warehouses in the cloud and are stuffing them full of all the data that they care about, centralizing all their data in one place so they can connect it together, analyze it together, and use it for all the various use cases that they have. So Anomalo just connects to your data warehouse, and then within your data warehouse, you select the tables of data that you want us to monitor, and Anomalo goes to work. So one interesting part about what we built with the product is that we’re a machine learning first solution. When you tell us, I want you to monitor this table that has my sales information, we don’t ask you to tell us about the data in that table. We don’t ask you to configure rules for that data or to give us parameters for what that data should be.

We, to the extent possible, learn all of that automatically. When we connect to that data set, we query your data warehouse, grab some samples of that data, look at it historically over time, and we actually train one or more machine learning models for each data set that you have us monitor, models that really seek to understand the structure and pattern of that data set. That way, when new data comes in, the machine learning model can say, hey, this new data that came in, is it somehow different from the structure that I learned from the data set history? And if it is, well, now that may be an issue in the data that we should tell someone about this.
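Editor's illustration: the idea described above, learning what "normal" looks like from a table's history and then asking whether new data differs, can be sketched in a few lines. This is a deliberately simple statistical stand-in (a z-score on daily row counts), not Anomalo's actual models, which the conversation does not detail. The function names, the sample numbers, and the threshold of 3.0 are all assumptions made for the example.

```python
from statistics import mean, stdev

def learn_baseline(history):
    """Summarize historical daily values (here: row counts) as (mean, std dev)."""
    return mean(history), stdev(history)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag a new value that sits more than `threshold` std devs from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly constant history: any change at all is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical history: rows loaded into a sales table on each of the last 7 days.
history = [1000, 1020, 980, 1010, 990, 1005, 995]
baseline = learn_baseline(history)

print(is_anomalous(1003, baseline))  # a typical day
print(is_anomalous(400, baseline))   # a day where most of the data went missing
```

A real system would presumably look at many more signals than row counts (column distributions, null rates, seasonality such as the weekday and holiday effects discussed later), but the shape is the same: no hand-written rules, just a baseline learned from history and a test of whether today fits it.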

This is the point where, if it wasn’t for the fact that I have to stay in camera frame and my microphone arm is not too long, I would stand on the chair and say, oh, Captain, my captain. The idea, and this is the core of next generation systems architecture and design, is that ultimately the system needs to be responsible for its own outcome. And letting the data drive the understanding of the data itself, versus what we believe as the creators of the table is actually in there, is such a fundamental shift, and it’s taking all those assumptions and turning them upside down, which is amazing, because time and time again, we hire a sea of DBAs. And I’ve worked in massive insurance companies, worldwide companies, investment firms, explosive companies, all sorts of exciting stuff. And we’ve got clients, we’ve got DBAs, we’ve got all these people. And they’re coming in trying to make the data fit into a thing that they believe it should fit into. And every time you’re five years into that project, the diagram is like a monstrous UML diagram that’s on someone’s wall that they printed in, like, five-point font.

And it’s the size of the entire room. Well, it’s dead because the moment you went live with the system, everything changed. The day in the life moved. And from that point on, the best thing you can do is hope to keep up.

Yeah.

So you’re basically saying you can shed that wherever you are today is, in fact, the beginning of forever because you are now adaptively understanding the data.

That’s exactly right, Eric. And in fact, I would argue those old school approaches which you’re describing, they worked up to a point. Right. We have customers where they spent 20, 30 years with that approach. They made it work, and they have 100 people doing this work and all this kind of stuff. But at the scale that folks are ingesting data today and with the different types of data that are coming in and the number of applications that they’re trying to run on top of the data, there’s just no way that you can continue that approach. I mean, we have a customer right now at Anomalo that has a table where they’re adding 24 billion records a day. Right. There’s just no way that they’re going to come up with any sort of manual process or rules based process or schema based process to fully make sure that all those rows are conforming to something. Right. They can take some cuts at it, but there’s no way. They need something that’s adaptable. And more importantly, they need a machine. Right. Our machine within Anomalo has no problem going through 24 billion rows or a sample of those rows if it needs to.

And looking for patterns. That’s going to be pretty challenging using any kind of manual or human driven approach.

Now, I guess this is where the thing will come in where, as you said, there are purposes and requirements to sort of define the standard by which data is stored. And ultimately, because there’s front end applications that need to understand the schema, there are sort of bound things to the behavior of the data within the structure. But as you said, we’ve got much more that’s coming in. Whether we call it IoT, whether we call it whatever kinds of many sensors, and those sensors could be anything, could be 15 different application signals that are coming through that each has their own sort of structural form that’s different. It gives you the freedom to be able to co-locate disparate data, and then ultimately, with that data, you have observability as a practice. If we talked about it six, seven years ago, observability wasn’t even a word outside of physics and chemistry. And so shout out to Charity Majors, who I still will always say is the creator of the word observability as a practice. But observability is about bringing unstructured data together and then looking for patterns and signals within it.

And the problem is that high-cardinality data is incredibly difficult to pull together and then make decisions on and systematically even refine, let alone get to the point where the data can ultimately create its own structure through having your platform look at it. I don’t mean to wow over this, because the computer science folks are just like, there’s no way this is real. It’s a seemingly intractable problem. And I say that because it was intractable up until now. The technology and the capabilities are there where it’s more accessible to do this. But it’s a very unique challenge that you’re solving.

Yeah, absolutely, Eric. And we see ourselves as very much an extension of the observability movement. Right. And there are great observability tools for other dimensions of operations. Right. Data observability is actually an even more challenging problem than, say, operational observability. Is my server up? Right. Those kinds of things. Because data, by necessity, is chaotic to a large extent. What my users do with my product on Fridays might be dramatically different from what they do with my product on Sundays, and even more so different if Sunday is part of a long weekend or Friday is a holiday or we just launched the new product on Monday. And so there’s a lot of dimensions of variability, a lot of chaos in actual data that’s coming in. User data, third party data, those kinds of things. There’s a lot of chaos there above and beyond sort of the classic observability data. What is my machine doing? Is it up? Is it down? Is it processing transactions? So definitely a challenging problem. But, yeah, the technology has also improved dramatically. Modern machine learning techniques can do a lot. And modern data warehouses are also incredibly powerful. You can ask them to summarize a lot of what’s going on with the data quickly.

You can analyze it.

Yeah. I think the biggest battleground that we are seeing in the industry is this idea of putting data into a place. Because right now we know that the technology is arriving, if it has not already arrived, to do really amazing things with our data. If you think of early application design, it was like, this is the data that we’re going to need in order to make decisions around future architectures. So they basically throw away everything but this. It is purely wheat versus chaff, except that they threw away the chaff. And then at some point, especially when you get into retail and you get into all the industrial, there’s so many use cases where they say, like, we got to keep the chaff, hang on to it, because we don’t know, there may actually be a different seed hiding in the chaff. And the economics of storing data have gotten significantly better. And then again, what’s happening now is really people are sort of holding onto it and saying that this may be useful one day and I can’t risk that I throw it away and find out that it would have been useful.

It really is a ripe opportunity for what you and the team are doing.

Yeah, exactly right. It’s really a sea change. And I saw this firsthand ten-plus years ago when I was at LinkedIn, coming back to the beginning of the podcast. We were collecting everything, every bit of data that we could, and we were maybe using 5% of it. But it was a cultural thing that the team had picked up from other companies like PayPal early in the Internet history, that we got to collect everything because it might be helpful in the future, and we would regularly discover new ways to use that data that we weren’t using. We regularly found ways to take that data that maybe in the old days you would have discarded and actually innovate with it and build new product features based on it. That’s exactly right. And that’s been a cultural transformation throughout the enterprise world where now when we talk to customers, they’re almost always storing all the data. Right. They’re not throwing data away anymore like they used to. They may not be using all of it. Maybe they’re on their way to trying to use more and more of it, but they’re definitely storing it and they’re centralizing it and they’re making it accessible.

Yeah. I say this as a guy who has, and it’s hard to see in the out-of-focus view, about 35 decks of cards over there. I’m not going to use all of them, but you never know. I buy three of each pack. Again, I’m a bit of a collector in that way. And really in the data world now, this truly is what we’re seeing. More and more companies are realizing it. It is a combination of many things. But I’d love to talk about this idea that many people believe today is the beginning of a lot of this, when in fact, this has been a well formed idea for quite some time. It’s just that it was maybe not broadly accessible or broadly understood outside of like a core group of, obviously, people in financial services. We’ve got insurance. There are organizations that have long held that their data needs to be used later. So let’s just keep holding onto it. You never know. But it’s always a funny thing, just like when any band suddenly becomes very popular. I saw them ten years ago in college. I don’t know who you think is brand new, but these folks have been around for a while.

This concept, I think, is probably more widely understood in some circles. Where has this been prevalent before?

Yeah, well, I kind of trace it, at least in my experience of it, to what is really a Silicon Valley phenomenon, at least to the extreme extent that we see. Obviously, financial services companies have been storing data, using fraud models and that kind of thing, for a long time. But this idea that all of your data needs to be in one place. Right. And even if you’re not using it, eventually I may want that connection for something. It’s, I think, a very Silicon Valley phenomenon. Silicon Valley companies that I’ve been at always kind of strive to have the central one place with everything. So we’re using a third party tool. That’s not okay. We got to import the data from that third party tool into our one place. Literally, engineers would come to me and be like, Elliot, you can’t put up this tool. We’re going to lose that data that goes into the tool. We need an API to get it out. And that would be a hard requirement for using a third party tool for something. And so I think that was the core of it. And that enabled companies like Google, Amazon, and Netflix to do very powerful things.

And now everyone’s kind of realizing that that was a really big advantage. We work actually with a lot of financial services customers. They’ve always had this idea of using data. And they have some amazingly expert teams, fraud modeling, and all this kind of stuff. But they’re still in silo mode. They have data all over the place. Right. They never really bought in until very recently into this idea of centralization, into putting all of it in one place.

And I would posit that still today, the most widely used tool for data analysis is Excel. It’s just bizarre. It’s 2022. And if Microsoft were to divest Excel, it would be worth more than Amazon right now.

Absolutely. And if you think about it, Excel is the most decentralized type of data you can have. I literally have my own copy of the data in my sheet. Right. And there’s ways to sync it now, and all this kind of stuff, but it really is a very different world from the one, I think, that we’re heading to.

I’ve seen this from having supported big financials for a long time in my own career on the tech side. And I remember getting these calls first and you’re like, oh, I need to restore this Excel document. What, did you delete it? No, it just got corrupted. Like, how did it get corrupted? Well, I don’t know. And you look at it and it’s a 2GB Excel file. You’ve stretched the limits of this platform. This is not meant to do this. And that was pre understanding of what the data warehouse opportunity was many years ago. Now, even today, they’ll put the data centrally. But then a lot of the offloading of the processing is done very client side, more and more, but at least the centralization of data has become data goes here first. It’s funny you mentioned this thing about Silicon Valley. Many Silicon Valley folks have always understood that the data has intrinsic value and so we should always keep our data close. Conversely, a lot of organizations are being told by those very same Silicon Valley companies, you should offload everything as a service. So it’s an interesting sort of dichotomy in the approach.

But I see more people are saying we’re going to use the service, but we want the data to stay centralized or at least keep a copy of it centrally. And that’s a fairly recent shift in some of the customers that I’ve talked to.

Yeah. And we see that as a very common pattern. For example, if you’re running transactions through Stripe, big Silicon Valley company, right, where you’ve outsourced your payment processing, well, very often we see it in our customer data set. You’re going to pull out that data from Stripe, you’re going to get the full log of everything that’s happened, put it in your data warehouse, so you can connect any transactional events on your product, on your ecommerce website, to that Stripe payment. And now you can also analyze that Stripe data. What percentage of my payments fail? What is my credit card distribution? Right. How many folks got the special discount? So we see that pattern quite a bit. And in fact, when we started Anomalo, we were counting on the fact that this is going to become the norm, that more and more companies were going to use these hosted managed services, but were going to pull the data back into their data warehouse, so that many companies would end up with a copy of Stripe data sitting in their data warehouse. And that would allow us to do a better job because our models would see many instances of Stripe’s data sitting in many warehouses that we could learn and generalize from.

So we were counting a little bit on that. And in fact, we were seeing that play out.
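Editor's illustration: the questions mentioned above ("What percentage of my payments fail? What is my credit card distribution?") become ordinary queries once the payment log sits next to your own order data. The sketch below uses Python's built-in sqlite3 module as a stand-in for a cloud data warehouse. The table names, column names, and rows are invented for the example and are not Stripe's real export schema.

```python
import sqlite3

# An in-memory SQLite database stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, total_cents INTEGER);
-- Hypothetical copy of the payment provider's log, pulled into the warehouse.
CREATE TABLE payments (
    payment_id INTEGER PRIMARY KEY,
    order_id   INTEGER,
    status     TEXT,
    card_brand TEXT
);
INSERT INTO orders VALUES (1, 2500), (2, 4000), (3, 1500), (4, 9900);
INSERT INTO payments VALUES
    (10, 1, 'succeeded', 'visa'),
    (11, 2, 'failed',    'visa'),
    (12, 3, 'succeeded', 'mastercard'),
    (13, 4, 'succeeded', 'amex');
""")

# What percentage of my payments fail?
failure_rate = conn.execute(
    "SELECT AVG(CASE WHEN status = 'failed' THEN 1.0 ELSE 0.0 END) FROM payments"
).fetchone()[0]

# What is my credit card distribution?
card_distribution = dict(conn.execute(
    "SELECT card_brand, COUNT(*) FROM payments GROUP BY card_brand"
).fetchall())

# Connect payments back to our own events: order value tied up in failed payments.
failed_order_cents = conn.execute(
    "SELECT SUM(o.total_cents) FROM orders o "
    "JOIN payments p ON p.order_id = o.order_id "
    "WHERE p.status = 'failed'"
).fetchone()[0]

print(failure_rate, card_distribution, failed_order_cents)
```

The join in the last query is the part that only works when both data sets live in the same place, which is the centralization argument being made in the conversation.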

So let’s talk about Anomalo Pulse, and this is exciting stuff. Let’s dig in a bit on the product side, on what we have there.

Yeah, absolutely, Eric. So Anomalo Pulse is a new kind of visualization dashboard product that we launched as part of Anomalo. And it’s in response to a question that we’ve been getting a lot from a lot of our customers. They deploy Anomalo. They start monitoring some of their tables. They have issues that come up that they resolve. Very often, we talk to the VP of data, the chief data officer, and their question is, Elliot, how do I know how well my organization is doing monitoring my data in terms of my data quality? What can I look at that says you’re improving based on all the stuff you’re doing, or you’re not improving based on all the stuff you’re doing. You need to do more. You need to focus on this area. And sometimes it’s even a team based and accountability type question, which is how do I know which of my teams are doing well in terms of the quality of their data and which teams are not really managing the data quality, and so need more help or need more focus in that area? And so we built Anomalo Pulse, really to answer that question.

And so you can log in and you can see an organizational view of how you’re doing on data quality. So how many data tables do you have? What percentage of those are actually actively being monitored for data issues? Right. If that’s a small percentage, well, there’s probably a lot of issues that you’re not catching. If it’s a big percentage, you’re doing well. Of the tables that are being monitored, how often do they have an issue? Which ones have issues all the time versus every once in a while? That can give you a sense of, well, where are the trouble spots in your data and where are the successes in your data, which things are sort of clean, and which things regularly have issues? And then, of course, you can break that down by team or schema in your data warehouse and all those kinds of things. So that’s Pulse. For the first time, you can start to develop a sense of how are we doing overall in terms of managing, monitoring the quality of our data.
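Editor's illustration: the organizational metrics described above (the share of tables being monitored, how often each monitored table has an issue, and a breakdown by team) are simple aggregates. The sketch below shows one way to compute them. The `TableStatus` record, its fields, and the sample numbers are hypothetical and are not Anomalo Pulse's data model or API.

```python
from dataclasses import dataclass

@dataclass
class TableStatus:
    """Hypothetical per-table monitoring record."""
    name: str
    team: str
    monitored: bool
    days_checked: int = 0
    days_with_issue: int = 0

def coverage(tables):
    """Fraction of all tables that are actively monitored."""
    return sum(t.monitored for t in tables) / len(tables)

def issue_rates(tables):
    """For each monitored table, the fraction of checked days that had an issue."""
    return {
        t.name: t.days_with_issue / t.days_checked
        for t in tables
        if t.monitored and t.days_checked > 0
    }

def coverage_by_team(tables):
    """The same coverage metric, broken down by owning team."""
    teams = {}
    for t in tables:
        teams.setdefault(t.team, []).append(t)
    return {team: coverage(members) for team, members in teams.items()}

tables = [
    TableStatus("sales", "revenue", True, 30, 3),
    TableStatus("signups", "growth", True, 30, 0),
    TableStatus("clicks", "growth", False),
    TableStatus("refunds", "revenue", True, 30, 15),
]

rates = issue_rates(tables)
trouble_spot = max(rates, key=rates.get)  # table with issues most often

print(coverage(tables), trouble_spot, coverage_by_team(tables))
```

With these sample numbers, three of four tables are monitored, the "refunds" table stands out as the trouble spot, and the "growth" team has an unmonitored table, which is the kind of accountability view described in the conversation.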

Now let’s dive into the tech a bit, because I know a lot of folks would ask things like what’s the sort of impact and capability mix where you talk about sampling, taking first samples, then ultimately training, and then throwing it at the entirety of the data set. There are different phases in which you would see adoption, but then also what’s sort of the processing impact? Where do I fit this in my life cycle of data? Because all these applications get this weird thing where the data part of the organization quite often is a very standalone group or a bunch of standalone groups, and then the application groups are functionally separated, and then you’ve got the CIO, who sort of has responsibility for it. There’s a lot of intermingling, and that’s why I ask: where does it fit in? Who would own Anomalo?

Right. Great question, Eric. So what we often see and again, this varies by organization because this is also kind of a new area. Right. How you become more data driven and transform yourself. There’s still a lot of thinking and evolution in terms of how these various teams and roles are structured. But what we see emerging at our most sophisticated customers is a kind of data platform team inside the organization. And so the data platform team is kind of responsible for what are the tools we have in our data stack. And the data platform team, in turn, has internal customers, which could be the business teams, the application teams that want to use data. But they go to the data platform team to sort of get the tools for accessing and using data. Very often the data platform team is the one that owns the data warehouse. Right. They made the selection of which data warehouse it is. They kind of own its access and organization. They may not be the team that feeds the data warehouse with data that might be distributed or that might be a data engineering team, but they kind of own the data warehouse and how it exists and how it works.

And then they might also own things like BI tools. How do we build dashboards on top of this data warehouse data? So Anomalo fits into that most easily, which is the data platform team that’s responsible for what are the tools that we have as an organization to manage and work with data.

The thing that I like, that I believe the industry has finally gotten around to, is that there is no such thing as a single pane of glass. We’ve learned that it was a sales pitch for a lot of organizations, that you’ve got 47 tools. I’m going to say the tool that will get rid of the other 47. And in the end, you now have 48 tools is what you’ve got. And it’s true, because even if you get it right, even if you say, okay, good, we’ve got three disparate data warehouses. We’re going to merge them together, put them in one fantastically huge, single, beautiful spot, and then you’re all good. No one does that. But even if you do, let’s just hypothetically say the magic occurred. And then they announce that you’ve just acquired another company. Well, guess what? They have seven data warehouses. They’ve got some on-prem. They’ve got some in the cloud. They’ve got three different clouds because they just acquired two companies. Like, there’s never a final resting place for data. Where does this make the Anomalo story important? Because it seems to me like this is where you can really shine, that you’re not saying you got to put all your data here so I can go get it.

Yeah. I mean, we are counting on the fact that it’s going to be in the cloud. Right. I think the migration to the cloud is a freight train that’s not going to stop, and we are counting on that, Eric. But we do support multiple different places where the data might be, multiple different platforms that you might set up in your organization to query that data. And you can view all of them in one place in Anomalo and set up monitoring for all of them. So we have folks that have Snowflake and they have Google BigQuery. I don’t know why they have two. Maybe it was an acquisition, but it happens. That’s okay. You just connect them both to Anomalo. Right. And as far as you’re concerned, all of your data is now in one place. So absolutely. I do think there’s a pretty big push to centralize, to get to one. And of course, that’s tough. And I don’t expect everyone to do it perfectly. This is actually one area where Silicon Valley companies start out having an advantage because they’re building from scratch. Right. You don’t start out with seven data warehouses that you need to combine. You start out with the one you choose, which you need to grow over time.

And so there’s a little bit of an advantage to newer firms. But I do think there’s strong pressure and kind of strong momentum to get unified and get centralized.

Yeah. And even if not for continuous real time, at least the centralization for offline and near-real-time processing has to be done in that central location, because what I often see is this pattern.

Right.

Well, they’ll have an app stack that’s Google-centric, and then they’ll have another app stack that’s AWS-centric. And maybe there are legal or other requirements, like business requirements, that drive those decisions. Architecturally, no one would say it’s a great idea. But then you’ve got the challenge of centralizing that data to a place for processing. And I think they’ve pretty much accepted, like I said, that the cost of storing this data is not significant compared to the continuous processing, even with something like a Snowflake. It’s funny you mentioned somebody who’s got data inside BigQuery and then data inside Snowflake, which, if I were to look under the covers, probably runs on top of BigQuery, or whatever it is, they’re running on the same stack that you’re running on. It’s just that they’ve abstracted it to do additional things. So we will still see those patterns of multiple spots. But the central one, like one pool of common data, I think, is where people are heading, whether it’s that real time online, sorry, like old-school mainframe batch and online. We will see that stuff happen where you’ll have a lot of stuff that’s moving to that batch style, but it’s going to be held in a central spot.

Yeah. And in some cases, you can get there fast if you take a data lake type approach, where your data is stored in the cloud. Right. But it’s just stored as files in cloud storage somewhere. And now multiple different warehouses can process that data. You can hook it up to Snowflake, you can hook it up to BigQuery, you can hook it up to Databricks. You choose which tool you want to use to process that data, but your data actually is in one place. And so we see that as well, which is kind of a way to skirt around the unification and say, well, my data is in one place, but I might have multiple tools to query it.

Now let’s talk about the team, because I know we’ve talked about some of your background, and I’d love to dig into the rest of the founding team and what your collective view and approach drew you all together.

Yeah, absolutely. So my co-founder is Jeremy Stanley. We were together at Instacart. I was the chief growth officer, trying to get Instacart to grow faster. And he was the VP of data science for Instacart and had actually been a data science leader for many, many years. He tells stories about predictive models for mining companies to predict the mine that was going to have an accident, those kinds of things. And together we’ve recruited a lot of our favorite technical folks for a majority-technical team, and have also recruited some of our favorite data scientists, folks that we knew would need a tool like Anomalo and that are actually now building that tool essentially for themselves. So Vicky, who was the lead engineer on the Pulse product that we just talked about, is a classic example of this. She’s someone Jeremy and I worked with, someone who in a different life would have been the user of Anomalo and who now is building the product she would have wanted to have years ago. So that’s how we approach building the team.

When it comes to this.

Right.

You’ve been through different organizations, especially given that your role was chief on the growth side of things. So you’re like a very friendly, nicer version of Chamath Palihapitiya, but with the human aspect merging with the systematic aspect of growth. You’ve seen it at the growth phases. So how does that influence the initial phase of seeding the company? Having an eye on growth gives you an interesting sort of split in how you have to look at things.

Yeah, I’ll be honest with you. They’re pretty different worlds. Right. And folks ask me for growth advice all the time, as I’m sure they do Chamath. Or maybe he’s moved past that. And the truth is, the early days of a company, finding product market fit, getting those first few passionate users, have nothing to do with what we used to do at LinkedIn, Wealthfront, and Instacart on growth, once you’ve found your core set of users and are thinking about, okay, how do we make this much larger, much faster? So those are very different worlds, and it hasn’t been a huge adjustment for me. But it’s a little bit of an adjustment to realize that in the early days you’re not operating with a lot of data, despite being a data company. In the early days, our own set of data that we could use was really tiny. Right. As we were trying to get to those first initial users of the Anomalo product. So there’s an adjustment where you realize that you don’t have a ton of data. You don’t have a ton of things that you’ve figured out that you could double down on.

Right. With a lot of the growth mechanics that growth leaders at larger companies use, they just figure out what already works and they find ways to do more of it. You don’t have that at the early stage. You don’t know what’s going to work. So that’s an adjustment. But the thing that’s universal is the idea of experimentation. Whether it’s in the early days of a seed stage company or in the growth context of a larger company, you should constantly be experimenting and learning, trying new things and seeing if they work, right. And in a larger company, you can direct your experiments more. You know the characteristics of things that have worked in the past, so you can be very selective in your experiments. In an earlier stage company, you’re kind of trying everything. You go with your gut a little bit more. But that idea of experimenting and learning is definitely still a universal thing.

Yeah. It is funny, though. You realize how lucky you are when you’ve got the pool to draw from. And it’s why you see that founding teams, building teams, growing teams are often like the stages of a rocket, where they truly just will say that the first stage of the rocket gets us to this altitude, and then we shed the stage. And I’ve seen that. So it’s interesting now that you’re coming in as a founder. You are going to have to survive different stages that you previously hadn’t experienced. It must be an exciting and interesting world to now really see this zero-to-one phase of the company.

Yeah, for sure, Eric. And I’m actually super cognizant of the phases you’ve talked about because I want to make sure I adjust. I’ve had many experiences in past companies where I came in at the growth stage as a growth leader, and I had this portfolio of techniques and strategies. But the founders were still in the foundation stage. They hadn’t made the leap. They hadn’t realized yet that you have data. You have a base from which to build. You can double down on things that have already worked. You can be selective. Right. In those situations, I’ve had to convince folks that my approach is a good one, demonstrate results, prove that my approach is the right one for that stage of the company. And so I’m very cognizant of that, and of making sure that when that stage comes, and I think we’re inching into that growth stage now in our company’s trajectory, in Anomalo’s trajectory, I make that switch in my head and say, okay, we can start to use some of those growth strategies now.

Yeah. Now that you have those levers available, you expose those levers to the business all of a sudden, but you have to build and discover those levers to begin with. And how did you find that very early phase in seeking product market fit? You talked about the customer centric hiring in that you’ve effectively built a team on people that would be consumers of the product. So that will very strongly influence the way you engage with those early prospects and customers. So what was that first phase of finding the development partner customers and such like?

Yeah, you know, to be honest, it was easier than I thought, precisely because of the team we built. We could go to our network and we could find customers from our network. So all of our initial customers, all of our initial design partners, were folks that we got connected to through our network and had a relationship with. They agreed to help us out, and eventually they became paying customers of Anomalo. And so that’s a pretty powerful way. If you have a network, or if you can recruit a team that has a network into your potential customers, that’s a pretty powerful way to get started. Even LinkedIn back in the day, Eric, actually started like that. The first folks invited to LinkedIn were in Reid Hoffman’s network. He invited all the PayPal folks and his VC friends, and that formed the core of the original user base. And he could get them to accept because he had a relationship with them and he was Reid Hoffman. Without that network, it would have been a much harder road.

Yeah, it is very interesting. Product market fit is often a challenge to find, depending on the friction involved in consuming the product. And that’s why I admire your approach in that, obviously, data has to be in the cloud. Right. It’s kind of a binary thing, but you’re not saying that you need to relocate your data in order for us to be able to make use of it. That is the big thing. There’s much lower friction to bringing Anomalo in, versus many other companies that say, yeah, we’re going to do amazing stuff with your data, we just need you to move it all over into our data warehouse in order to do it. In networking, I used to struggle with this all the time, especially on the consumer side. Every single product you’d buy that had fantastic network monitoring, these different tools, would say: oh, yeah, all you need to do is make sure that you’re routing all your data through this endpoint. We do that seven times already for all these other things. I can’t continue to reroute my data. And eventually they learned that, thanks to software-defined networking, you can put virtual taps all over the place, but it used to be physical.

That’s how Gigamon became a business, because the idea of aggregated span ports so that you could monitor data flow, that created an opportunity. And now if you told somebody, I need you to route your data through something, they’d be like, you’re nuts.

That’s right, Eric. Lowering friction is a big deal. I would argue lowering friction is one of the most innovative things we can do in many years. And you’re absolutely right. Anomalo doesn’t require your data to go into our data warehouse. In fact, we will often deploy in a way where we just sit in the same cloud environment as your data warehouse. Right. So your data never even has to leave your cloud. We just push our application to your data rather than your data having to stream to us or anything like that, or us having to query it and send some results back to our cloud. We just sit where the data is. And then the other element of friction that we reduced, Eric, is just the setup friction, because you don’t have to set up rules or tell us what to look for when you set up Anomalo. That’s another thing that really resonates with our customers. You can do a few clicks and you’re monitoring your data now, right. And you can fine-tune it and customize it if you wish, if you want to go deeper, but you don’t have to in order to get it up and running.
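
To make the "no rules to set up" idea concrete, here is a minimal sketch of monitoring that learns what normal looks like from a table’s own history instead of from hand-written rules. It is an illustration only and not a description of how Anomalo works: the function name `is_anomalous`, the z-score test, the threshold of 3, and the sample row counts are all assumptions made for this example.

```python
from statistics import mean, stdev


def is_anomalous(history, today, threshold=3.0):
    """Flag today's value if it sits more than `threshold` standard
    deviations away from the mean of the historical values.

    Hypothetical helper for illustration; not Anomalo's algorithm.
    """
    if len(history) < 2:
        # Not enough history to estimate a spread, so flag nothing.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # History is perfectly flat: any change at all is unusual.
        return today != mu
    return abs(today - mu) / sigma > threshold


# Made-up daily row counts for one table over the past week.
daily_row_counts = [10_120, 9_980, 10_050, 10_210, 9_940, 10_090, 10_030]

print(is_anomalous(daily_row_counts, 10_070))  # False: a typical day
print(is_anomalous(daily_row_counts, 4_200))   # True: more than half the rows are missing
```

Nobody had to write a rule such as "row count must exceed 9,000"; the expected range comes from the data itself, which is the setup-friction point being made above.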

You don’t require a $180,000 professional services engagement to go through a proof of concept, then? No, not at all, not at all. And I say it partially in jest, mostly tongue in cheek, I guess, because I know that’s out there. Right. Like, the complexity of the problem that you’re solving usually would require a lot of human interaction and a lot of human development of understanding the business, understanding the policies, understanding the flow. God, I hate to say this word, because it came to mind right away: it’s game changing, in that it is fundamentally changing how easy it is to get started. And then at that point, now, platform implementation-wise, what’s the most common time frame that folks expect if they say, hey, all right, I saw Elliot on LinkedIn doing something, I’m going to reach out, I want Anomalo in my environment?

Yeah, pretty fast. Obviously there are legal things where we have agreements and security stuff and all those kinds of things. But deploying Anomalo in your environment probably takes about an hour to get it up and running, and then maybe another hour to get some things configured. Right. So literally, when we have a new customer, we book two one-hour meetings, one to install the product and one to onboard you into the product. At the end of that onboarding, you already have it configured and monitoring critical data in your data warehouse. So that’s all it takes.

You’ve won the friction game. Absolutely. The most friction-free implementation. And what I love about this is I can be way more excited about your product than you need to be, because folks that listen to the podcast know no one comes on here because they say we need to talk about our products. In fact, usually I’m the one that’s pulling it out of people because I am excited about what you’re doing. Again, given my own experiences with this type of implementation and the complexity that we’ve usually faced, it’s pretty big. And really it goes to the core of the team and your approach, which means that future growth, future development will carry that model forward, because that culture seems to be ingrained in the ethos of the company, which is refreshing, right? That’s where it needs to be. Instead of having to take old methods and then gently refactor them, it’s like, no, we’re throwing out the old game plan and this is how it goes now. It’s kind of refreshing.

Thank you, Eric. I mean, we’re definitely trying. We’re still an early stage company, still small, so there are still a lot of things to build and a lot of work to do. But we’re definitely trying, and we’re pretty excited about the momentum we’re seeing and how well the product is working for our customers.

I guess I should ask one important question. Really, who is your ideal customer that will be able to quickly find that fit and value out of the Anomalo platform?

Yeah, absolutely. Anyone with a cloud data warehouse, that’s the first step, right? If you haven’t set up a data warehouse yet, then you’re probably a little too early in your data maturity, your data lifecycle. If you have a data warehouse, that’s great. We also look for folks who have built out a data team. Right. So again, if you don’t have much of a data team, you probably just aren’t powering enough things with data yet to experience the pain of data issues and data quality. But once you have a data team of five or so, well, now you’re probably feeling that pain and we can definitely help you.

Odds are once you throw the first person at it, there’s a reason it gets to five fast. They start feeling that pain pretty quickly. I remember back it used to be like ETL people, and that was the whole big thing. It was just like just getting data between places and they’d have teams doing ETL, then you got DBAs doing the back end. It’s like all of this thing we’ve moved the function and the roles a bit. But in the end, there still is a lot of that really understanding where business logic comes in. And this is why this agnostic, data driven and literally data powered approach that you’ve got makes the move to taking on the platform a lot easier because it’s time and time again, it’s like you come in, the first thing you have to do is set up 17 interviews with product people, and they’re even arguing in between about how it really should go. They sort of unpack this awful family history of where the data came from. You can just be like, okay, no problem. Just plug this in. We’ll be back in an hour, and then we’ll talk about what your data really says.

Yeah, exactly right, Eric. And what a lot of companies are doing, because of this issue, because of how difficult it is to get to ground truth and synthesize it, is they’ve actually just decentralized the management of data. Right. So, product managers: well, you own this data set, right? You figure out what’s going on here. And it’s actually another reason to get a tool like Anomalo. We are a no-code, low-code tool. Right. You set us up and it goes, and we’re going to root-cause things and visualize things for you in a way that almost anyone can understand. We don’t require you to understand obscure error messages or parse logs or even query the data yourself. Right. We’re going to do a lot of that work for you. So we’re actually accessible to anyone in the organization who cares about a particular data set. And so we’ve actually helped a lot of our customers complete that decentralization or democratization, to be able to push the responsibility for a data set to the product manager or to the team that cares about that data set, rather than having to have folks in data engineering and other functions synthesize all that information from all the various parties.

One thing I’ll ask, and I hate to ask a question which I know can be a tough question, is how do you deal with things like data separation for regulatory stuff? You’ve got role-based access control, lots of different access control lists that are spread throughout these data sets. Where does that come into how it interacts with Anomalo?

Yeah, we’ve had to build all that, Eric. So we have a financial services customer right now, actually two of them, where data is heavily restricted. If you’re in the mortgage group, you cannot see the data from the banking group, and vice versa. And so we had to build that functionality. We have separate teams and organizations within Anomalo. Anomalo itself can see everything. But if you log in from the banking team, you only see the banking data. Right. And if you log in from the mortgage team, you only see the mortgage data. And so we’ve had to build those access controls. And, of course, we integrate with things like Okta and other tools that the enterprise might have to appropriately associate users with teams and with the access that they should have. Good.
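
The team-scoped visibility described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Anomalo’s implementation: the `Table` class, the `USER_TEAMS` mapping, the user names, and the `visible_tables` function are all hypothetical, and in a real deployment the user-to-team mapping would come from an identity provider rather than a hard-coded dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    team: str  # the team that owns this table's data


# Hypothetical catalog of monitored tables.
TABLES = [
    Table("loans", "mortgage"),
    Table("applications", "mortgage"),
    Table("accounts", "banking"),
]

# Hypothetical user-to-team mapping (assumed to be synced from an
# identity provider in practice; hard-coded here for illustration).
USER_TEAMS = {
    "alice": {"mortgage"},
    "bob": {"banking"},
    "admin": {"mortgage", "banking"},
}


def visible_tables(user):
    """Return the names of tables the user's teams own.

    Unknown users belong to no team and therefore see nothing.
    """
    teams = USER_TEAMS.get(user, set())
    return [t.name for t in TABLES if t.team in teams]


print(visible_tables("alice"))  # ['loans', 'applications']
print(visible_tables("bob"))    # ['accounts']
```

Defaulting unknown users to an empty team set means access is denied unless it is explicitly granted, which is the safer choice for restricted financial data.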

That wasn’t so much of a curveball, then. I was like, this is probably not the question you just sneak in at the end of the podcast, the hardest possible question. Let’s talk to the CISO right now. So it’s good. Yes. And another thing that we definitely are seeing more and more of is where the ethics of data usage and the ethics of data storage, and the business rules that are wrapped around that, and the legal and regulatory stuff, create a real challenge. The truth is, most of these teams do their best, but quite often they don’t even realize how exposed some of those data are to each other, because what we believe is true data isolation often isn’t. There are many interconnected systems, so there’s always a path to get from one place to another. But it is, I think, top of mind for the CISO and the Chief Data Officer. Right. I guess, is that a role that’s really becoming common? The CISO, I get it, right. That came in with Sarbanes-Oxley and other regulatory requirements. They’re like, you need an officer who is charged with this function. But the Chief Data Officer, it’s still kind of a fuzzy function.

Yeah. I mean, it’s not as well adopted, but it’s coming. We see it all the time. Right. We typically interact with a Chief Data Officer or someone who’s acting as that officer. Maybe they have a VP of Data title or something along those lines. But, yeah, it’s coming. There’s enough complexity in the data the company is using and in the systems powering that data, and the data teams are large enough, that you need an executive driving your data strategy. Data is critical enough. As you said, data is the new crude oil. Well, you need an executive who’s going to mine that oil, if you will, and figure out how to process it. So we see it happening quite a bit. And even at older companies that have been around for a long time, where maybe data was part of their IT team and the CIO used to be in charge of data, now they’re either opening up a Chief Data Officer role or they already have one. The other aspect of this, Eric, is folks have realized that managing data systems and getting value from data is different than engineering. Right. It’s different than building other systems, different than building applications or setting up networking.

It’s a different skill set. And so that also kind of created an opportunity for the Chief Data Officer to emerge, because they can truly have that data skill set, rather than starting with an engineering skill set and learning about data, as folks used to do in the past.

Well, they effectively become the F1 driver to a fantastic F1 car team.

Right?

The engineers that build that car will never be able to drive it and get it to perform. So they have to have a specialist where it’s like, your singular function is to get the most value for the least expense and least risk out of these assets, and that allows them to shape the strategy for it. It’s kind of funny. If you think back, like, 20 years ago especially, you’d hand somebody a business card that said VP of Data and they’d be like, no, seriously, this is a joke, right? What does that even mean? This is not a real thing. How many people do you have on your team? You’re like, oh, I’m in charge of the data. We’ve come a long way in a seemingly short time, as far as the dawn of Earth, at least.

Yeah, you’re definitely right, Eric. And I love the Formula One driver analogy. I’ve even been surprised that there are now product managers of data. Right. I was a product manager for a long time, and normally product managers are for features. You can be the product manager of this page or this flow. Well, data is so important and so integral to how the product works that we see a lot of customers where they have a product manager of data, right, who kind of coordinates and orchestrates and strategizes a lot of the things that they do in their data work.

It’s where we’re going. And I say it’s where we’re going, but it’s where it’s already going. And I think any organization at least should have a sense of what their strategy is, whether or not they’re tactically moving towards it. It’s kind of like sustainability. Every time somebody says to me, yeah, we’ve got a sustainability initiative, I say, that’s fantastic. What have you done in the last twelve months to enact things towards this strategy? Like, oh, we’ve got a steering committee. Okay, perfect. But data is a very real thing. Not that sustainability isn’t. I shouldn’t pick on that, but it’s like people say they’ve got a data strategy. What have you done about it? And this is a place where you can find a great fit. All right. I am so happy. Thank you. So again, Elliot, if people want to find out more about Anomalo, obviously we’ll have links to the website and such. If they want to reach out to you and maybe dig in a little bit more on the platform itself, what’s the best way to do that?

Absolutely. So just go to anomalo.com. It’s A-N-O-M-A-L-O. It’s kind of like anomaly, except with an O at the end instead of a Y. Check it out. There are demos there. There’s documentation, there are all kinds of resources on what the product can do, and feel free to contact us there if you want to try it. And, Eric, thank you so much for having me. It’s really been a pleasure.

I’ve got to say one quick thing, too. For people that are about data, you’ve got bloody good designers. Your website is just very captivating. I really enjoy the user experience. So for people that are living in data, you’ve got a bloody good design mind on you.

Well, we’re big believers that you can’t get insights out of data unless it’s visualized in a really compelling way, right? That’s been something we’ve learned over the years. And so, yeah, we have great folks that are not just great designers but great visualizers of data, who contribute their expertise to the product. So absolutely, thank you.

If visualization didn’t matter, people would drink sushi smoothies. We don’t. It’s even disturbing for a moment to think about it, and yet, given the right visualization, it’s fundamentally different. And this is it. Well, congratulations on all of the recent successes and on the future successes that you and the team are going to experience. Elliot, thanks very much.

Thank you so much, Eric. Thanks for having me.


Transcript powered by Happy Scribe
