Interview: Without open-source, we wouldn't be here.

This is the ninth in a series of interviews with members of the Machine Commons supplier Collective. Subscribe to the site to be alerted about future posts, or become a partner today!

Oleguer Sagarra runs DRIBIA, an AI solutions provider that prides itself on an academic mantra and a focus on bespoke algorithm design.

How’s life been during the pandemic?

“I live in Barcelona. We’ve been in a hard confinement for four months.”

“It’s been hard for so many people – we’re not ‘OK’. Personally speaking, however, it’s been the perfect time to have a kid!”

And business?

“We’re growing now; after the shock, clients seem OK.”

“Essentially, we provide services, so it’s not about how we reacted, it’s about how the client reacted. As a company, we’re flexible. Working remotely isn’t new to us. People who work in data science typically look for freedom and flexibility. Data science projects are agile by nature, and we’re very used to flexible projects.”

“The problem was with the clients: the typical reaction was denial at first; then, once restrictions were in place everywhere, the CEOs said, ‘Stop everything!’ And what’s the first to go? Innovation and research projects.”

The most important longer-term objectives are always the first to be dropped. C’est la vie.

“The first months were rough. Then people eventually realized that the way forward was more analytics. More prediction, not less.”

“At least it gave us a chance to do internal work. To find better tools, to improve our code base.”

How has remote life treated Dribia?

“As an already mostly remote business, we’d already been working on our employees’ mental health.”

“There’s a lot you miss by not being in an office: the coffee machine, the small things. Without that environment, ideas slip through.”

What do you do to mitigate that?

“We started having online beers and built an online knowledge base (an internal wiki) so people can share knowledge in other ways.”

“Overall, we invested the free time that last year gave us wisely, in internal work. We’re much stronger than we were going in.”

Have you had to worry about monitoring staff, using time-tracking technologies and so on?

“When the main asset in a company is brains, you shouldn’t worry too much about monitoring. We use trackers, but only to monitor projects, not people. Our business is about providing services per unit of time.”

“I mean the main thing we did was that we bought cool chairs and screens and sent them to everyone at home!”

So, what has changed in how you do things?

“Another change, and this is really good.”

“There are many, many meetings we no longer do: the countless unnecessary meetings you have to attend because you’re a ‘consultant’. The main change has been in our clients’ culture, as we were already meeting-averse.”

“I think clients have realized a lot of these were unnecessary.”

What’s been the biggest challenge?

“As a service business, we do a lot of workshops. Some have always been online, but, man, they are hard. It’s very hard to create engaging dynamics with people who are in a different environment from you, and difficult to engage in creative stuff. So we still do workshops in person; however, we have experimented with and developed online versions of some of our dynamics.”

Can you elaborate, what changes when you meet virtually?

“Workshops are based on group work, so you have to change the way you do it. You can’t expect people to engage for an hour and a half when they’re home.”

“We had to add breaks and organise activities. Using blackboards and Post-its is OK, but there’s fancier stuff you can do, like using Typeform and bringing quizzes into play.”

“When it’s virtual, you need people to switch on their cameras, so they can’t have anything else on screen.”

Let’s take a step back, what do you do?

“We’re a data innovation studio. We provide tailored boutique solutions using data.”

“This involves:

Identifying those solutions, let’s call it the strategy; and,
Solving them using custom code.

…We work with open source, don’t depend on paid licenses and document everything, so that we can hand the code over at the end of the project.”

What do you love about data science?

“One of the things I really like about data science is that the algorithms are all more or less the same, but the projects are all so different; they take on flavours you’d never expect.”

“Very different industries. You discover so many companies that do so many different things.”

I guess it’s a bit like cooking. There are base ingredients, but countless dishes you can prepare.

Any awesome projects recently?

He begins listing an incredible variety of use cases.

--> Catching cheating newspaper agents.

“We had one client, a company that sells newspapers. They sold subscriptions, with their sales efforts focused on kiosks, where customers park their car and get a newspaper. Some owners of these places cheat, keeping their customers’ details for themselves to cut out the company.”

“We analysed the temporal dynamics of these places (basically the peaks and troughs of sales) and started to see clusters in the data. You quickly begin to see different kinds of fraud.”

“That’s a fun project we’re doing right now!”

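The interview doesn’t say which method was used on the kiosk data. As a rough illustration of reading fraud from the peaks and troughs of sales, here is a minimal Python sketch of a much simpler stand-in technique: a per-kiosk z-score check that flags kiosks whose recent weekly subscription sales fall far below their own history. The data layout, the four-week window and the -3 threshold are hypothetical choices, not Dribia’s.

```python
from statistics import mean, stdev

def flag_sales_drops(weekly_sales, recent_weeks=4, threshold=-3.0):
    """Flag kiosks whose recent mean sales sit far below their own history.

    weekly_sales: hypothetical mapping of kiosk id -> weekly subscription
    counts, oldest first. A kiosk is flagged when the z-score of its recent
    mean, measured against its earlier weeks, is below `threshold`.
    """
    flagged = []
    for kiosk, sales in weekly_sales.items():
        history, recent = sales[:-recent_weeks], sales[-recent_weeks:]
        if len(history) < 2:
            continue  # not enough history to estimate normal variability
        mu, sigma = mean(history), stdev(history)
        # Guard against a perfectly flat history (stdev of 0)
        z = (mean(recent) - mu) / (sigma or 1e-9)
        if z < threshold:
            flagged.append(kiosk)
    return flagged

# Hypothetical data: kiosk_b's sales collapse in the last four weeks
sales = {
    "kiosk_a": [20, 22, 19, 21, 20, 23, 21, 20, 22, 21, 20, 21],
    "kiosk_b": [20, 21, 22, 20, 21, 19, 22, 21, 6, 5, 4, 5],
}
print(flag_sales_drops(sales))  # ['kiosk_b']
```

A sudden, sustained drop is only one possible fraud signature; a real analysis would compare many kiosks’ temporal profiles against each other rather than each against itself.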
--> Predictive maintenance of plastic bag machinery.

“Another was an industrial company that makes plastic bags. Its factory lines produce bags the entire year – this is crazy – and we’re helping them with predictive maintenance of the machines.”

“It gives you a peek at the complexity of the business world. I think this is a privilege.”

--> Helping businesses select busy, profitable locations with mobile data.

“Another really cool one was for an American retail company. Essentially, we solved the question of ‘where to put a store?’. Let’s say you want to place a business and you know what the business will be about. You tell us your target, say women of (X) age and (Y) income, and our task is to tell you where to put the (Z) shop.”

“We got our hands on an enormous dataset of 8 million people: where they are every minute of every day. It’s something like 30 GB of data every day.”

“You predict where they live and protect anonymity by aggregating all the data. Then you know where these people go, and you can build a predictive algorithm.”

“For every single place in the city, we could predict how many people would pass by on average. And we beat the marketing location experts!”

Always satisfying.
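Here is a minimal sketch of the aggregation step described above, assuming a hypothetical feed of (device, day, latitude, longitude) pings. Pings are binned into grid cells, distinct devices are counted per cell per day, and any cell-day seen by fewer than `min_devices` devices is dropped before averaging. The cell size and threshold are illustrative assumptions; a minimum-count rule like this is much weaker than a formal guarantee such as differential privacy, and the predictive model that extends the estimate to every place in the city is not shown.

```python
from collections import defaultdict

def average_footfall(pings, cell_size_deg=0.001, min_devices=10):
    """Average daily number of distinct devices per grid cell.

    pings: iterable of (device_id, day, lat, lon) tuples (hypothetical schema).
    Cell-days with fewer than `min_devices` devices are suppressed, so no
    small group of people is ever reported on its own.
    """
    per_cell_day = defaultdict(set)
    days = set()
    for device, day, lat, lon in pings:
        cell = (int(lat // cell_size_deg), int(lon // cell_size_deg))
        per_cell_day[(cell, day)].add(device)
        days.add(day)
    totals = defaultdict(int)
    for (cell, day), devices in per_cell_day.items():
        if len(devices) >= min_devices:
            totals[cell] += len(devices)
    n_days = len(days) or 1
    return {cell: total / n_days for cell, total in totals.items()}

# Hypothetical pings: a busy cell on two days, a quiet cell on one day
pings = [(f"d{i}", day, 41.3855, 2.1735) for i in range(12) for day in (1, 2)]
pings += [(f"x{i}", 1, 41.3905, 2.1805) for i in range(3)]
print(average_footfall(pings))
```

With these inputs only the busy cell survives, averaging 12 devices per day; the quiet cell’s three devices fall below the threshold.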

--> Recommendation system to increase online basket size.

“We built a recommendation system for an online brand. We asked the marketing guys to make their own recommendations so that we could compare their efforts with our algorithmic approach.”

“They can do one set of recommendations for the entire database, and we can do one recommendation per person [based on their unique digital fingerprint].”

“The algorithm is more than ten per cent better. It’s not a competition with them: it makes their life easier, since they no longer have to do these sorts of repetitive tasks.”

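To make the contrast between one set of recommendations for everyone and one per person concrete, here is a minimal sketch using simple item co-occurrence counts on hypothetical baskets. It is an illustrative stand-in, not the system described in the interview.

```python
from collections import Counter
from itertools import combinations

def global_top(baskets, k=2):
    """One list for everyone: the k items bought by the most customers."""
    counts = Counter(item for basket in baskets.values() for item in sorted(set(basket)))
    return [item for item, _ in counts.most_common(k)]

def personal_top(baskets, user, k=2):
    """One list per person: items that co-occur most often with what `user` already bought."""
    co = Counter()
    for basket in baskets.values():
        for a, b in combinations(sorted(set(basket)), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1
    owned = set(baskets[user])
    scores = Counter()
    for (a, b), n in co.items():
        if a in owned and b not in owned:  # only suggest things the user lacks
            scores[b] += n
    return [item for item, _ in scores.most_common(k)]

# Hypothetical purchase histories
baskets = {
    "ana": ["tent", "stove"],
    "ben": ["tent", "stove", "lamp"],
    "cai": ["shirt", "socks"],
    "dee": ["shirt", "socks", "hat"],
    "eli": ["shirt", "hat"],
    "fay": ["tent"],
}
print(global_top(baskets))           # ['tent', 'shirt']
print(personal_top(baskets, "fay"))  # ['stove', 'lamp']
```

The global list offers fay a tent she already owns and a shirt unrelated to her purchase, while the personal list suggests camping gear bought alongside tents.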
--> Where map directions meet electric car battery prediction.

“We once built a kind of Google Maps for electric cars. When you drive these things, you’re very worried you’re going to run out of battery.”

“We needed a model of how these vehicles consume energy on a road, taking into account temperature, slope, type of road and speed. Then you need to consider where the charging stations are and optimise the total drive time.”

“It’s an interesting project because the client didn’t have any data on the consumption of the car, we had to build it from scratch."

"We had to go do the physics research, get the formulas out, test them and compare them with real cars. And we did pretty well!”

“We were off by about 3% per 100 km. That was a really fun project, built entirely using open-source tools.”
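A consumption model like the one described usually starts from standard longitudinal vehicle dynamics: rolling resistance, aerodynamic drag and the slope component of gravity, divided by drivetrain efficiency. The sketch below is a minimal version under assumed parameters (mass, drag coefficient, frontal area and efficiencies are illustrative, not figures for any real car). Road type is approximated by the rolling-resistance coefficient, and temperature only crudely, as a constant auxiliary heating/cooling load. It is not the client’s model, and charging-stop optimisation is left out.

```python
import math

G = 9.81       # gravitational acceleration, m/s^2
RHO_AIR = 1.2  # approximate air density near 20 °C, kg/m^3

def segment_energy_kwh(distance_m, speed_ms, slope_rad,
                       mass_kg=1600.0, cd=0.29, frontal_area_m2=2.3,
                       crr=0.010, drivetrain_eff=0.90, regen_eff=0.60,
                       aux_power_w=500.0):
    """Estimate battery energy (kWh) for one road segment at constant speed.

    All defaults are illustrative assumptions. `crr` stands in for road type,
    `aux_power_w` for temperature-dependent heating/cooling load.
    """
    rolling = crr * mass_kg * G * math.cos(slope_rad)
    aero = 0.5 * RHO_AIR * cd * frontal_area_m2 * speed_ms ** 2
    grade = mass_kg * G * math.sin(slope_rad)  # negative when going downhill
    wheel_j = (rolling + aero + grade) * distance_m
    if wheel_j >= 0:
        battery_j = wheel_j / drivetrain_eff  # losses between battery and wheels
    else:
        battery_j = wheel_j * regen_eff       # only part is recovered by regen
    battery_j += aux_power_w * distance_m / speed_ms  # auxiliary load over travel time
    return battery_j / 3.6e6  # joules -> kWh

flat = segment_energy_kwh(100_000, 25.0, 0.0)  # 100 km at 90 km/h, no slope
print(round(flat, 1))  # 13.1
```

A route planner would sum this over road segments and add charging stops where the running total would exceed the usable battery capacity.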

Do you have a favourite use case?

“My favourite thing isn’t a single project, it’s the entire experience: projects with nothing in common besides data, using science to find practical solutions.”

“And convincing the people on the client side that you’re complementing their work, so they can do their jobs better.”

Tell me more about the open-source element.

“Without open source, we wouldn’t be here. What we do, what other businesses in the Machine Commons do, simply wouldn’t have been possible ten years ago.”

“We acknowledge the great work done by larger companies (Uber, etc.) and scientific communities. This is the future. Decisions made by algorithms need to be explainable: how was a prediction made? The way to do this is through transparency and documentation, which means open source.”

“Your only limitation today is your knowledge. You just have to find the right pieces and put them together like Lego. It’s all built on the work of others.”

“What differentiates the service you provide to a client, given that they can always change provider, is how you put it all together. We’re not afraid of the client changing provider; the code stays with the client, and the client then decides whether to make the final result open source.”

Open source in this way is more maintainable, because you’re not only standing on the shoulders of those who came before you but also giving and taking in a hive-minded community.

Any thoughts on machine learning in general?

“First, as machine learning professionals, we should be conscious of the hype. Everyone talks about deep learning, but there aren’t many companies in a position to do it.”

“We should focus not on the tools but on the problems. If a linear regression works for you, that’s great.”

Right, don’t complicate it. KISS.

“The focus should be not only on better data or models, but on a better management culture, improving our ability to translate complex problems into simple use cases that we can solve.”

“The main blocker to machine learning adoption isn’t technological, it’s cultural.”

“We need to keep evangelizing that it’s not magic, so that non-technical people can help you sell it within the company.”

Any predictions for the future of machine learning?

“I can think of three really interesting trends worth looking at that will shape the future. Making actual predictions is much harder, as the probability of being wrong is much higher than that of being right!”

Here are his trends:

“…

1) Differential privacy is essentially a mathematical framework that lets you guarantee that, when you disclose aggregated data, you don’t reveal information you shouldn’t disclose. It has many applications. The drawback is that when you’re trying to provide privacy, things get much more complicated: the computational load is higher, the methods are more complex and the predictions are worse.
2) Causal inference and causal machine learning. Today, machine learning tries to find correlations; business problems, on the other hand, tend not to be just predictive. They are causal in nature: tell me what I must change to change X.
3) Bayesian probabilistic methods. This is a set of techniques that allow you to include a lot of business knowledge, expressed in business language, and apply it mathematically. For example, say you want to predict the length of today’s interview. You know these last between 15 and 45 minutes, but in principle there’s no limit; it could go anywhere from 0 to 3 hours. So, if you already know the typical length without any data, these techniques allow you to add this anecdotal business knowledge to your predictions. It’s a silly example, but imagine you must A/B test two adverts to see which one generates more clicks. If you apply traditional ML, you only get a number: one outcome is 53% and the other 52.9%. You’re just getting one number, so you don’t know how certain you are; there’s no probability distribution. Bayesian methods attach a probability distribution to those numbers, meaning you can consider risk. Also, if you don’t have enough data, you can add this business knowledge (as in the interview example). Frameworks include Stan, PyMC, which I prefer, and Pyro, made by Uber.
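The classic way to get the guarantee described in trend 1 for a simple count is the Laplace mechanism: because adding or removing one person changes a count by at most 1, adding Laplace noise with scale 1/ε makes the released count ε-differentially private. A minimal standard-library sketch follows; the records and the question asked of them are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale): the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon, rng):
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    A count has sensitivity 1 (one person changes it by at most 1),
    so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)

# Hypothetical usage: how many people in a small dataset are 65 or older?
rng = random.Random(42)
people = [{"age": a} for a in range(18, 90)]
noisy = private_count(people, lambda p: p["age"] >= 65, epsilon=0.5, rng=rng)
print(f"noisy count: {noisy:.1f}")
```

Smaller ε means stronger privacy but noisier answers, which is the accuracy trade-off mentioned above; more complex queries need more careful sensitivity analysis.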

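For the A/B example in trend 3, the simplest Bayesian treatment is a Beta-Binomial model: each advert’s click rate gets a Beta prior, and after observing clicks the posterior is again a Beta distribution, so no framework is needed. The sketch below draws from the two posteriors to estimate the probability that advert B really has the higher click rate. The counts (530 and 529 clicks per 1,000 views, matching 53% and 52.9%) are hypothetical, and `prior_alpha`/`prior_beta` are where prior business knowledge could be encoded. Stan, PyMC and Pyro become useful for models without a closed-form posterior like this one.

```python
import random

def prob_b_beats_a(clicks_a, views_a, clicks_b, views_b,
                   prior_alpha=1.0, prior_beta=1.0, draws=20_000, seed=0):
    """Monte Carlo estimate of P(click rate of B > click rate of A).

    With a Beta(prior_alpha, prior_beta) prior and binomial click data, each
    advert's posterior is Beta(prior_alpha + clicks, prior_beta + non-clicks).
    The default Beta(1, 1) prior is uniform, i.e. no prior knowledge.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(prior_alpha + clicks_a, prior_beta + views_a - clicks_a)
        rate_b = rng.betavariate(prior_alpha + clicks_b, prior_beta + views_b - clicks_b)
        wins += rate_b > rate_a
    return wins / draws

p = prob_b_beats_a(530, 1000, 529, 1000)
print(f"P(B has the higher click rate) = {p:.2f}")
```

With rates this close, the estimate lands near 0.5: the data cannot tell the adverts apart, which is exactly the “how certain are you?” information a single point estimate hides.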
Come on. Make a prediction about the future.

“Whoa, this is hard.”

“Personally, I don’t believe in AGI (artificial general intelligence), let’s say ‘artificial’ as in machines doing human things. But I do believe AI will be much more widespread: a utility that will hopefully spare humans from doing boring, repetitive stuff.”

“A lot of really boring work is still happening in companies today. If we live in a world where people live well enough and salaries are high, there will be an incentive to automate this stuff. Currently, it’s much cheaper to just hire people for work that could be automated.”

I couldn’t agree more. The utopia I’m hoping for is less monotonous.

What else have you got for me, anything about the impact on wider society?

“A more general cultural understanding of what algorithms can and can’t do. They can’t predict what you’re thinking, the way they do on TV; that’s just not true. I hope people will be knowledgeable enough to understand where it applies and where it doesn’t.”

“A lot of this will be automated, but you’ll still need professionals to guide the selection of the right algorithms, etc. But it will be much better packaged and much easier to do. Businesses will have a culture where people know that, to estimate how many units to produce, they first need to predict demand, etc.”

“These things today are still considered very innovative, but in the future, this will be the everyday work of companies."

"Every company will have a data analytics team and it will just be a normal department, but I mean everywhere. Not just digital service companies.”

That some companies still don’t have a data team is amazing to me. I’ve heard this before, with a comment on ‘literacy’ from a previous interview: "People are almost considered illiterate if they can’t code.”

It’s so true that if you don’t learn at least the implications of data, you’ll be left behind.

“The businesses that don’t transform digitally (not just having data, but having it accessible and in good shape, with people knowledgeable enough to exploit it, not just for better decisions but for automating stuff) will have a really hard time scaling up in the future.”

This naturally means machine learning will be involved in just about every process.

Any last words – perhaps something you think is really important?

“I believe everyone should care about data privacy. It’s important that we raise awareness of the value of data, both the good and the bad; data is such an important asset that it should be controlled by citizens themselves.”

“This doesn’t mean renouncing it, but if we want a sustainable data economy, we need to find more creative ways for this data to be shared with different actors.”

“Not in a way where only Google has huge amounts of our data, but in ways where people can donate data for collaboration. Otherwise, we won’t solve big community challenges, or ecological challenges like climate change.”

“Citizens should appeal to their authorities, because laws should exist that make it easy to know where data is, how it’s managed and how to direct its use.”

“Doing this will benefit everyone. Not just authorities, big or small companies. But literally everyone.”
