Kaggle Competition Master? That has a nice ring to it, I say.
It’s never enough for engineers like Andrey Lukyanenko; they tend to be people who always strive for perfection: “Grandmaster would be better!”
He tells me how it’s calculated. It’s one big competition and, when the period ends, a certain number of people are awarded medals in each of the gold, silver and bronze tiers.
“It normally depends on the number of people who take part in the competition.”
He received two gold medals and one silver to be awarded Competition Master.
I find it interesting that the award tiers continue the Olympic tradition of medals - the award for sporting excellence - because they symbolically transfer the same degree of kudos; engineers like Andrey put in countless, mostly thankless hours training themselves (and their models) to solve problems unpaid, all while working normal jobs (he’s a data scientist).
I ask what it would take to become a Grandmaster: “5 gold medals, with at least one won in solo”.
What’s prevented you from this glorious title?
“I haven’t pursued it yet, to be honest, because I thought I wouldn’t be prepared - so I didn’t spend a lot of time taking part.”
His normal 40-hour work week is sandwiched between coding before breakfast and rushing home to continue the unpaid work in the evenings.
“It’s a really interesting and unique competition and I thought I wanted to take part in it!”
Over 2-3 months, the length of the competition, he *only* accrued about 200 hours.
“I spend time on Kaggle in the morning, after the job, on the weekends. A lot of free time goes towards these competitions.”
Do you win anything?
“Money goes to the top several places, I think the host of the competition decides. Perhaps somewhere between the top three and top ten places are paid.”
“Sometimes the prize pool is $5,000, sometimes $100,000.”
Assuming the prize pool is $100K and, say, the top ten split it evenly, that’s $10,000 each: an hourly wage of $50 for 200 hours of work. Bear in mind that these victors are probably putting in more than Andrey’s “only 200 hours”, and huge numbers of people enter these competitions only to leave unpaid.
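For the curious, that back-of-the-envelope sum can be written out as a short Python sketch. The even ten-way split and the 200-hour figure are my assumptions rather than Kaggle's actual payout rules, and the function name is made up for illustration.

```python
def hourly_wage(prize_pool, winners, hours_worked):
    """Return the hourly rate if the pool is split evenly among the winners."""
    share = prize_pool / winners  # each winner's cut under an even split (an assumption)
    return share / hours_worked

# $100K pool, top ten paid equally, 200 hours of work each
print(hourly_wage(100_000, 10, 200))  # 50.0
```

Run the same sum on a $5,000 pool and the rate drops to $2.50 an hour.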
What is your main motivation?
“I think there are several reasons. One of them is that there are currently a lot of people who want to be a data scientist, or who already are, so for success you need to be better than a lot of people.”
True.
“Kaggle is one way you can prove these skills. To demonstrate that you can work consistently on one problem and dedicate the time to solving it.”
I come from the marketing industry where there are rewards for tying your shoelaces - shoelaces that you already tied as a result of simply doing your job - so I find this incremental dedication to competitive excellence intriguing, impressive and slightly intimidating.
“Yes, it’s nice to be recognised: there are very few ways to compare your skills directly with other people.”
I’m still not quite understanding the motivation for so many hours, until he raises a really interesting point.
“The competition allows you to work on very different problems. When you work in a company, they have their business needs and problems, which is great, but Kaggle competitions allow you to work on completely different problems, different types of data and so on, which is very valuable experience.”
“In the real world, building machine learning models is not the whole work, you need to define the problem, and so on. But: the core element of the ML engineer is the maths, the quality of the models. Unfortunately, the opportunity to actually refine these skills at work is limited. In most cases, you need to do a lot of different things, ML only takes 10-20% of time. But in the competitions, it’s entirely about the machine learning.”
People who find what they do interesting are always the most interesting people. I ask him more about the specifics of his competition entry.
“This was a chemical competition. There are molecules with atoms that have various attributes, so we were tasked to predict the magnetic interaction between pairs of atoms (the scalar coupling constant).”
“It takes a lot of computation - quantum computers are needed. So the idea was: maybe we can use ML to make ‘good enough’ predictions, in much less time, so that quantum researchers could potentially use our work to run more experiments.”
I remember an old conversation with a quantum researcher who told me he used to have to pay for quantum computer time with tokens, of which he was allocated very few. One of the main barriers to the advancement of quantum computing is something referred to as ‘fuzzy’ logic, where results aren’t easy to interpret or predict, which leaves the computation mostly unusable.
“In fact, some approaches are already in use - something called a graph convolutional neural network, and we tried to make something better. They take the whole molecule as an input and can predict the value of the attribute that we need. It could predict anything, but in this case we predicted the scalar coupling constant.”
He briefly explains that the scalar coupling constant shows the interaction between atoms.
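For readers who want a feel for the idea, here is a minimal sketch in plain Python of what "take the whole molecule as an input" might look like. It is not Andrey's model: the toy molecule, the feature values and the weights are all invented for illustration, and a real graph convolutional network would use learned weights, non-linearities and many stacked layers.

```python
# Toy molecule: a water-like graph, atom 0 bonded to atoms 1 and 2.
# Features are made-up numbers, e.g. [atomic number, number of bonds].
features = {0: [8.0, 2.0], 1: [1.0, 1.0], 2: [1.0, 1.0]}
bonds = {0: [1, 2], 1: [0], 2: [0]}

def graph_conv(features, bonds):
    """One simplified graph-convolution step: each atom averages itself with its neighbours."""
    updated = {}
    for atom, own in features.items():
        group = [own] + [features[n] for n in bonds[atom]]
        updated[atom] = [sum(col) / len(group) for col in zip(*group)]
    return updated

def predict_pair(features, a, b, weights):
    """Readout for an atom pair: weighted sum over both atoms' features."""
    pair = features[a] + features[b]  # list concatenation, not addition
    return sum(w * x for w, x in zip(weights, pair))

hidden = graph_conv(features, bonds)
weights = [0.1, -0.2, 0.1, -0.2]  # untrained placeholders; a real model learns these
print(predict_pair(hidden, 1, 2, weights))
```

After one step, each hydrogen's features already carry information about the oxygen it is bonded to, which is the point: the prediction for a pair of atoms depends on the molecule around them.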
“The implication is that researchers can gain better insight into the properties of molecules, resulting in better predictions of the properties of new materials. They could design molecules for a specific task.”
By quickly predicting the value of a property, machines could speedily design new medicines, make stronger materials, etc. - fit-for-purpose molecule design. That’s pretty cool.
I was surprised to hear that the work from Kaggle competitions is actually used in the real world.
“At least partly. In competitions, people aim to get the best possible score. They spend hundreds of hours training models. In the real world, usually no one would use the whole solution, but people take parts of these solutions and integrate them with their own work for a better overall result.”
“There was a plan to have some kind of joint publication between the top teams, addressing new architectures that were developed - because each team practically designed their own custom neural architecture. Each one deserves special attention and this work would present the technical part of the solution and demonstrate how this could be used in the real world.”
Guys and gals, we’ve gamified technological progress, making it abundant and mostly free in the process.
I ask him what he’s paying attention to in the ML field.
“Last year, for example, there were a lot of big companies creating big models in NLP (Google, etc.), releasing better and better models on benchmark datasets. They take a lot of computational power.”
“I’m not interested in those specific things, I’m interested in the maths, in the efficiency and what they could achieve. Some smaller models may be able to achieve the same performance.”
People forget, sometimes true progress is just doing the same thing better.
...Future of ML?
“Well, there will be no AI - a thinking machine, I think that’s completely impossible! ML will be better standardised, and automatic ML will be better developed.
“ML will be used in almost all companies because it will be much more approachable as a field.
“Think about when web dev started: it took a lot of expertise to do things. Now, even small businesses can do web dev at little cost. It will be the same with ML.”
“A problem could be that models will generate content - text and videos - much better. It will get to a point where it’s very difficult to tell whether it’s real or generated and this could become a problem.”
“I hope this is solved though I have no idea how this would be solved - the cat is out of the box, you can’t stop technology, it’s not possible! Mwuhahahaha.” (Ok, the mwuhahahaha was mine.)
Maybe regulations will save the day, but we suspect machine learning will be the only thing that could combat machine learning. Perhaps future media providers will have automatic ‘real’ detectors, such as Facebook warnings on videos.
What are you most excited about?
“Difficult to say... I’m really excited about engineering development, about tech development in the world. We already have hoverboards, and new machines will come; maybe new tech will allow us to do new things that we thought were impossible. Magic is just sufficiently developed technology!”
What should aspiring data scientists know?
“People should remember that machine learning is not magic. It can’t do everything. It depends on algorithms and the data. Without the data, it can’t do anything. With the data, well: it’s still just a tool that solves specific problems.”