We sat down with our CTO, Michael Chrzanowski aka Chrzy (a nickname that proved to be almost unpronounceable to our English-speaking colleagues), and asked him how our data department is dealing with the expansion into Italy and Spain, how data analytics relates to language, what not knowing local terms and acronyms leads to, and whether he's afraid of data analytics in the Greek alphabet (spoiler: he frowned a lot). So if you're interested in the background to our expansion into Europe or want to learn more about our CTO, read on.
The first step is to move from on-premises infrastructure to the cloud, mainly because of horizontal scaling. One of the main challenges we’re facing is the volume of information we have to process. In terms of data volume, the Czech Republic and Slovakia amount to ten or fifteen percent of Spain and Italy. We can process the Czech Republic data in five days, but suddenly we have something ten times bigger in volume.
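(Editor's note: The arithmetic behind the move to horizontal scaling can be sketched roughly as follows. The five days and the factor of ten come from the interview; the function names and the assumption that the work splits evenly across workers with no overhead are ours, so treat the numbers as a best case rather than a measurement.)

```python
import math


def estimated_days(base_days: float, volume_factor: float, workers: int = 1) -> float:
    """Estimate processing time, assuming the work splits perfectly across workers."""
    return base_days * volume_factor / workers


def workers_needed(base_days: float, volume_factor: float, target_days: float) -> int:
    """Smallest number of workers that meets the target under the same assumption."""
    return math.ceil(base_days * volume_factor / target_days)


# Five days for the Czech data, ten times the volume, one machine.
single_machine = estimated_days(5, 10)
# The same job spread over ten workers (idealized, no coordination overhead).
ten_workers = estimated_days(5, 10, workers=10)
# Workers required to get back to a five-day turnaround.
required = workers_needed(5, 10, target_days=5)
```

On a single machine the estimate grows to fifty days, which is why adding machines rather than waiting longer is the attractive option.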
The second thing is the transition from a manual to an automated process. Up until now, we've been running everything manually, using simple containerization to start the data collection and then remote procedure calls. In short, there were time delays and the process required constant attention from our team members; otherwise it stopped and delayed the client output.
For Market Meter, it was like this even at the start of this year. Now we've taken a huge step forward towards automating and distributing the calculation. This means that if the volume of data grows tenfold, we don't wait half a year for the result.
The third thing is that we are looking for information in text and facing linguistic diversity. And it's not only language, but also local customs. For example, there are three or four ways to write down an address. But if I don't know the language and the writing conventions of the country, I won’t be able to decipher it. Another problem is the way words are abbreviated, which we have to learn.
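(Editor's note: To make the abbreviation problem concrete, here is a minimal sketch of expanding street-type abbreviations before comparing addresses. The three Czech abbreviations are common ones chosen for illustration; the table, the function name and the sample address are hypothetical and not taken from our pipeline. A real table would have to be built per country by someone who knows the local conventions.)

```python
# Illustrative Czech street-type abbreviations (assumed for this sketch only).
ABBREVIATIONS = {
    "ul.": "ulice",
    "nám.": "náměstí",
    "tř.": "třída",
}


def expand_abbreviations(address: str, table: dict = ABBREVIATIONS) -> str:
    """Lowercase the address and replace known abbreviations token by token."""
    tokens = address.lower().split()
    return " ".join(table.get(token, token) for token in tokens)


# Two spellings of the same (made-up) address end up identical.
short_form = expand_abbreviations("Nám. Míru 5")
long_form = expand_abbreviations("Náměstí Míru 5")
```

The sketch only handles abbreviations that are separated by spaces. Spellings such as "Nám.Míru" would need extra tokenization rules, which is exactly the kind of local knowledge described above.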
We're looking for some good existing solutions to free our hands so that we can think further ahead and not just address acute needs. There are approaches based on machine learning or AI; we just have to find the right one. The idea is to reduce manual labor and errors and to make it simpler to apply our methods in other countries. Because what I call a bar overlaps with the definition of a fast-food restaurant, a disco or a pub. Plus it might look totally different in the Czech Republic, Poland or Spain. Dealing with this manually takes a lot of working hours from our team, which we’d like to avoid.
No, there's also culture and customs mixed in. But knowing the language is key. Because without it I'm unable to read the information in front of me and I'm not even able to guess what it means. Not even in Polish, which you'd think is close to Czech. For example, I recently learned that piwnica is Polish for cellar. (Editor’s note: The word piwnica sounds almost like a Czech word for “pub”, hence the surprise on Chrzy’s side)
For us, a lot. You can look for information in a text field using word order, word frequency or habits and language rules. Structurally settled languages work better in this way. For example, if I change the order of a few English words, the whole sentence might still make some sense. In another language, it might give me a nonsensical word mess.
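(Editor's note: As a rough illustration of what word frequency and word order mean in practice, the sketch below counts single words and adjacent word pairs in a handful of text fields. The sample outlet names are invented, and the functions are a simplified stand-in, not the method our data team actually uses.)

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Split text into lowercase word tokens (Unicode letters and digits)."""
    return re.findall(r"\w+", text.lower())


def word_frequencies(texts: list) -> Counter:
    """Count how often each word appears across all text fields."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def bigram_frequencies(texts: list) -> Counter:
    """Count adjacent word pairs, a simple way to capture word order."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Invented sample names, for illustration only.
names = ["Bar Centrale", "Bar Sport", "Pizzeria Centrale"]
words = word_frequencies(names)
pairs = bigram_frequencies(names)
```

Counting pairs rather than single words is what makes the result sensitive to word order, so the same counts can behave very differently from one language to another.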
It has to do with the validity and accuracy of the data, especially for longer text strings. For example, have you ever tried to understand reviews in Spanish without knowing the language? Not even Translator can help you there.
This leads to two problems: everything takes a long time and you don't know if you got it right. And our team places great emphasis on doing things accurately and turning in the best work possible, which in our case means the best data possible. But then you're faced with poorly maintained data in primary sources, linguistic or cultural problems, and you're looking for a way to make a viable product out of it.
For example, matching outlets by address is more problematic than we originally thought. You can have a shopping mall with several restaurants. And they'll all have the same address. So you can't use address as a matching constraint in a straightforward way.
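(Editor's note: The shopping-mall case can be sketched like this. Two records count as the same outlet only if the address matches and the names are similar enough. The record fields, the 0.8 threshold and the sample outlets are assumptions made for this example; the similarity measure is Python's standard `difflib.SequenceMatcher`, picked for simplicity rather than because it is what we run in production.)

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio between 0 and 1 for two normalized outlet names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_same_outlet(rec_a: dict, rec_b: dict, threshold: float = 0.8) -> bool:
    """Require both an identical address and a sufficiently similar name."""
    same_address = normalize(rec_a["address"]) == normalize(rec_b["address"])
    similar_name = name_similarity(rec_a["name"], rec_b["name"]) >= threshold
    return same_address and similar_name


# Invented records: two restaurants in one mall share an address.
pizzeria = {"name": "Pizzeria Roma", "address": "Via Milano 1"}
pizzeria_again = {"name": "pizzeria  roma", "address": "via milano 1"}
sushi = {"name": "Sushi Bar Tokio", "address": "Via Milano 1"}

duplicate = is_same_outlet(pizzeria, pizzeria_again)
different = is_same_outlet(pizzeria, sushi)
```

The address still narrows the search, but on its own it would merge every restaurant in the mall into one outlet, so the name check has to carry the decision.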
Plus, a lot of that information is human-made and nobody verifies it. Sometimes an outlet closes and a new one opens in its place, but with the same or similar name under different owners. But nobody updates the information on Google and you have no way of knowing whether it’s right or wrong. There's a lot of uncertainty at play there. And this is why a lot of companies fail at this. It's not a simple thing. But we can do it.
We’ve talked about Greece, for example. It is a lucrative and interesting place to explore from the on-trade perspective. But their alphabet might prove challenging to conquer.
That would be a shame. We just need the support of someone who knows the language like in Spain and Italy, where Elena García Vargas and Elisa Arietti from the Business Development department help us progress our work further. It's all about that initial onslaught of unforeseen obstacles, then it goes into some sort of stable phase and routine.
We use a lot of heuristics. We translate information into numbers, and we try to settle the numbers, that is, verify them. And either it makes sense or it doesn't. When it comes to approximation, it's increasingly leading to machine learning. But that needs a huge amount of input and test data for the system to learn from. You need to put together a good training sample on which it can learn. It's a whole field called data science. We plan to pursue these possibilities to fine-tune our products to perfection.
If I were to ask my long-time colleagues from previous jobs how they perceive me on a work level, they would say that I can be insufferably arrogant and aggressive at times. But here it doesn't happen to me at all! We always work things out calmly. Plus, we're a small team, we're always discovering and exploring new frontiers, sitting at the same table and having instant feedback to move projects forward. I'd put it this way: We don't have to write stuff on the walls to make things happen. They just happen.
At the same time, I like the business-aggressive nature, like the expansion into new markets and the overall openness to new ideas. Thanks to this, I can also use my experience from the OLTP (online transaction processing) sphere, where I worked before and which is quite different in its focus on speed and immediate reactions rather than large volumes of data.
I like viable projects and I'm interested in how the world works. That's why SharpGrid is a perfect place for me. I've found a lot of beautiful things here that I want to solve. I couldn't have opened a business on my own, as I've sat behind a computer for 30 years, so it probably wouldn't work (laughs). But making things work, that's nice. That's what I enjoy.