The data science revolution is sweeping the investment industry. Some firms are looking to create an in-house solution combining artificial intelligence, machine learning and data science. Picking the right data set, and separating the wheat from the chaff, is critical.
Of all the data in the world, 90 per cent was created in the last two years. Every day, 2.5 quintillion bytes are added worldwide. Fortunately, machines process data up to 2,000 times faster than humans. But that does not alter the fact that the candy store is simply too big, acknowledges Herman van der Sluis of pension investor PGGM. The problem is not the data processing by the computer, but the question the investor has to answer: which data does or does not add anything?
The largest, most sophisticated parties in the investment industry, such as PGGM and APG in the Netherlands, are working to bring the future in-house - a future of artificial intelligence, machine learning, alternative data and data science. To ensure that these techniques and datasets add something to investment processes, many specialists are being brought in, such as mathematicians, econometricians, software specialists and data scientists. They have to help determine which traditional and alternative data add value to the investments of PGGM, the in-house investor for Pensioenfonds Zorg en Welzijn.
Herman van der Sluis, who is lead portfolio manager of Investment Analytics, said in an interview with Investment Officer that the data science revolution currently sweeping through the investment industry is the result of three developments coming together. These are the emergence of a huge number of datasets available to investors, the development of advanced analytical techniques, such as artificial intelligence, machine learning and natural language processing, which can be used to process this data, and the availability of particularly powerful computers to actually perform these analyses.
“Given the large numbers of datasets available and the often high prices per dataset, the key question is how to ‘separate the wheat from the chaff’,” Van der Sluis said. “My advice is: start with yourself. As an investment team, what do you really want to know? What actually adds something to your investment decision and to your investment goals? I think that is the right starting point.”
Deploying data scouts
To find your way faster in the world of alternative data, you can then use data scouts. “They know the market and what the good providers are. Which datasets are available, for example on a specific ESG topic, and what is the available history, coverage and quality of each dataset. This can make the search for suitable datasets smoother.”
Alternative data can not only provide insights on ESG, but the right information can also contribute to returns. Alternative data that PGGM is currently looking at is, for example, textual information you can extract from annual reports and sentiment analysis. This often involves specific written or spoken text, which cannot be found in traditional data. “For example, we look at whether a CEO paints a positive picture following the annual results.”
“Then the question comes to us: how do we take it from raw data to relevant information, and how do we bring this information to the investment teams? We are now working on that at full speed. Building up and expanding the underlying data platform and setting up the analytics platform are both activities we will continue to invest in over the coming years.”
“You also have to know how to set yourself up for that flow of information organisationally and process-wise,” Van der Sluis argues. “Processes need to be adapted. How do you handle the dataset? How do you decentralise it to the level of investment teams? After all, you don’t want all data flows to have to go past a central point, because that makes the queues too long. We are therefore working out a ‘data mesh’ setup.”
What does it contribute to alpha?
The mother of all questions when selecting data is whether it contributes to alpha. “That is the critical question to ask with every dataset. We think ESG data is very important at the moment, because we want to make an impact with our investments more than anything else. So we don’t just look at the risk-return ratio, but we also ask the question: what is the impact we are having with these investments, and therefore with these datasets,” Van der Sluis says. “What are the companies doing that we have exposure to? What is their communication? What is their footprint? These are important aspects for us to get a picture of.”
Whether additional, alternative information adds enough is an essential question to answer, because the value it adds is not necessarily in proportion to the gigantic supply, a market that Van der Sluis believes is as large as $5-10 billion. It has become a market in which, as an investor, you need to know, above all, what exactly you are looking for. But, Van der Sluis adds, if you make the right choice then it can also be alpha enhancing.
Hedge funds, for example, paid a lot for credit card data because it generated as much as 4 per cent alpha on an annual basis. But the annual decay of that alpha was also quite large, because - due to its success - it became a “crowded trade”. Alpha is therefore - historically - less and less up for grabs, Van der Sluis acknowledges. But that is where machine learning and artificial intelligence can play a positive role, as they are credited with predictive value. Models should determine whether that assumption turns out to be correct.
Becoming the fully data-driven organisations that PGGM and APG aim to be in the coming years places high demands on the organisations themselves, but also on the knowledge and skills they have in-house. It leads to the creation of agile teams and to training employees in the necessary “data science” techniques - the new world demands that different expertise and experience be brought together.
What about the carbon footprint?
In this respect, a new, not yet fully thought-out challenge also looms for pension investors and the pension funds they have as clients. The unprecedented increase in computing power in recent years, which institutional parties are now capitalising on in the hope of generating more insight and returns for their participants, comes at a price: more data centres and thus more energy consumption, which has implications for the carbon footprint.
Herman van der Sluis and the spokesperson present at the talk acknowledge that this could increase the carbon footprint, forcing introspection on the part of investors - who also gauge the companies they invest in on that point. “Indeed, fair point, we cannot measure others without doing the same for ourselves.”