Numerai is a hedge fund with big ideas and even bigger ambitions: using crowd-sourced machine intelligence to transform the way money is managed. Crowd-sourced investment strategies are many and varied, but Numerai has a novel way of applying technology.
The fund supplies its network of data scientists with encrypted data that allow them to test their machine learning models, thereby removing any bias attached to the application of algorithms.
These models are entered into a monthly tournament and the best ones receive a pay-out. This was previously done using Bitcoin (because it was efficient and more anonymous than PayPal), but more recently Numerai launched its own token, Numeraire (NMR), on Ethereum, the public blockchain which has spawned a multitude of trustless, decentralized applications.
The aim of the token was to create more value for Numerai’s growing network of scientists, and further align them with the collaborative goals of the project. NMR tokens were not sold as in a typical initial coin offering; instead, 1.2 million of the tokens (a cap of 21m has been stated) were distributed via smart contracts on Ethereum, only to participating data scientists.
On the eve of the token distribution, Numerai founder and CEO Richard Craib took a sober view of the token issuance—certainly too sober for the hysterical world of Ethereum-backed tokens. He said at the time he hoped NMR would be “priced rationally”, and also highlighted the need to ensure the smart contract code on Ethereum was robust, adding that someone who helped discover the DAO exploit had been auditing the NMR contract.
“Even if the contract is correct, you could also mess up the incentives and then suddenly Numeraire becomes a speculative instrument that actually hurts the system,” said Craib. “So you have to be very careful with your system design. I think it’s quite good for the users to have liquidity and for it to be priced and tradable, but I hope it’s priced rationally.”
A couple of days after the NMR tokens were distributed, their price rocketed to $168 and the market cap crossed $200m. Around 60,000 NMR are paid out in each tournament, which meant the monthly payout reached almost $10m, or the equivalent of 10 Netflix Prizes, making Numerai by far the highest-paying data science competition the world has ever seen.
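The figures above can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: it assumes the $168 peak price applied to all 1.2 million distributed tokens and to the full 60,000 NMR tournament payout, and the variable names are illustrative. On those assumptions the market cap comes to about $201.6m and the monthly payout to about $10.08m, in line with the rounded figures quoted.

```python
# Figures quoted in the article; names are illustrative, not from Numerai.
PEAK_PRICE_USD = 168
DISTRIBUTED_TOKENS = 1_200_000
TOKENS_PER_TOURNAMENT = 60_000
NETFLIX_PRIZE_USD = 1_000_000  # the Netflix Prize grand prize was $1m


def usd_value(tokens, price_usd):
    """Dollar value of a token amount at a given price."""
    return tokens * price_usd


# Assumption: every token is valued at the peak price.
market_cap = usd_value(DISTRIBUTED_TOKENS, PEAK_PRICE_USD)
monthly_payout = usd_value(TOKENS_PER_TOURNAMENT, PEAK_PRICE_USD)
netflix_prizes = monthly_payout / NETFLIX_PRIZE_USD

print(f"Market cap at peak: ${market_cap:,}")
print(f"Monthly payout at peak: ${monthly_payout:,}")
print(f"Netflix Prize equivalents: {netflix_prizes:.1f}")
```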
All this value pumped into the tokens at once presented a number of immediate risks to the ecosystem, offering a huge bounty to hackers and threatening its carefully aligned goals. A worry aired on discussion forums was that speculation around the coins could ultimately detract from their primary function: to create good models and build network effects.
Numerai rightly intervened and reduced its NMR payouts by 90% to 1,510 per week, while increasing staking payouts threefold to $3,000 per week. The price of NMR has since cooled and stabilized at around $40.
Craib says he was not expecting Numeraire to be traded on secondary markets to the extent that it was. But since the tokens belong to the data scientists and are on the blockchain, they can do anything with them, he said. “Speculation doesn’t hurt the use of Numeraire, which is to stake. Our data scientists earning more money is also clearly a good thing. Some have quit their jobs or are working with speculators to stake their models.
“More prizes mean more incentives to create duplicate accounts etc. We have solved a lot of this in the last few days.”
With the benefit of hindsight, would he have done anything differently regarding the issuance and distribution of Numeraire?
“Not at all,” said Craib. “It was extraordinarily successful beyond anything we could have hoped. I am very happy with our approach of giving it away to our core community rather than selling it in an ICO.”
Aside from the NMR token, Craib elaborated on some of the other unique points that distinguish Numerai, such as the problems it solves with encryption.
“All the algorithms are trained on encrypted data which makes it very difficult to cheat or create models that look good but aren’t actually good. A lot of the quant crowdsourcing has this problem of models that don’t quite work very well because they just overfit. Because they can see the data, they can cheat; they can put in their biases.
“But on Numerai it’s so abstract that you really have to have a good model to do well. And that’s why we are not crowdsourcing quants at all; we are crowdsourcing machine intelligence.”
A common approach to verify accuracy in machine learning is to break the dataset into train and test sets. A trained model can be tested for accuracy on the test set, which it has never seen. However, to maintain statistical validity, this test set should only be used once. When a data scientist accesses the test set multiple times and uses that score as feedback for model selection, there’s a risk of training a model that overfits the test set. This hurts the model’s ability to perform well on new data.
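The risk of reusing a test set can be shown with a small simulation. The sketch below is an illustration only, not anything Numerai runs: the labels are coin flips, so no model can genuinely beat 50%, and each hypothetical "model" is just an arbitrary rule with a random parameter. Picking the best of many such rules by its score on a small, reused test set still produces a flattering number, which disappears on fresh data.

```python
import random


def make_data(n, rng):
    # Features are uniform noise and labels are coin flips,
    # so the true accuracy of any model is 50%.
    return [(rng.random(), rng.randint(0, 1)) for _ in range(n)]


def make_model(rng):
    # A hypothetical "model": an arbitrary rule with one random parameter.
    a = rng.randint(10, 1000)
    return lambda x: int(x * a) % 2


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


def run_experiment(seed=0, n_models=200, n_test=100, n_fresh=5000):
    rng = random.Random(seed)
    test_set = make_data(n_test, rng)    # small test set, reused for selection
    fresh_set = make_data(n_fresh, rng)  # data never used for selection
    models = [make_model(rng) for _ in range(n_models)]
    # Model selection using the test score as feedback.
    best = max(models, key=lambda m: accuracy(m, test_set))
    return accuracy(best, test_set), accuracy(best, fresh_set)


reused_score, fresh_score = run_experiment()
print(f"Accuracy on the reused test set: {reused_score:.2f}")
print(f"Accuracy on fresh data:          {fresh_score:.2f}")
```

The gap between the two scores is pure selection bias: the chosen rule has learned nothing, it simply happened to fit the test set best.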
This problem is removed when you don’t know what you’re modeling. Ordinarily, encrypted data becomes useless to a data scientist, but new developments such as neural cryptography, a field dedicated to applying artificial neural network algorithms to encryption, allow Numerai to share datasets securely while preserving their structure. Because the raw data remains obfuscated, it’s impossible to take that data and use it yourself.
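The article does not describe Numerai's actual scheme, so the following is only a toy illustration of the general idea that data can be obscured while keeping the structure a model depends on. It swaps in a much simpler technique than neural cryptography: a secret rotation of two-dimensional points. The rotation hides the original coordinates but preserves distances, so a nearest-neighbour classifier makes the same predictions on the hidden data as on the raw data. All names here are hypothetical.

```python
import math
import random


def rotate(point, theta):
    # Rotate a 2-D point about the origin; rotations preserve distances.
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (x * c - y * s, x * s + y * c)


def nearest_label(query, points, labels):
    # 1-nearest-neighbour prediction using Euclidean distance.
    idx = min(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return labels[idx]


rng = random.Random(42)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(50)]
labels = [1 if x + y > 0 else 0 for x, y in points]
queries = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(10)]

# The data owner keeps this angle secret and shares only rotated data.
secret_theta = rng.uniform(0, 2 * math.pi)
hidden_points = [rotate(p, secret_theta) for p in points]
hidden_queries = [rotate(q, secret_theta) for q in queries]

plain_preds = [nearest_label(q, points, labels) for q in queries]
hidden_preds = [nearest_label(q, hidden_points, labels) for q in hidden_queries]

print("Predictions match:", plain_preds == hidden_preds)
```

A single rotation would be far too weak for real use; the point is only that a transformation can hide what the numbers mean while leaving the relationships between them intact.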
Asked about the characteristics of Numerai’s 20,000 or so participants, Craib said: “You really have to know about machine learning. If you know something about finance, it doesn’t help you at all, because it’s all abstract…but we do have certain people from hedge funds who joined, and we can see by their email addresses.”
“Every now and again you see something like @stanford.edu. It’s very global: huge in Russia, also in India, and lots of young people. But very few girls actually; only about 3% I think,” he added.