- Web scraping has become integral to many hedge funds’ data-collection processes, with one out of every 20 webpage visits in 2018 coming from a scraping bot run by a fund or sell-side research institution, according to a recent research report.
- Hedge funds are expected to spend nearly $2 billion in 2020 on the collection and storage of data scraped from the web, the report notes, but a lawsuit involving LinkedIn could change what funds are legally allowed to collect.
The internet is crawling with bots, and many of them belong to hedge funds. But a lawsuit involving LinkedIn could change what funds are legally allowed to collect.
One out of every 20 website visits comes from a fund or sell-side research firm scraping the page for information, according to a report from Opimas. Hedge funds have built the tremendous amount of data they scrape into their systems, and will spend nearly $2 billion on web-scraping alone in 2020, a sliver of the overall money pouring into the exploding alternative-data scene.
But the pedal-to-the-metal approach of scaling up and building out web-scraping units by hedge funds may be for naught as the courts try to create a framework for what is allowed on the web.
The case, which is currently on appeal in the 9th Circuit, is being closely watched by lawyers and funds to determine the future of an increasingly important part of hedge funds’ investment process.
With the judge ruling against LinkedIn, funds are still ramping up their web-scraping for now, according to Peter Greene, vice chair of the investment management group at law firm Lowenstein Sandler.
“I don’t think it is scaring folks from using it,” Greene said of LinkedIn’s call to stop scraping. The most significant change has come from hedge funds’ compliance departments, which are tracking litigation related to web-scraping and are generally more knowledgeable and sophisticated about digital data collection, he said.
A backlash has mounted over perceived invasions of privacy
But as a backlash has mounted over perceived invasions of privacy by tech companies like Facebook and Google, hedge funds need to be prepared to defend and possibly alter their data strategies if there is a sudden pullback on what can legally be scraped, lawyers say.
“The key is determining what does it mean to be public on the internet,” Greene said.
Just a couple of years ago, he said, it was only the biggest firms that understood the ins and outs of the litigation around the space, but now it’s “all sizes of managers.”
And beyond the legal risk, the headline risk that comes with collecting and using large swaths of online data needs to be top of mind as well, said Stacey Brandenburg, a lawyer with ZwillGen, in a presentation at data company Quandl’s annual conference last month.
Web-scraping in particular is an area where there are motivated third-parties — the websites — that are unhappy with the practice.
“When you’re developing a web-scraping program, you want to think through and monitor carefully what your web-scrapes look like and how they are being responded to by a site so you could be on notice to the point where it is unequivocally clear that a site has revoked your authorization and doesn’t want you to be there, because the next step is to send a cease-and-desist and potentially to sue you,” Brandenburg said.
Fund managers are getting overwhelmed
A pullback on the amount of data that can be scraped could potentially be a good thing for managers that are being overwhelmed by the amount of information coming in, said Fidelity’s head of artificial intelligence and advanced data John Avery at an industry conference earlier this year.
“If anything, I think folks are scraping more than they need,” said Evan Reich, a data strategist at $20 billion BlueMountain Capital Management. He warned against overreliance on web-scraping data that isn’t properly filtered.
If a fund accidentally pulls in data that it can’t legally use, such as personally identifiable credit card information, and builds models on it, “then it may not be able to be purged,” Reich said.
“Ideally you never want to have, 10 years down the road, someone says purge my data, and you’d rather it not be something that’s impossible, or extremely difficult, to purge,” Reich said.
“No dataset is so good that it is worth betting the firm on.”