Is unauthorized online copying theft and does it hurt creators?

October 15th, 2013 by Barry Sookman

Slavish copying of a work protected by copyright without consent is sometimes called theft. There is a long history of this association in the Commonwealth and the United States. In fact, in a leading case, the Privy Council stated that the moral basis of copyright rests on the 8th Commandment, “Thou shalt not steal”. Despite the long association between unlawful appropriation of copyright material and the concepts of “theft”, “larceny” and “steal”, debate continues over whether the term is accurate or appropriate in this context. There is also continuing debate over whether online piracy hurts creators and the creative industries. The recent US case Tamburo v. Dworkin, 04 C 3317 (N.D. Ill. Sept. 26, 2013), and several recent reports, including a brief by the London School of Economics, shed light on both of these debates.

Tamburo v. Dworkin was an unusual case involving unauthorized copying of an online dog breeding database. The compiler of the database published statements accusing the copier of “theft”. The copier sued the compiler for defamation, but lost that claim when the court ruled that calling the unauthorized copying “theft” was true and therefore not defamatory. The court also held that a person who posts a work on the web does not implicitly consent to copying and reuse of the posted work, even if that person does not use the robots.txt protocol to prevent it.

The gist of the case involved the defendant Kristen Henry, a dog breeder and computer programmer. She spent almost five years creating an extensive database of dog pedigrees, which she made freely available for use by fellow breeders through her website. The site contained pedigree data for the Schipperke, a small, mischievous Belgian dog breed, and was developed for non-commercial use by the dog breeding community. Henry collected the pedigree data from many different sources, and her database eventually covered more than 23,900 dogs.

The plaintiffs created software products for use by animal breeders and pet groomers. They also operated The Breeder’s Standard™NET (“TBS”), a web-based dog pedigree software program that functioned as a database designed for research and genetic calculations. In addition, they developed a scraper program (a data-mining robot, or crawler) that used a web browser to scan the internet for information about dog breeds and to copy data from dog pedigree websites. The scraper accessed and copied the information on Henry’s website, including the databases she had compiled. The scraped data was incorporated into TBS, and the plaintiffs charged for access to it.

Henry learned of the plaintiffs’ conduct. She deposed that she felt frustrated and violated by the plaintiffs’ use of her database. Henry told the plaintiff Tamburo that he had no right to collect data from her website in an automated manner and asked him to stop using that data. When the copying and use did not stop, Henry, in frustration, posted statements on message boards, in emails, and on her website accusing TBS of stealing her data, including the following:

• My Schipperke Database has been stolen, and The Breeders Standard is the thief.

• Breeders Standard has stolen my Schipperke database . . . .

• [T]he 23900 dogs that I keyed in all by myself to chronicle the history of the Schipperke breed have been stolen. . .

• Why should I do all this work so MBFS can steal it and sell their software with this stolen perk?

• MBFS has written an agent robot to go to these individual sites and steal certain files . . . .

• MBFS has targeted and stolen my personal data files from non public areas of my website domain.

• MBFS (The Breeder’s Standard) has stolen the Pedigree Databases of many breeds including this one using a Data Mining Robot. . . . MBFS stole it in one day.

• [T]here are now a few irons in the fire regarding other people he has stolen from too . . . .

The plaintiffs’ only apparent remorse was to sue Henry for defamation on the theory that the unauthorized copying and use of the database were technically not “theft”. The court gave short shrift to this argument, finding that Henry’s statements were substantially true and would be understood as true by a lay person:

Nor are Henry’s statements that Tamburo’s actions constituted “theft” actionable as defamation, because the statements were substantially true. “Truth is an absolute defense to a defamation action.” Lerman v. Turner, No. 10 C 2169, 2013 WL 4495245, at *18 (N.D. Ill. Aug. 21, 2013) (citing Hnilica v. Rizza Chevrolet, Inc., 893 N.E.2d 928, 931 (Ill. App. Ct. 2008))…

Here, there is no dispute that Henry’s description of the events in question was accurate: the plaintiffs harvested data from Henry’s web site that she had spent years collecting, they repackaged it for their own use and profit, and they did not have Henry’s permission to use the data harvested from the site. The only question is whether Henry’s characterization of these events as “theft” is false.

Tamburo argues that Henry’s statements that he stole from her are false because he did not commit theft. He did not delete or remove data from Henry’s site (thus depriving her of her property), Henry had made her data freely available, and no robots.txt file was visible on her site at the time the Data Mining Robot copied information on the site. According to Tamburo, because the data was not protected, either legally or by security protections on Henry’s web site, he could not have committed theft by appropriating it.

Even so, the court concludes that no reasonable jury could find that Henry’s statements were not substantially true. Henry accurately reported in her messages that Tamburo copied data from her site without her permission, and it is undisputed that he did so. Henry states under oath that she believed (and still believes) that this constituted theft, and that she felt injured by Tamburo’s actions. It may be true that Tamburo could not be prosecuted or held liable for his actions because the data was publicly available and not protected by adequate security measures. But Tamburo’s argument relies on a narrow legal meaning of “theft.” Under Illinois law, the court must consider whether Henry’s use of the word “theft” is reasonably susceptible to a non-defamatory construction. Solaia Tech., LLC v. Specialty Pub. Co., 852 N.E.2d 825, 839 (Ill. 2006). It is. To a lay person such as Henry, “theft” can also mean the wrongful act of taking the property of another person without permission. The data Henry had collected could be reasonably understood as her property—she had collected it, and it was her work in compiling it that gave it value. She did not give Tamburo permission to copy it and sell access to it. Although Henry might not be able to successfully sue Tamburo for using her data in this way, the gist of her statements was true: he took the data without her permission.

Tamburo also argued that because Henry did not use the Robot Exclusion Standard (the robots.txt protocol) to limit web crawlers’ access to her website, she had effectively invited him to use the data and thus knew his actions were not “theft”.[i] The court rejected the theory that a person who posts material onto a website implicitly consents to its electronic scraping and online publication. Relying on the recent decision in Associated Press v. Meltwater U.S. Holdings, Inc., 2013 WL 1153979 (S.D.N.Y. Mar. 21, 2013), which found Meltwater’s news clipping service liable for copyright infringement, the court stated:

The court has found no precedent indicating that failure to use the Robot Exclusion Standard means that a website creator has given up her intellectual property rights or invited the general public to copy the material on a site. To the contrary, in the copyright context, one district court recently rejected the idea that a website’s failure to use the robots.txt protocol to block access created an implied license to copy copyrighted material. Associated Press, 2013 WL 1153979, at *24. That court stated that “there is no fair inference, based simply on the absence of the robots.txt protocol, that there has been a meeting of the minds between the copyright owner and the owner of the web crawler about the extent of copying.” Id. It follows that the failure to use the robots.txt protocol is not evidence that Henry knew that Tamburo was allowed to copy her data and knew that her statements that he had stolen her data were false.
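To make the mechanism concrete, the following is a minimal sketch of how a cooperating crawler might consult robots.txt rules before fetching a page, using Python’s standard-library urllib.robotparser module. The rules, the user-agent name DataMiningRobot and the example.com URLs are hypothetical illustrations only; they are not drawn from the case record and do not describe Henry’s actual site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the path below is illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pedigrees/
"""


def may_fetch(robots_lines, user_agent, url):
    """Return True if a cooperating crawler may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules supplied as a list of lines
    return parser.can_fetch(user_agent, url)


rules = ROBOTS_TXT.splitlines()

# A path covered by the Disallow rule is reported as off-limits.
print(may_fetch(rules, "DataMiningRobot", "https://example.com/pedigrees/schipperke.html"))  # False

# A path not covered by any rule is reported as fetchable.
print(may_fetch(rules, "DataMiningRobot", "https://example.com/about.html"))  # True

# With no rules at all, the parser reports everything as fetchable.
# This is only a technical default, not a grant of legal permission.
print(may_fetch([], "DataMiningRobot", "https://example.com/pedigrees/schipperke.html"))  # True
```

Compliance is voluntary: the parser only reports what the file requests, and nothing stops a crawler that ignores it. That is why the absence of a robots.txt file says little about whether a site owner has consented to copying, which is the point the court drew from Associated Press.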

The Tamburo v. Dworkin case did not canvass the role or effectiveness of intellectual property rights in addressing online infringement, or whether online appropriation of information, including copyright materials, hurts creators or the creative industries. There was no need to, as the case was, for the most part, about whether calling unauthorized copying and dissemination of information “theft” was true for defamation purposes.[ii] The issues are important, however, and very topical. For example, in the last two months:

  • Millward Brown Digital published a report, Understanding the Role of Search in Online Piracy. It found that, overall, search engines influenced 20% of the sessions in which consumers accessed infringing TV or film content online between 2010 and 2012. For the infringing film and TV content URLs measured, the largest share of search queries that led to these URLs (82%) came from the largest search engine, Google.
  • According to data compiled by TorrentFreak from Google’s transparency reports, Google’s indexes are a major source of infringing URLs. In the last week of September alone, Google processed requests to take down a record-breaking 5.3 million pirate links. That means Google is now removing roughly nine allegedly infringing URLs from its indexes every second of every day.
  • The UK Culture, Media and Sport Committee issued a report calling for “a strong regime for the protection of intellectual property including copyright.” It found that “The greatest threat to recognition and just reward for creativity is illegal copying, particularly online piracy.” The Committee was particularly critical of Google, stating: “We strongly condemn the failure of Google, notable among technology companies, to provide an adequate response to creative industry requests to prevent its search engine directing consumers to copyright-infringing websites. We are unimpressed by their evident reluctance to block infringing websites on the flimsy grounds that some operate under the cover of hosting some legal content. The continuing promotion by search engines of illegal content on the internet is unacceptable. So far, their attempts to remedy this have been derisorily ineffective.”
  • NetNames (formerly known as Envisional) published a report Sizing the piracy universe. It found that “the practise of infringement is tenacious and persistent. Despite some discrete instances of success in limiting infringement, the piracy universe not only persists in attracting more users year on year but hungrily consumes increasing amounts of bandwidth.” It also found that 13.9 billion page views were recorded on web sites focused on piracy in January 2013. This figure increased by 9.8% in the fifteen months from November 2011.
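The “nine URLs every second” figure reported from the Google transparency data follows from the weekly total. A quick check, assuming the 5.3 million takedown requests were spread evenly over one seven-day week:

```python
# Assumes the 5.3 million takedown requests were spread evenly over one week.
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds
takedowns_per_week = 5_300_000  # figure reported for the last week of September

per_second = takedowns_per_week / SECONDS_PER_WEEK
print(f"{per_second:.2f} URLs per second")  # 8.76 URLs per second
```

That works out to about 8.8 removals per second, which rounds to the nine cited.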

In September, the London School of Economics also published a brief, Copyright & Creation: A Case for Promoting Inclusive Online Sharing. Among other things, it claimed that the creative industries are not suffering a loss of revenues due to copyright infringement. Based in part on this assertion, it recommended that the anti-piracy graduated response measures to be implemented under the UK’s Digital Economy Act be dropped. Others have pointed out many flaws in the LSE brief.[iii] Among its key deficiencies are the following:

  • The LSE brief assumes that if revenues of a creative industry have remained flat (a finding in the brief that critics hotly dispute), the industry has not been harmed by online piracy. This conclusion fails to address the counterfactual questions of what sales would have been in the absence of unauthorized file sharing, and what they would be if more effective anti-piracy measures were in place.[iv] The LSE assumption is tantamount to claiming that the banking industry is not affected by cyber-crime as long as overall banking industry revenues do not decline year over year.
  • The LSE brief assumes that a status quo in revenues is an acceptable public policy outcome for the creative industries, one that can justify not implementing laws to curb online piracy. This assumption runs counter to the goals of most countries, including the United States, the UK, and Canada, and of their creators and creative industries, which want to expand these industries and the jobs, economic growth, innovation, consumer benefits, cultural expression, and tax revenues that come with that expansion.
  • The brief devalues intellectual property rights. It suggests that the music industry does not need more effective protection against online copyright infringement because concert (performance) revenues have increased. That is like telling software companies such as Microsoft that they don’t need protection against online file sharing because they can make up lost revenues through services, or telling companies like Coke or Pepsi that they don’t need effective trade secret laws to protect their cola formulas because they can make up lost revenues by promoting their brands more heavily.
  • The LSE brief fails to address the significant body of research linking online piracy to a decrease in sales and showing the effectiveness of anti-piracy legislation in helping to foster legitimate markets for creative products and services.[v]
  • The LSE brief does not explain how online file sharing on the scale highlighted in the Millward Brown Digital and NetNames reports and elsewhere, together with the widespread dissemination and availability of infringing content, could fail to have a significant impact on sales.
  • The LSE brief views online piracy and its effects only at the macro level. It entirely ignores the individual impacts on people, families, and small, medium and large organizations in different segments of the creative industries. Its policy analysis completely neglects the distributive justice implications of online piracy.

The LSE brief also completely overlooks the human impact of unauthorized appropriation on artists, writers, authors, and other creators. Kristen Henry felt frustrated and violated when the database she spent five years compiling was slavishly copied by the plaintiffs using an online crawling tool and sold to others for a fee. There are hundreds of thousands, if not millions, of Kristen Henrys affected by online piracy. The LSE policy brief overlooks the repercussions of online piracy on all of them.

[i] The robots.txt protocol, or Robot Exclusion Standard, is a convention for instructing cooperating web crawlers not to access all or part of a website that is otherwise publicly viewable. If a website owner uses the robots.txt file to give web crawlers instructions about its site, and a crawler honors those instructions, then the crawler should not visit the pages the file excludes. See Field v. Google, Inc., 77 U.S.P.Q.2d 1738 (D. Nev. 2006); Parker v. Yahoo!, Inc., Civ. Action No. 07-2757 (E.D. Pa. Sept. 25, 2008); eBay, Inc. v. Bidder’s Edge, Inc., 100 F. Supp. 2d 1058 (N.D. Cal. 2000).
[ii] The court did not make any finding as to whether Henry’s database was protected by copyright. However, if Henry exercised sufficient creative effort in selecting or arranging the data, she could have had copyright in an original compilation of the data.
[iv] Counterfactual reasoning is a method of evaluating claims of causation by exploring what might have happened had the causal event not occurred. Such reasoning is a common test of the validity of claims in the social sciences and in historical studies. (Oxford University Press)
