September 3, 2014

We’re moving this blog to FTC.gov

by techatftc

Hello,

Please note we’re moving this blog archive from WordPress to http://www.FTC.gov/tech. We will move all posts and comments, and delete the original blog in the coming weeks. Please look for the posts on our agency website.

Thank you,

Cheryl Warner

FTC Office of Public Affairs

August 23, 2013

So Long, and Thanks for All the Fish! (Phish?)

by Steve Bellovin

I’ve had a great time at the FTC, but today is my last day; I’m returning to academe–what Tom Lehrer aptly described as “ivy-covered professors in ivy-covered halls”. In due course, this blog and the @TechFTC Twitter account will be taken over by my successor; stay tuned.

–Steve Bellovin

March 29, 2013

SSL, RC4, and Site Administrators

by Steve Bellovin

There’s been yet another report of security problems with SSL.  If you run a website or mail server, you may be wondering what to do about it.  For now, the answer is simple: nothing—and don’t worry about it.

First of all, at the moment there’s nothing to do.  You can’t invent your own cryptographic protocol; no one else would have a compatible browser.  Besides, they’re notoriously hard to get right.  In the very first paper on the topic, Roger Needham and Michael Schroeder wrote “Finally, protocols such as those developed here are prone to extremely subtle errors that are unlikely to be detected in normal operation. The need for techniques to verify the correctness of such protocols is great, and we encourage those interested in such problems to consider this area.”  Why do you think your design will be better than one that has been scrutinized for more than 15 years?

Some of the trouble in this latest breach is due to weaknesses in the RC4 cipher algorithm.  No one who works in cryptography was surprised by this report; it’s been showing cracks since at least 1997.  What’s new is that someone has managed to turn the weaknesses into a real exploit, albeit one that needs at least 2^24 and preferably 2^30 encryptions of the same plaintext to work.  (By the way, this is why cryptographers are so concerned about minor weaknesses: as Bruce Schneier is fond of noting, attacks always get better, they never get worse.)  Besides, ciphers are even harder to get right than protocols are.

The real reason not to worry, though, is that unless you’re being targeted by a major intelligence agency, this sort of cryptanalytic attack is very far down on the risk scale.  Virtually all attackers will look for unpatched holes, injection or cross-site scripting attacks, people who will fall for spear-phishing attacks, etc., long before they’ll try something like this.  The common attacks are a lot easier to launch; besides, the attackers understand them and know how to use them.

In the long run, RC4 has to be phased out.  I certainly wouldn’t start any new designs that depended on RC4’s characteristics or performance, but there are plenty of other algorithm possibilities today.  Vendors do need to ship web browsers and servers that support newer versions of SSL (formally known as TLS); weaknesses at the protocol level can’t always be fixed by patching code.  For now, though, stay up to date with your patches and software, and practice good security hygiene.  (And if you are being targeted by a major intelligence agency, you should talk to a major counterintelligence agency, not me…)
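
If you run a site and are curious which protocol version and cipher suite your server actually negotiates (for instance, whether RC4 is still being chosen), a quick client-side check is easy.  Here is a minimal sketch using Python’s standard ssl module; “example.com” is a placeholder for your own hostname, and the result reflects the defaults of whatever Python and OpenSSL builds you happen to have:

import socket, ssl

hostname = "example.com"   # substitute your own site
context = ssl.create_default_context()
with socket.create_connection((hostname, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=hostname) as tls:
        print(tls.version())   # negotiated protocol, e.g. "TLSv1.2" or "TLSv1.3"
        print(tls.cipher())    # (cipher suite name, protocol version, secret bits)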

March 21, 2013

Storing Passwords, or The Risk of a No-Salt Diet

by Steve Bellovin

A while back, I wrote about passwords and promised a later post on salting.  This is it: a deeper look at how servers should accept and store passwords.  This is a complement to the usual articles on passwords, which focus on the user (you know the ones: “pick strong passwords”); here, I’ll be looking at the server side, and in particular how to store passwords for web sites.

A prefatory note: I’ll be referring a lot to Robert Morris and Ken Thompson’s classic paper “Password Security: A Case History”.  If you’re at all involved in password handling, you should read, understand, and cherish it; virtually all of what we can do to protect passwords, even today, is based on what they wrote almost 35 years ago.

The first rule of handling passwords is that you should never store them in the clear.  There is no good excuse for it whatsoever.  Why not?  It’s a prime rule of security: something that doesn’t exist can’t be stolen.  Conversely, if something does exist, it can be stolen or leaked in many, many ways.  (Read the Morris and Thompson paper for some examples.)  Why do some sites store passwords in the clear nevertheless?  They think it’s user-friendly to send back the original password when people click the “I forgot” link.  Resetting the password to something strong and random is a much better idea.  (Password reset carries its own set of risks, but that’s a topic for another day.)

The alternative to “in the clear” is not “encrypted”.  If a password is stored encrypted—and I’m using that word in its technical sense—that means that there’s a key that can decrypt it, which in turn brings us back to my previous observation: if something exists, it can be stolen, keys included.  Instead, what we do is “hash” the passwords using a non-invertible function.  When the user supplies a password at login time, it, too, is hashed; that value is compared with the stored one.  Today, the best choice is a standardized cryptographic hash function such as SHA 512.  (Why SHA 512 and not MD5 or SHA-3?  They’re all fine here, as we shall see below.)  Cryptographic hash functions have two very important properties for protecting passwords: they can’t be inverted (in the crypto world, we call that “preimage resistance”); and they take arbitrarily long input strings, i.e., passwords.  This is an advance over Morris and Thompson’s solution; at the time they wrote their paper, there were no such functions, so they were forced to use an encryption algorithm in an unnatural way, which in turn limited password length to 8 characters.  Given the functions and line speeds we have available today, there are no good reasons to limit passwords to anything less than 1024 characters or thereabouts.  Restricting the character set is almost as bad.  Why shouldn’t I be able to put an א or a θ in my password if I want to?

The second and third things you should do are to “salt” the password before hashing, and to iterate the hash.  Why?  To answer that we have to understand how hashed passwords are cracked.  First, assume that the attacker has somehow obtained—stolen—a copy of your hashed password file.  That shouldn’t be possible, of course, but “shouldn’t” is a dirty word in the security business.  Next, the attacker makes lots and lots of guesses about possible passwords, hashes them, and sees if one of the hashed guesses matches the hashed value.  This is why strong passwords help: they’re harder to guess algorithmically.  Of course, many people still pick really bad passwords.  (And how do we know that?  Some sites stored their passwords in the clear; some have been stolen and the files posted online.  See above for why that’s a bad idea—and was known to be a bad idea in 1979…)
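
In outline, the guessing process is nothing more exotic than the following sketch (salt and iteration, discussed next, are deliberately omitted; “wordlist” stands in for whatever guessing strategy the attacker prefers, and this is an illustration, not a tool):

import hashlib

def crack(stolen_hashes, wordlist):
    """Hash each guess and check it against a set of stolen, unsalted hashes."""
    for guess in wordlist:
        digest = hashlib.sha512(guess.encode("utf-8")).hexdigest()
        if digest in stolen_hashes:
            yield guess, digest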

If the attacker is trying lots of guesses, one very useful defensive technique is to slow down the process by making the hash function expensive.  Suppose, for example, that you want to use SHA512.  On my laptop, handling password-size strings, it runs at about 750,000 SHA512 operations per second.  If the function is applied 75,000 times to the password, it will take 1/10 of a second to validate a login—or a guess—compared with 1/750,000 of a second without the iteration.  No matter how much computing power the bad guys have (and they have more than you do, since they’re stealing it via botnets and using GPUs (graphics processing units) to do the guessing), it cuts their guess rate by a factor of 75,000.  That means they can try many fewer guesses or attack many fewer people’s passwords in any given amount of time.  You can’t make the iteration count too high or it will take you too long to validate legitimate login attempts, but you can certainly afford a slower response time than 1/750,000 of a second.  This also explains why the speed of the underlying hash function isn’t very important: you just adjust the iteration count to match.  Again using my laptop as an example, MD5 runs at roughly twice the speed of SHA512.  That means I’d use 150,000 iterations instead.  (Btw—you may have heard that MD5 has been cracked.  It has been, but not in a way that matters here.  MD5 has poor “collision resistance”; for password hashing, we only need preimage resistance.  For other aspects of cryptography, though, both matter.  For password storage, any hash function for which one cannot compute preimages is appropriate; you just have to set the iteration count properly.)
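
As a concrete sketch of the iteration idea (not a recommendation to roll your own; standardized constructions such as PBKDF2 package this for you, as shown further below), repeatedly applying SHA-512 looks like this; 75,000 is just the example count from the text and should be tuned to your own hardware:

import hashlib

def iterated_sha512(password, iterations=75_000):
    """Apply SHA-512 repeatedly so that each guess costs `iterations` times as much."""
    digest = password.encode("utf-8")
    for _ in range(iterations):
        digest = hashlib.sha512(digest).digest()
    return digest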

An implementation hint: store the iteration count with the hashed password. That way, as computers speed up and you decide to increase it, your old hashed passwords will remain usable.

Password salting is a more subtle concept and in certain circumstances is not usable. It defends against two different attacks, both of which are quite important.  (Morris and Thompson also intended it to defend against custom encryption chips—read the paper—but that’s not seen as a major threat today, at least as compared with botnets and GPUs.)

What is “salt”, when it comes to passwords?  The salt is a random number selected when you set the password.  This number is stored, unencrypted and unhashed, with the hashed password.  It becomes another input to the hash function.  This is best explained by a formula.  If P is the user’s password, H is some hash function (e.g., SHA 512 repeated 75,000 times), and S is the salt, the system will calculate H(S,P) and store S,H(S,P) in its password database.  Since S is stored in the clear, the system can calculate H(S,P) any time someone tries to log in.  The question, though, is why we’d do this.
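
To make the formula concrete before we get to the why: here is a minimal sketch of setting and checking a salted, iterated password hash.  It uses Python’s standard pbkdf2_hmac, which packages exactly this salt-plus-iterated-hash construction, and it stores the salt and iteration count alongside the hash (per the implementation hint above); the “salt$iterations$hash” record format is just an illustration, not a standard:

import hashlib, hmac, os

def set_password(password, iterations=75_000):
    salt = os.urandom(16)          # S: random, stored in the clear
    digest = hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, iterations)
    # Keep salt, iteration count, and hash together so the count can be raised later.
    return f"{salt.hex()}${iterations}${digest.hex()}"

def check_password(password, stored):
    salt_hex, count, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"),
                                 bytes.fromhex(salt_hex), int(count))
    return hmac.compare_digest(digest.hex(), digest_hex)   # constant-time comparison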

Hashing all of these guessed passwords is expensive, especially for iterated hashes.  Attackers would rather save the time that it takes to crack each new site.  Accordingly, they’ve built up tables—giant files—of password/hash pairs.  Cracking a password, then, is simply a matter of searching this file, and any programmer knows how that can be done very efficiently.  The obvious objection is that the file is too large—and it would be, if it were stored naively.  Is it?  Let’s crunch some numbers.

Suppose that people choose 8-character passwords made up of lower-case letters, with just a few tweaks added.  There are about 2·10^11 possibilities.  Done in a straightforward fashion, we’d need 8 bytes for each password and 16 for an MD5 hash; that comes to about 5 terabytes.  That’s not prohibitively large, but larger than any of (today’s) commonly available disks.  Suppose, though, that users optionally add a single digit to the end of their 8-letter password.  That multiplies the number of possibilities by about eleven and makes each entry a byte longer, pushing the table to roughly 55 terabytes; that is a fair amount of storage, even today.  Move to 9-letter passwords with that extra optional digit and it’s game over—or is it?

A space-time tradeoff, in particular a technique known as rainbow tables, can drastically reduce the amount of storage necessary.  You spend a bit more time cracking each password, but you make it back in space.  How much is the reduction?  Assume an 8-character password chosen from the entire keyboard: all letters, upper and lower case, all digits, and all special characters.  Using the same straightforward storage scheme of 24 bytes per password, you’d need about 160 petabytes of disk space.  That’s a lot…  However, a rainbow table for the Windows Vista hash takes only 2 terabytes, enough to fit on a single drive.

Now let’s add some salt.  Because the hash of a salted password is different for every possible salt value, you need a lot more storage, even with rainbow tables.  Using Morris and Thompson’s 12-bit salt—4096 (2^12) possible values—that Windows Vista rainbow table would need 8192 terabytes (8 petabytes).  And if you used a modern salt, with 2^32 or even 2^128 possible values—well, it would take the proverbial pot of gold from the end of the rainbow table to afford that much disk space.
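
For anyone who wants to check the arithmetic behind those table sizes, here is a rough back-of-the-envelope version in Python (all figures approximate, using the byte counts from the text):

lower8 = 26 ** 8                      # 8 lowercase letters: about 2·10^11 possibilities
print(lower8 * (8 + 16) / 1e12)       # 8-byte password + 16-byte MD5 hash: about 5 TB
print(lower8 * 11 * (9 + 16) / 1e12)  # optional trailing digit: roughly 55 TB
print(95 ** 8 * (8 + 16) / 1e15)      # 8 characters from the full keyboard: about 160 PB
print(2e12 * 4096 / 1e15)             # a 2 TB rainbow table times a 12-bit salt: about 8 PB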

There’s another advantage to salting, especially on large password files.    Suppose that two users have chosen the same password.  Shocking, I know, but that study of password choices I mentioned shows just how often it happens.  Without salt, identical passwords have identical hashes; with salt, both the salt and the password would have to be the same.  With even a 32-bit salt, that won’t happen to any noticeable extent; the attacker would have to go after each of them individually.

The first two defenses I mentioned, hashing and iterating the hash, are always applicable.  If you’re not doing these, you’d better have a really good reason—and as I noted, being able to send out a forgotten password doesn’t qualify.  Salting, however, isn’t always possible.  If the password is used to produce a cryptographic key for the user, and if there’s no way to get the salt securely from the server to the user’s software before the encrypted connection is established, you can’t use salting.  This is rarely the case on the wide-area Internet, especially for websites (websites almost always use SSL for encryption, and authenticate later), but there are certainly situations where it occurs, especially within organizations.  There may be a solution in the future (you can find my attempt here), but for now we’re stuck without salt in these situations.  If you’re running a website with SSL and password authentication, though, you should always salt.

Is there anything else you should do to protect passwords?  Yes—lock down the server that holds your users’ passwords.  I mean really lock it down; make every open service beg for its life.  Don’t just lock down the Internet-facing side while leaving it open to your Intranet, completely lock it down.  If that data is stolen without you knowing, you—or, more accurately, your users—are in for a world of hurt.  This post is long enough already, though; I’ll save that for another day.

February 22, 2013

Shipping Security

by Steve Bellovin

It’s a ritual we’ve all grown accustomed to: something needs a software update to repair security flaws.  Traditionally, it’s been our computer; increasingly, it’s our smartphones or their apps.  In the not very distant future (possibly now, for some of us), it will be our printers, our thermostats, our cars, our “anything that uses software”—and that will be more or less everything.  WiFi-controlled light bulbs are already on sale in some countries; if it’s WiFi-controlled, it may be Internet-accessible and some day in need of security patches.  Such patches don’t create themselves; it’s worth stepping back and looking at the process, which in turn helps explain why some companies are so much better at it than others.

The first step is remarkably hard: understanding that you have a problem.  More precisely, it’s understanding that you’re in the networked software business, with all that implies, rather than in the phone, thermostat, printer, light bulb, or what have you business.

Rule #1: If something has software that can talk to the outside world, it can (and probably does) have security problems; this in turn means that remediation mechanisms are necessary.

The phrase “remediation mechanisms” covers a lot of ground.  It means that you need, among other things, a process for reporting security problems (and this may be from your own developers, partner companies, academics, security researchers, and more); equipment and people to reproduce and analyze the problem; coders and testers to produce a fix; and a rapid and effective system for pushing it out to end users and helping them install it (saying “plug your Wi-Toaster into a Model 3.14159 Development Unit and Crumb Cleaner” won’t cut it); and more.  Most important, you need management energy behind all of this, to make sure it all works effectively.


Rule #2: Unless the security process is someone’s responsibility, it won’t work well (or possibly at all).

Mainstream software companies understand this; they’ve been through the wars.  One can argue if Microsoft’s “Patch Tuesday” or Apple’s “Good morning—here’s a security patch you should install immediately, even though it’s 3am on a Sunday” is the better model; nevertheless, both companies understand that security flaws are not ultra-rare events that can be dealt with in the next model of their products.

Companies new to the software world don’t always get this.  With the possible exception of cars that may receive regular oil changes, most consumer products are “fire and forget”.  Only expensive products, such as major appliances, are routinely repaired (or even repairable); improvements wait for the next model.  That’s fine for routine matters; it may even be acceptable for dealing with occasional “inexplicable” outages.  It’s a non-starter for most security holes, since those can become critical, recurring problems any time some attacker wants them to.


Rule #3: Bugs happen, ergo fixes have to happen.

In a technical sense, pushing the patch out to affected devices is often the hardest step.  It’s relatively straightforward technically for today’s computers and smartphones; virtually all of them have frequent or constant connectivity.  It’s less clear what to do about devices with, say, local-only connectivity.  (Many Bluetooth devices fall into this category.)  They can be attacked from nearby, perhaps via an infected laptop, but can’t always be patched that way.

In some markets, notably phones, no one party controls the patch deployment channel.  With Android phones, for example, software—and hence fixes—can come from any of three parties: Google, the device manufacturer, or the wireless carrier.  This, coupled with the comparatively short lifespan of many phones, has led to delays in patching and even out-of-date software being shipped with new devices.  It’s easy to understand why this has happened; that said, it leaves most users without effective recourse.  As we move towards complex service models—one company as the front end for another’s cloud-based system, running software from several different vendors?—we’ll see more and more of this.  Who is responsible for security patches?  Who should be?


Rule #4: Own the patch channel.

That last point deserves a closer look.  In a multivendor world, who should own the patch channel?  There are two possible answers: the party with the ability to distribute patches, or the party the consumer will blame if something goes wrong.  They’re not independent, of course; occasionally, they’re contradictory.  In today’s world, phones are updatable only by the carrier (Apple iPhones are a notable exception), but will people blame the carrier or the manufacturer if there’s a security problem?  Put another way—and arguably a more important way from a business perspective—if something goes badly wrong and consumers are angry enough to switch, will they switch carriers or brands of phone?  The answer will vary across markets, and depend on things up to and including who has the better brand awareness; the answers, though, might help structure the contracts among the various parties.


The context here, of course, is the settlement just announced with HTC.  That situation is in some ways a special case, in that the vulnerabilities were introduced by HTC.  The problem, though, is broader and not limited to Android.  Apple, for example, controls its own patch distribution for iOS; that’s good, but their approval system can slow down shipments of updated phone apps.  In other words, their app vendors do not control their own patch channel.  (I should note that patching isn’t the only security issue with smartphones.  The FTC will be holding a Mobile Threats workshop on June 4 to discuss many other concerns as well.)

Embedded devices—the computers built into our printers, modems, thermostats, and more—are problematic in a different way.  Vendors can prepare patches, but they often have no good way to notify users about the patch.  Similarly, the device itself has no good way to inform its owners that it wants to be updated.  (Quick: what should an online light bulb do?  Blink SOS in Morse code?)  Autoupdates are one answer, but if the vendor gets the patch wrong they’ve bricked the device, with all that implies.

Consumers have to worry about such things, too.  If you’re buying something, how will you be notified of security patches?  How will you install them?  For that matter, for how long will your vendor keep producing patches?  Any time you’re running software that’s been “EOLed”—end of “lifetime”—by the vendor, you’re taking a risk; there are almost certainly residual holes, but there won’t be new patches.  You need to plan for this and upgrade your computers (and phones, and perhaps embedded devices) before that happens.  (If you’re still using Windows XP, note that Microsoft says it will discontinue support on April 8, 2014.)


Patching isn’t easy, but even in a world of 0-days, it’s still important.  Vendors and consumers need to take it very seriously and understand how it will happen.

February 5, 2013

Defending Against High-End Threats

by Steve Bellovin

Most attacks are pretty mundane.  Some aren’t, though, and we can learn a lot from them.  Let’s consider the recent case of the New York Times being hacked, allegedly by China.

Several things stand out from the article.  For one thing, it was a targeted attack.  Most hacks, even very serious ones, are opportunistic, in the sense that the attacker doesn’t really care which system is penetrated.  Someone who, for example, wants to use an open WiFi access point to penetrate a big box store’s  network doesn’t really care much at which store the attack succeeds.  If the first store has a protected net, it’s on to the next parking lot to try the next big box in the strip mall.  Someone launching a targeted attack, though, has a very different goal.  To misuse the old joke, if you’re being targeted, it’s no longer enough to outrun your friend; you have to outrun the bear, too.  In this case, the attackers were after not just the Times but information on one particular story.

Note the duality: an opportunistic attacker tends to be technology-focused: he or she will have a set of break-in tools.  If they don’t work against some site, that site is probably safe (until, of course, a better-equipped bear comes along).  In this case, though, the focus was on the victim, with the attackers trying or building whatever tools were necessary.  These people knew what they wanted.  Once they were into the Times’ network they found and cracked the domain controller that had the master password file, then cracked some employees’ passwords.  They wanted these not to use on other sites, but to gain access to particular Times computers, and in particular to certain reporters’ email and files.  As I’ve noted before, strong passwords are an overrated defense in general, but this is one of the exceptions that proves the rule: when you’re being targeted, password strength can matter very much.

Another interesting point is the failure of antivirus software:

Over the course of three months, attackers installed 45 pieces of custom malware. The Times … found only one instance in which [it] identified an attacker’s software as malicious and quarantined it

This should not be a surprise.  Most antivirus packages work by matching files against a database of known malware.  A custom tool, or one that has not yet been analyzed by your antivirus company, by definition won’t be in this database, and hence won’t be detected.  Indeed, the Times itself has recently reported on the growing failure of this technology:

A new study by Imperva, a data security firm in Redwood City, Calif., and students from the Technion-Israel Institute of Technology is the latest confirmation of this. Amichai Shulman, Imperva’s chief technology officer, and a group of researchers collected and analyzed 82 new computer viruses and put them up against more than 40 antivirus products, made by top companies like Microsoft, Symantec, McAfee and Kaspersky Lab. They found that the initial detection rate was less than 5 percent.

The companies themselves realize this, and are moving on to newer techniques, though these themselves are imperfect.  A spokesman for one company said, “In over two-thirds of cases, malware is detected by one of these other technologies.”  Two-thirds is much better than 5 percent, but it’s still not great, especially against a serious adversary.

Is traditional antivirus software useful?  It is and it isn’t.  It does reasonably well against older malware.  It does little, if anything, to protect you against “advanced persistent threats” or even the more clever cybercriminals.  Should you run it?  Let’s put it like this: just because you’re actually being hunted by invisible flying assassins from the Andromeda Nebula doesn’t mean you can ignore traffic when crossing the street; you can still be hit by an ordinary car.  Traffic signals have their uses, even if invisible flying assassins don’t pay any attention to them.

The last point worth mentioning is how the Times cleaned up the mess.  You often hear “don’t clean up the machine; reformat the disk and reinstall.”  That’s good advice; it’s fiendishly difficult to exorcise all of the nasties any bad guy can leave behind, let alone a high-end attacker.  Maybe one day your antivirus company will be able to detect and quarantine those files, maybe not; in any event, you can’t wait.  Disinfecting a network isn’t as simple as waving a wand and chanting “Expulso”, either, and most companies can’t afford to reinstall the OS on all of their computers.  More importantly, they can’t afford to lose the data, which is worth far more than the hardware it sits on.  The Times’s approach was to wait and watch: observe which machines appeared to be infected and how the malware behaved.  That knowledge in turn permitted replacement of just the affected computers, and the creation of new, tailored defenses.  This isn’t a foolproof process—the article itself mentions a previous incident where a thermostat and a printer had been compromised—but it’s better than either guessing or throwing away everything.

There are a number of lessons from this story.  The most important is that you need to understand if you’re at risk of serious, targeted attacks.  The APT—Advanced Persistent Threat—concept is overhyped, but regardless of the attacker’s absolute capabilities the fact of targeting makes a big difference in your defensive posture.  Not only do you need stronger defenses, you need different types.  Phishing attacks happen to everyone, but they’re generally trying to extract your password for the Bank of Ruritania or some such.  Spear-phishing is generally aimed at planting malware, and may carry a payload undetectable by most antivirus programs.  Password strength can matter more, too, against an attacker who steals your site’s hashed passwords.

There’s another, more subtle point: should you centralize or decentralize your resources?  Against random attacks, a single, strong server complex makes sense.  If you’re being targeted, though, perhaps you should spread out your resources, and increase the number of systems the attackers have to penetrate.  There’s no one answer to this question, but you should give it some thought.

There’s one more point to consider about targeted attacks: what are the attackers’ actual goals?  It isn’t always obvious:

The attackers were particularly active in the period after the Oct. 25 publication of The Times article about Mr. Wen’s relatives, especially on the evening of the Nov. 6 presidential election. That raised concerns among Times senior editors who had been informed of the attacks that the hackers might try to shut down the newspaper’s electronic or print publishing system. But the attackers’ movements suggested that the primary target remained Mr. Barboza’s e-mail correspondence.

“They could have wreaked havoc on our systems,” said Marc Frons, the Times’s chief information officer. “But that was not what they were after.”

That is, the Times was concerned that the attackers might try to vandalize the network, either in revenge or to prevent more embarrassing articles from being published.  Could they have coped?  Could you?

January 9, 2013

Standard-Essential Patents

by Steve Bellovin

The FTC has just announced a broad settlement with Google.  Let’s talk about one aspect, the consent order on “standard-essential patents” (SEP).  It’s an important issue; the New York Times noted that “legal experts say Google’s settlement with the F.T.C. signals progress in clarifying the rules of engagement in high-tech patent battles, and thus could ease them.”

Patents have long been a part of American society.  Indeed, the Constitution (Article I, Section 8) explicitly endorses them:

The Congress shall have Power …

To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries

Thomas Jefferson was a principal architect of the American patent system (and indeed, as the Supreme Court has noted, was the “first administrator of our patent system”).  He found patents, on the whole, to be beneficial, and noted that “An act of Congress authorising the issuing patents for new discoveries has given a spring to invention beyond my conception.”  Still, he was cautious.  By definition, a patent is a monopoly, albeit one presumed to be a net benefit to society.  The Supreme Court, channeling him, put it this way:

The grant of an exclusive right to an invention was the creation of society—at odds with the inherent free nature of disclosed ideas—and was not to be freely given. Only inventions and discoveries which furthered human knowledge, and were new and useful, justified the special inducement of a limited private monopoly. Jefferson did not believe in granting patents for small details, obvious improvements, or frivolous devices. His writings evidence his insistence upon a high level of patentability.

The issue of monopoly is the crux, though.  A patent is not the right to do something; rather, it is the right to prevent others from doing it.  Suppose, for example, that you hold the patent on the pencil and someone else holds a patent on an eraser.  Without the consent of the other, neither of you can manufacture pencils with attached erasers; such a device would infringe both patents.  (The question of whether or not this combination is itself patentable is a complex one; I won’t address it here.  For now, let it suffice to say that a patent may not be granted “if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains”.)

Beneficial monopolies can arise from another scenario: standardization.  Technical standards more or less by definition create monopolies; the chosen solution becomes the one everyone needs to implement: “[i]n the case of a standard that effectively requires the use of a proprietary technology, the standard, if adopted (whether de facto or by formal process), can imbue the technology with market power that it previously lacked. Thus there is the potential for monopolization, or more minimally a raising of rivals’ costs, through the conjunction of an adopted standard and a proprietary technology”.  Note especially the last clause: “the conjunction of an adopted standard and a proprietary technology”.  Therein lies the rub: how should the combination of a patent—one form of beneficial monopoly—and a technical standard—another form—be handled?

The issue can be thorny.  Not surprisingly, standards organizations have long recognized this problem, though different ones have taken different approaches.  The Internet Engineering Task Force (IETF) requires full disclosure of applicable patents, and lets the working groups decide if the proffered terms are acceptable.  The World Wide Web Consortium (W3C) requires a commitment to royalty-free licensing.  IEEE (the Institute of Electrical and Electronics Engineers) insists on a commitment that “a license for a compliant implementation of the standard will be made available to an unrestricted number of applicants on a worldwide basis without compensation or under reasonable rates, with reasonable terms and conditions that are demonstrably free of any unfair discrimination”.  In other words, the means and mechanisms differ, but the overall goal is the same: to ensure that many different parties can implement the standard, thus mitigating the monopoly characteristics of the patent.

Trouble can arise when companies abuse the standards process, especially with respect to patents.  The FTC has long taken a dim view of abuse: “This settlement makes it clear that firms cannot commit to an open standard, and then, after it becomes successful, assert patent rights in an effort to block use of the design or drive up the price through royalty payments.”  (In this case, Dell had asserted a patent pertaining to a video card interface design after it was accepted as a standard.)  Some patents, though, are more problematic than others.

If a patent applies to something within a device, it can be easier (though not free) to work around.  Consider that patent in the Dell case.  If that patent wasn’t licensable at a reasonable cost, a company could have designed its own interface to its own cards.  That’s more expensive (and hence still potentially abusive), because it precludes buying off-the-shelf, standards- (and patent-) compliant ones; still, it’s probably doable.  Life is much harder when the patent concerns ways of interacting with the outside world.  A cell phone, for example, must talk to the cell phone network; if it can’t, it’s nothing but a fancy toy.  No single party is big enough to build their own world-wide phone network; if nothing else, suitable radio spectrum isn’t available.  That’s why the consent order with Google is so important: Google has agreed that it will not seek injunctions to bar companies from using Google’s SEPs while license negotiations are taking place.  The issue of what constitutes a “FRAND”—fair, reasonable, and non-discriminatory—license fee is not resolved; however, it does prevent Google from keeping other companies out of certain markets entirely.  “The FTC concluded that this type of patent hold-up is precisely what the standard setting organizations sought to prevent by instituting FRAND licensing requirements”.  (In a related item, the Justice Department and the U.S. Patent Office have issued a statement saying that the International Trade Commission should not, in general, grant exclusion orders based on SEPs.)

“Talk to the outside world” is one aspect of interoperability.  Some of the devices affected by this settlement (e.g., smartphones and tablets) have to interoperate at many different levels.  Consider the iPhone.  It has to adhere to the myriad technical specifications needed for several different cell phone standards (2G, 3G, CDMA, and LTE for the newer models); all of those come in different flavors for voice and data, and don’t forget text messages.  There are standards for WiFi and for USB.  There are IETF and W3C standards that say how to send and receive email and web pages.  JavaScript has its own definition.  JPEG standards have to be followed, not just for the camera but also for web pages.  Even many of the apps which don’t overtly use these mechanisms employ them under the hood, for their own purposes.  After all, why invent your own image or transport formats if you can use JPEG and HTTP over TCP instead?  All of these standards are needed; any may be subject to patents.  (A few months ago, I asked the head of a major standards body which areas in his organization were most affected by patents.  His answer was succinct: “All of them”.)

It is no stretch to say that without these standards, smartphones couldn’t exist.  This settlement will allow for “reasonable” incentives for inventors, while preserving the interoperability necessary in today’s interconnected world.

January 2, 2013

COPPA and Signaling

by Steve Bellovin

As has been widely reported, the FTC recently amended its COPPA Rule enforcing the Children’s Online Privacy Protection Act. There’s a lot to be said about the new amendments to the Rule—indeed, a lot is being said—but as this is the FTC Tech Blog, I’m going to restrict my comments to technical aspects. Today, I’m going to talk about signaling—the way that a website can signal its COPPA status to the operators of other sites who provide it with some of the content that users see.

If you run a simple website, complying with COPPA is reasonably straightforward. If you’re covered—that is, if you have actual knowledge that a child is using your site, or if your content is directed towards children younger than 13— you must get parents’ permission before collecting personal information from kids. (N.B. Please see the formal rules to learn who is covered and for the precise definition of a “website or online service directed to children.” The Federal Register notice with the new rules is 167 pages of PDF; I’m not going to try to interpret or even summarize all that text. And ask your lawyers, not your computer scientists.) However, many commercial websites contain content from multiple sources: ad networks, third party plug-ins, etc. Who should be responsible for their COPPA compliance?

The announcement of the amended Rule makes this very clear: “The definition of an operator has been updated to make clear that the Rule covers a child-directed site or service that integrates outside services, such as plug-ins or advertising networks, that collect personal information from its visitors.” If it’s on your site, you’re responsible—period.

The announcement also says that “the definition of a website or online service directed to children is expanded to include plug-ins or ad networks that have actual knowledge that they are collecting personal information through a child-directed website or online service.” How can a plug-in “have actual knowledge” that it is on a child-oriented site?

To answer that question (and to return to purely technical matters), we have to take a deeper look at how a website is constructed. When a user types a URL into a browser or perhaps clicks on a link on some other site, the browser contacts the site to retrieve an HTML (Hypertext Markup Language) file. That HTML file, in turn, can contain pointers to other content necessary to render the page: style sheets, images, IFRAMES (mini-webpages embedded in a larger one) and more. The user’s browser, not the website, then fetches these additional HTML files, which in turn can contain other embedded content.

In many instances, a plug-in or other site or service offering up content will not know everywhere that it has been embedded, nor can it easily control or prevent embedding. A site may receive a Referer: header, but sending those headers is optional. (In fact, some browsers let you disable sending them.) If one is present, it may be from yet another party; often, references in the original HTML file point to, say, an ad network, which in turn points to the actual ad. But let’s assume that there’s a genuine Referer: header that really mentions the COPPA-covered site. Does the embedded site then “have actual knowledge”? That’s not likely without further information.

We can resolve this problem if there is explicit signaling from the embedding web page to the plug-in or other included content. This could be accomplished by a joint effort of industry members. Indeed, such signaling is already in place for other purposes; ad networks generally prescribe how to request ads that are relevant to the page on which they’re being displayed. Here’s a random example I stumbled on recently in a news article about North Korea:

http://ad.doubleclick.net/adj/trb.chicagotribune/news;;ptype=s;slug=sns-rt-us-korea-north-touristbre8bk10z-20121221;rg=ur;ref=chicagotribunecom;pos=T;dcopt=ist;sz=728×90;tile=1;ca=CrimeLawandJustice;en=SeoulSouthKorea;at=CrimeLawandJustice;at=SeoulSouthKorea;at=PhysicalFitnessandExercise;at=CivilRights;at=PyongyangNorthKorea;u=sz%7C728x90%21;ord=84919093?

Quite obviously, Doubleclick is being passed information about the website, the article name, the countries involved, and various keywords relevant to the topic. It would be no stretch at all to include a “COPPA-covered site” flag as well.

We could do better than this if there were a formal standard or agreed-upon convention, though. What might one look like? It has to be something in the URL, since that’s the only thing that a browser will understand and be able to pass on. While it’s possible to put the signal into the hostname or username/password sections of the URL, those are awkward. A better idea is to put the signal into the path. Thus, an IFRAME directive from a COPPA-covered site might start something like this:

<iframe src="http://hostname/COPPA/…">

A more general form, perhaps to be adopted by the W3C, might look something like this:

http://hostname/__RESTRICT:US-COPPA13,EU-PRIVACY,etc/…

Furthermore, if the embedded content itself embeds other content from yet other sites or sends Redirect messages, it would be obligated to pass along the COPPA signal. Note that in the first scenario, one couldn’t, even in principle, rely on the Referer: line, since it only goes back one hop.
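
To make the proposed convention concrete, here is a minimal sketch (in Python, with hypothetical names; none of this is part of any existing standard) of how embedded content might detect a restriction signal in its own URL path and pass it along to anything it embeds or redirects to in turn:

from urllib.parse import urlsplit, urlunsplit

RESTRICT_PREFIX = "__RESTRICT:"   # hypothetical path-segment convention from above

def extract_restrictions(path):
    """Split a request path such as /__RESTRICT:US-COPPA13/ads/banner42 into
    (["US-COPPA13"], "/ads/banner42"); return ([], path) if no signal is present."""
    first, _, rest = path.lstrip("/").partition("/")
    if first.startswith(RESTRICT_PREFIX):
        return first[len(RESTRICT_PREFIX):].split(","), "/" + rest
    return [], path

def propagate(url, flags):
    """Prepend the restriction segment to a URL used for further embedded content."""
    if not flags:
        return url
    parts = urlsplit(url)
    new_path = "/" + RESTRICT_PREFIX + ",".join(flags) + parts.path
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))

# An ad network receiving a COPPA-flagged request would carry the flag along:
flags, path = extract_restrictions("/__RESTRICT:US-COPPA13/ads/banner42")
ad_url = propagate("http://images.example.net/creative/1234.gif", flags)
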
Your plug-in doesn’t collect any information that would implicate COPPA? No problem; with the Apache web server (and almost certainly with other popular web servers), configuring the system to ignore such flags when they’re irrelevant is almost trivially simple.

The same principle can be applied to links to platforms (e.g., a Facebook “Like” button, a Google “+1” button, etc.): the embedding site is the only component that knows at the outset at whom it is directed, and hence must pass along that signal.

This can’t be done today for arbitrary embedded content; as I said, there are no current COPPA signaling standards. But there’s an opportunity here for industry action to make such signaling a real option for tomorrow.

December 12, 2012

History Sniffing

by Steve Bellovin

We all know about links on web pages.  We’ve all also noticed that links we’ve visited look different.

You’ve never visited this one, right?  (I’m pretty sure you haven’t, since the link is to a randomly-generated URL.)  On the other hand, if you’re reading this blog you probably have visited http://www.ftc.gov/ or http://techatftc.wordpress.com/.  (If not, visit one of them now and come back to this page.)  You can see from this example how they look on this blog.

Visited and unvisited links are displayed differently on different web sites.  Here’s what a piece of this page looked like when I was creating this blog post:

[Screenshot: how visited and unvisited links looked while this post was being drafted]

It looks different on the live blogging site, though.  The ability to customize the look and feel of a site is very important in today’s Web.  If nothing else, being able to change how links are displayed can make that aspect consistent with the rest of a web page’s visual design.  Unfortunately, there’s a privacy risk: doing that makes it possible for web sites you visit to figure out where you’ve been.

This technique, sometimes called “history sniffing”, isn’t a newly-discovered problem; a Firefox bug report about it was created more than a dozen years ago.  It’s not a simple problem to solve, though, because of the complexity of modern web pages: there are many different ways in which link-styling can be indicated, and many ways in which it can be queried.

The simplest setup starts with a Cascading Style Sheet entry for link-coloring:

:link {
    /* for unvisited links */
    color: blue;
}
:visited {
    /* for visited links */
    color: purple;
}

(You can find many different variants by searching the web for “history sniffing”; this particular one is taken from an excellent web page by L. David Baron, a long-time Firefox developer.)  So far, so good; problems arise because it’s possible to query the color of any given link, and hence its “visited” status.

Again, there are many different ways to do this.  Some rely on the JavaScript function getComputedStyle(); here’s Baron’s example:

if (getComputedStyle(link, "").color == "rgb(0, 0, 128)") {
        // we know link.href has not been visited
} else {
        // we know link.href has been visited
}
(I’ve omitted the code that sets the variable “link”; again, see his page for the gory details.)  Others use built-in features of CSS.    (See this paper for a technical analysis of different schemes.)  Once a web page has detected that you’ve visited some site of interest, it can request some specific page, not because it wants the content—it might just be a transparent GIF—but as a signaling mechanism.  The net result is a violation of user privacy.

What should you do?  If you’re a web site developer, you certainly should not engage in history-sniffing; apart from being unethical, you might run into legal difficulties.  Indeed, the FTC has just announced a settlement in a history-sniffing case.  Equally important, if you include content from other sites on your pages (and most commercial sites do), make sure they’re not doing anything nasty.

Consumers face a harder problem.  The simplest thing to do is to upgrade to a modern browser; today’s browsers incorporate certain defenses.  Firefox, for example, now has a variety of features to defeat most forms of history-sniffing; Internet Explorer has similar defenses as of IE9.  (The IE9 explanation also has a nice demo page at http://www.debugtheweb.com/test/cssvisited.htm—try it to see if your browser has fixed the problem.)  For example, JavaScript code only sees the “unvisited” colors, and URLs loaded via CSS directives are always loaded, regardless of whether the directives apply to the visited or unvisited variants.  (Installing the latest version of a browser tends to solve many other security problems as well.  Unless you have a really good reason to refrain, you’re probably well-advised to turn on autoupdate.)

You can always clear your browser history before visiting a site you suspect of doing this, though of course that deprives you of history information.  Better yet, switch to “private browsing mode” (sometimes known as “incognito mode”).  These solutions, of course, assume that you know which sites are doing this.  A study published two years ago showed some modest incidence of history-sniffing in the wild, including on one Alexa “Top 100” site.

October 3, 2012

Complexity and Scams

by Steve Bellovin

All of us use gadgets—cars, phones, computers, what have you—that we don’t really understand.  We can use them, and use them very effectively, but only because a great deal of effort has been put into making them seem simple.  That’s all well and good—until suddenly you have to deal with the complexity.  Sometimes, that happens because something has broken and needs to be fixed; other times, though, scammers try to exploit the complexity.  The complaints released today by the FTC illustrate this nicely (the press release is here; it contains links to the actual filings), with lessons for both consumers and software developers.  (It turns out that programmers speak a different language than normal people do—who knew?)

It’s a long story, but I can summarize it easily enough: scammers call people claiming to be from a reputable vendor.  They trick their victims into thinking that their computers are infected, and persuade them to fork over $100 or more.

The scam starts innocently enough: people receive a call telling them that their computer “may” be infected.  (The call itself may be illegal if it’s a “robocall”—do you know about the forthcoming FTC Robocall Summit?  Even if it’s not a robocall, it’s illegal if the recipient is on the Do Not Call list.)  The caller will claim to be from a computer or computer security company. (I received such a call, well before I joined the FTC; that person claimed to be from Microsoft.  Yes, I’m on the Do Not Call list.)  The victim will be talked through some steps designed to “demonstrate” that their computer is infected.  You’re then given the “opportunity” to pay them for fixing it.

Lesson 1: Be extremely skeptical if someone calls you; reputable security companies don’t “cold call” people.  If you have any doubt whatsoever about the legitimacy of the caller, call back using a number you’ve learned independently, perhaps using a phone number from their web site.  (This issue is broader than just this scam.  For example, if a caller claims to be from your credit card company, don’t give out any information; instead call back using the number on the back of your card.  And don’t believe Caller ID; it’s easily spoofed.  There are also lessons here for developers, but I’ll save those for another post.)

 This is the most important lesson to learn: “Don’t call me; I’ll call you.”

It’s worth noting that scammers in this case did in fact use Caller ID spoofing, to make the calls appear to be coming from the U.S. rather than India.  That turns out to be remarkably easy to do.  Here’s the crucial question: when a call starts on one phone company’s network but terminates on another’s, how does the receiving company know the caller’s number?  Answer: the receiving company believes whatever it’s told, whether the information is coming from another phone company or a private branch exchange (PBX).  This worked tolerably well when there were only a few, large telcos; now, though, there are very many—and every Voice over IP (VoIP) gateway to the phone network counts as a telco or PBX.

That trust model no longer works.  There are many more telephone companies than there once were, and there are very many VoIP gateways.  If even one doesn’t check the Caller ID asserted by its customers—and there are valid technical reasons not to, in some situations; consider the case of an employee who wants to make an expensive international call via the company PBX—it’s very easy for a malefactor to claim any number he or she wishes.  (Note that using fake Caller ID “for the purposes of defrauding or otherwise causing harm” is illegal.)  One of the accused firms here claimed to be calling from Quinnipiac University or New York City; another claimed to be from Texas, etc.

In this particular scam, the victim is asked to run a program called the “Event Viewer”.  Most computer systems log various things that take place, including mildly anomalous conditions; Event Viewer is the way to display such logs on Windows.  The information is often quite cryptic, but invaluable to support personnel.  Cryptic?  Yes, cryptic, as you can see below.

[Screenshot: a typical Event Viewer log, full of cryptic codes and identifiers]

The point is that you’re not expected to understand it; it’s information for a technician if you need help.

The consumer is then directed to look for “Warnings”.  That sounds scary, right?  Your computer is warning you about something.  Lesson 2: Programmers use words differently.  On most computer systems, warnings are less serious than errors; you generally don’t need to do anything about a warning.  Contrast “Warning, your disk is 90% full” with “Error: no space left on disk.”  That isn’t normal usage (to the Weather Service, a storm warning is more serious than a storm watch, which is why programmers get confused when they listen to weather reports…), which gave the scammers one more thing to exploit.

Next, of course, it’s time to scare the consumer—“Jesus, did you say warning?”—followed by completely bogus cautions to avoid clicking on the warnings.  (What happens if you do click on one of those messages?  Nothing bad; you just get a new window with more information, and a URL to click on to get even more details.  That’s what the screen shot shows.)

[Screenshot: what happens if you click on a warning or error entry]

It’s also worth realizing that even most of the errors logged are quite irrelevant and harmless.  That isn’t always the case, but more or less any machine will experience many transient or otherwise meaningless failures, perhaps induced by things like momentary connectivity outages.

There’s also a lot of technical doubletalk, presumably intended to impress the victim with the caller’s expertise.  Most of this is pure nonsense, such as (in one call from an FTC investigator) learning that “DNS” is “dynamic network set-up”.  The DNS, of course, is really the “Domain Name System”, the Internet mechanism that translates things like www.ftc.gov into a set of numbers that the underlying hardware really understands.  My favorite was hearing that “the Javascript in your computer has been fully corrupted”.  Javascript is indeed a programming language, but its primary use is creating dynamic web pages.  It’s not normally “on” your computer in any permanent sense; rather, Javascript programs are downloaded to your  web browser when you visit most commercial web sites.  (These programs are run in what is called a “sandbox”, which in theory means that they can’t affect anything on your computer.)

Lesson 3: Just because someone can spout technical terms it doesn’t mean they’re knowledgeable or legitimate.  Of course, asking them to explain what they’re saying doesn’t prove much; they can respond with more glib doubletalk.  A legitimate support tech can probably explain things somewhat more simply; however, while lack of technical details might be a good reason for suspicion, the presence of them says very little.

The victim is then told to download and run a program from the scammer’s web site.  That’s bad, too—you should never run a program from an unknown source—but of course by this time the victim does trust the caller.  And it is really dangerous: once you run someone else’s code, it could be game over for your computer; it’s really, really hard to disinfect a machine thoroughly.  The same applies to credit card numbers: once you give one out, you could be charged far more, and far more often, than the one-time payment you agreed to.

Where does this leave us?  Like most con artists, the callers here are trying to gain your trust before ripping you off.  The best thing is to cut them off at the start.  Remember Lesson 1—“Don’t call me; I’ll call you.” —and use a number that you’ve looked up on your own.