What is a perceptual hash function?

When programmers need to create a shorter substitute for a larger file or block of data, they often turn to hash functions. These functions parse a block of data and produce a short number that can act as a substitute or abbreviation for the larger collection of bytes, sometimes in an index and sometimes in a more complicated calculation.

Perceptual hash functions are designed to produce the same result for similar images or sounds. Their goal is to mimic human perception by focusing on the types of characteristics (colors and frequencies) that drive human sight and hearing.

Many popular non-perceptual hash functions are very sensitive to the smallest changes. Even a tiny change, such as reducing the amount of blue in a pixel from 200 to 199 units, could change half the bits in the hash value. Perceptual hash functions are designed to return the same or nearly the same response for images or sounds that a human might feel are similar. That is, small changes in the media do not affect the output.
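
The sensitivity of a non-perceptual hash can be seen in a small experiment. The sketch below uses SHA-256 from Python's standard library and counts how many of the 256 output bits differ when the blue value of a single pixel moves from 200 to 199. The three-byte "pixel" and the helper name differing_bits are illustrative assumptions, not part of any standard.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bits that differ between the SHA-256 hashes of a and b."""
    ha = int.from_bytes(hashlib.sha256(a).digest(), "big")
    hb = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(ha ^ hb).count("1")

# Hypothetical RGB pixel: only the blue channel changes, from 200 to 199.
pixel_a = bytes([120, 80, 200])
pixel_b = bytes([120, 80, 199])

print(differing_bits(pixel_a, pixel_b), "of 256 bits differ")
```

For a well-designed cryptographic hash the count should land near 128, or roughly half the bits, which is exactly the behavior a perceptual hash is built to avoid.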

Hash functions simplify searching and indexing across databases and other data storage. Hash tables, a popular data structure known for its fast response, rely on a good hash function as an index to quickly locate the larger block of data. Facial recognition algorithms, for example, use a perceptual hashing function to organize photos by the people who appear in them. The algorithms use the relative distances between facial features, such as eyes, nose, and mouth, to construct a short vector of numbers that can organize a collection of images.

Some algorithms rely on hash functions to detect changes. These approaches, often called “checksums,” started out as a quick way to check for poorly transmitted data. Both the sender and the receiver can sum all the bytes of the data and then compare the results. If the two sums agree, the algorithm can assume that no errors were made, although this is not guaranteed. If the errors in the transmission occurred in a certain way, for example by adding three to one byte and at the same time subtracting three from a different one, the errors would cancel out and the checksum algorithm would not detect them.
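
The canceling errors described above are easy to reproduce. This is a minimal sketch of a byte-sum checksum, reduced modulo 256 as an assumption to keep the result to one byte; the message contents are made up for the illustration.

```python
def checksum(data: bytes) -> int:
    """Sum all the bytes and keep only the lowest 8 bits."""
    return sum(data) % 256

sent = bytearray(b"perceptual hash")

# One corrupted byte: the checksum changes, so the error is detected.
single_error = bytearray(sent)
single_error[0] += 3

# Two errors that cancel: +3 on one byte, -3 on another.
canceling_errors = bytearray(sent)
canceling_errors[0] += 3
canceling_errors[1] -= 3

print(checksum(sent) == checksum(single_error))      # False: detected
print(checksum(sent) == checksum(canceling_errors))  # True: missed
```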

All hash functions are vulnerable to “collisions” when two different data blocks produce the same hash value. This happens more often with hash functions that produce shorter responses because the number of possible blocks of data is much, much greater than the number of possible responses.
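
A short hash makes the counting argument concrete. The sketch below builds a deliberately tiny 8-bit hash by keeping only the first byte of SHA-256 (the names tiny_hash and first_collision are hypothetical). Since there are only 256 possible outputs, any 257 distinct inputs must contain at least one collision.

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """A deliberately short 8-bit hash: the first byte of SHA-256."""
    return hashlib.sha256(data).digest()[0]

def first_collision(limit: int = 257):
    """Hash the strings "0", "1", ... and return the first colliding pair."""
    seen = {}
    for i in range(limit):
        block = str(i).encode()
        value = tiny_hash(block)
        if value in seen:
            return seen[value], block, value
        seen[value] = block
    return None  # cannot happen once limit exceeds 256 distinct inputs

a, b, value = first_collision()
print(a, b, "both hash to", value)
```

In practice the first collision tends to show up long before the 257th input, which is why short hashes collide so often.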

Some functions, such as the US government’s standard Secure Hash Algorithm (SHA256), are designed to make it virtually impossible for someone to find a collision. They were designed using the same principles as strong encryption routines to resist reverse engineering. Many cryptographic algorithms rely on secure hash functions like SHA256 as a building block, and some refer to them colloquially as the “duct tape” of cryptography.

Perceptual hash functions cannot be that robust. They are designed so that similar data produces a similar hash value, which makes it easier to find a collision. This makes them vulnerable to spoofing and misdirection. Given one file, it is relatively easy to build a second file that looks or sounds quite different but produces the same perceptual hash value.

How do perceptual hash functions work?

Perceptual hash functions remain an active field of research and there are no definitive or even dominant standards. These functions tend to divide an image or sound file into relatively large blocks and then convert similar shapes or sounds to the same value. The approximate pattern and distribution of values in these blocks can be thought of as a very low resolution version of the original and is often the same or very similar for nearby images or sounds.

A basic function for sound, for example, can divide the file into one-second sections and then analyze the presence or absence of frequencies in each section. If there are low-frequency sounds, say between 100 Hz and 300 Hz, the function can assign a 1 to that section. It could also test other useful frequency ranges, such as the common range for the human voice. Some automatic systems for identifying popular music can do a good job with a simple function like this because it will detect the rhythm of the bass and the moments when someone is singing.
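
The basic sound function described above can be sketched in a few lines. Everything specific here is an assumption made for illustration: the 2,000 Hz sample rate (kept low so the pure-Python math stays fast), the 20 Hz spacing between probed frequencies, the 0.1 amplitude threshold, and the helper names. Real systems use far more elaborate features.

```python
import cmath
import math

SAMPLE_RATE = 2000  # samples per second (assumed for this sketch)

def amplitude_at(section, freq_hz, sample_rate=SAMPLE_RATE):
    """Estimate the amplitude of one frequency using a single-bin DFT."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate)
              for i, x in enumerate(section))
    return 2 * abs(acc) / len(section)

def audio_hash(samples, low_hz=100, high_hz=300, step_hz=20,
               threshold=0.1, sample_rate=SAMPLE_RATE):
    """One bit per one-second section: 1 if the low band is present."""
    bits = []
    for start in range(0, len(samples) - sample_rate + 1, sample_rate):
        section = samples[start:start + sample_rate]
        present = any(amplitude_at(section, f, sample_rate) > threshold
                      for f in range(low_hz, high_hz + 1, step_hz))
        bits.append(1 if present else 0)
    return bits

def tone(freq_hz, seconds, sample_rate=SAMPLE_RATE):
    """Synthetic test signal: a pure sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(int(seconds * sample_rate))]

# Alternate a 200 Hz "bass" second with an 800 Hz second.
clip = tone(200, 1) + tone(800, 1) + tone(200, 1) + tone(800, 1)
quieter = [0.8 * x for x in clip]  # same clip at lower volume

print(audio_hash(clip))     # [1, 0, 1, 0]
print(audio_hash(quieter))  # [1, 0, 1, 0]
```

The quieter copy produces the same bits as the original, which is the property that separates this kind of function from a checksum or a cryptographic hash.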

The size of the blocks and the frequencies that are tested can be adjusted for the application. A hash function meant to identify bird songs, for example, might test higher frequencies. Shorter blocks offer more precision, something that may not be desirable if the goal is simply to group similar sounds together.

Image functions use similar techniques with colors and blocks. For this reason, many perceptual functions will often match shapes. A photo of a person standing with their arms at their sides and their legs spread apart can match a photo of the Eiffel Tower because both have roughly the same shape.

Several common options for comparing images are ahash, dhash, and phash. The ahash function divides the image into an 8 × 8 grid of 64 blocks and calculates the average color of each block. The phash function is available as open source.
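
A minimal sketch of the ahash idea follows. It assumes a grayscale image stored as a plain list of rows whose dimensions are multiples of eight, and it follows the common formulation in which each block contributes one bit depending on whether its average is brighter than the average of all blocks. The function names and the synthetic test image are illustrative; library implementations differ in details such as resizing and color handling.

```python
def average_hash(pixels, grid=8):
    """Return a 64-bit integer hash of a grayscale image (list of rows)."""
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // grid, width // grid
    means = []
    for by in range(grid):
        for bx in range(grid):
            block = [pixels[y][x]
                     for y in range(by * block_h, (by + 1) * block_h)
                     for x in range(bx * block_w, (bx + 1) * block_w)]
            means.append(sum(block) / len(block))
    overall = sum(means) / len(means)
    bits = 0
    for mean in means:
        # One bit per block: 1 if the block is brighter than average.
        bits = (bits << 1) | (1 if mean > overall else 0)
    return bits

def hamming_distance(a, b):
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")

# Synthetic 16 x 16 image: dark left half, bright right half.
image = [[40 if x < 8 else 200 for x in range(16)] for _ in range(16)]

# The same image with one pixel changed from 200 to 199.
tweaked = [row[:] for row in image]
tweaked[0][15] = 199

# A very different image: the brightness is inverted.
inverted = [[255 - value for value in row] for row in image]

print(hex(average_hash(image)))                                         # 0xf0f0f0f0f0f0f0f
print(hamming_distance(average_hash(image), average_hash(tweaked)))    # 0
print(hamming_distance(average_hash(image), average_hash(inverted)))   # 64
```

Comparing hashes by Hamming distance rather than strict equality is a common design choice, because it lets an application decide how close is close enough.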

What can they do?

Perceptual hashes can support a diverse collection of applications:

  • Copyright infringement: Similar hashes can detect and match images, sounds, or videos, even if they have been modified by cropping or downscaling.
  • Video tagging: Facial perceptual hashes can help index a video to identify when particular people appear.
  • Spelling mistakes: Text perceptual hash functions can classify words by their sounds, allowing you to detect and correct misspelled words.
  • Security: Perceptual hashes can find and identify people or animals in video or still images and track their movement.
  • Compliance: Some algorithms can detect what people are wearing, useful for construction sites and hospitals. An algorithm can flag people who might not be wearing the personal protective equipment required by law, for example.

How legacy players use them

Some databases, such as MySQL, Oracle, and Microsoft SQL Server, use the Soundex algorithm to allow “fuzzy search” for words that sound the same even though they are spelled differently. The algorithm’s answer is made up of a letter followed by several digits. For example, both “SURE” and “SHORE” produce the same result: “S600”.
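
The “S600” result can be reproduced with a short sketch of the classic American Soundex rules: keep the first letter, map the remaining consonants to digits, skip vowels, collapse repeated codes, and pad or truncate to four characters. This is an illustration only; the SOUNDEX functions built into individual databases may differ in details such as output length.

```python
# Letter groups that share a Soundex digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit

def soundex(word: str) -> str:
    """Return the four-character Soundex code for a word."""
    word = "".join(ch for ch in word.upper() if ch.isalpha())
    if not word:
        return ""
    result = word[0]
    previous = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels have no code
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeat after it counts
    return (result + "000")[:4]

print(soundex("SURE"), soundex("SHORE"))     # S600 S600
print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```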

Some of the cloud companies also offer facial recognition algorithms that can be easily integrated with their databases. Microsoft Azure, for example, offers Face, a tool that will find and group similar faces in a collection of images. The company’s API will find and return the attributes of a face, such as hair color or the presence of facial hair. It will also attempt to estimate the person’s age and basic emotions (anger, contempt, happiness, etc.).

Amazon Rekognition can detect faces in images, as well as other useful attributes, such as text. It works with both still images and videos, which makes it useful for many tasks, such as searching for all the scenes with a particular actor. Rekognition also maintains a database of celebrities and will identify them in images.

Google’s Cloud Vision API detects and categorizes many parts of an image, such as text or landmarks. The tool does not offer direct facial recognition, but the API will find and measure the location of facial features, such as the midpoint between the eyes and the limits of the eyebrows. Celebrity recognition is currently a restricted beta product.

How upstarts are applying them

Apple recently announced that it would use a perceptual hashing function called NeuralHash to search customers’ iPhones for potentially illegal images of child sexual abuse. The results of the perceptual hashing algorithm would be compared with the values of known images found in other investigations. The process would be automatic, but any match could trigger an investigation.

Various companies, such as Clearview.ai or Facebook, are creating databases full of perceptual hashes of scanned images. In general, they do not make these databases available to other developers.

The subject is an area of active exploration. Some open source versions include pHash, Blockhash, and OpenCV.

Is there anything that perceptual hash functions can’t do?

While perceptual hash functions are usually quite accurate, they can still produce false matches. Apple’s facial recognition software used to unlock an iPhone can sometimes confuse parents with children, allowing children to unlock their parents’ phones.

In general, the ability of a hash function to reduce an often large or complex data set to a small number is also the source of this weakness. Collisions are impossible to prevent because there are dramatically fewer potential responses than possible inputs. While some cryptographically secure hash functions can make it difficult to find these collisions, they still exist.

In the same way, the strength of perceptual hash functions is also a major weakness. If the function does a good job of approximating human perception, it will also be easier for humans to find and even create collisions. There are several attacks that can take advantage of this aspect. Several early experimental projects (here and here), for example, offer software to help find and even create collisions.

