Have you ever found yourself wondering who that person in the image you just saw is, and whether there’s a simple way to find out? Turns out, there is a really simple and elegant solution in the blooming field of AI: Computer Vision!
Computer Vision is the branch of AI that deals with teaching computers to see and perceive the world around us. It allows computers to detect, identify and track the objects they see in images.
This occurred to me a few days back, and so I thought I would write an article on it once I got it running. Now that I have it ready, let’s dive in and see how to implement it in Python!
The idea
The basic idea is to build a simple command line tool that lets us conveniently detect faces in an image and run a search on them. Turns out, we already have an awesome tool to search by images: Google’s reverse image search!

But the problem is, searching for the whole picture won’t be as fruitful, since we want to search for a specific person in the picture. This is where we need a brilliant computer vision library: OpenCV!
OpenCV stands for “Open Source Computer Vision”. As its name suggests, it is an open source library containing a suite of functions for common computer vision tasks.
The library will allow us to automatically detect the faces in an image. We will then crop out the faces from the image (using NumPy), show them in a window and let the user choose one of the faces, by pointing and clicking.
The code
Code seems too much? Jump to the completed project directly: FaceSearch.
OR, continue reading to implement it all by yourself!
Enough of beating around the bush, let’s get down to the actual implementation. We will be doing this one in Python, so make sure you have it installed already. We will need only two additional modules to get going: OpenCV and NumPy. The OpenCV 3 library for Python 3 is somewhat broken: it gives you errors when you try to use some of its functions and needs some building from source to get it working properly. Fortunately for us, there’s already an unofficial pre-compiled version of OpenCV 3 available on PyPI (named opencv-python). Just run the following to get both of these installed.
# Only if you don't have pip for Python 3, skip otherwise
# sudo apt-get install python3-pip
pip3 install numpy opencv-python
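If you want to verify that everything installed correctly, here is a quick, optional sanity check from the Python shell:

import cv2
import numpy as np
print(cv2.__version__)  # Should print something like 3.x.x
print(np.__version__)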
Fire up your favorite text editor or IDE, create a new file and let’s start.
It is dangerous to go alone, take these along with you.
import numpy as np  # Convention
import cv2  # Importing OpenCV

# Needed for file operations and passing command line arguments
import sys
import os

# Needed to upload the image to Google
import requests
import webbrowser
import urllib.request
Yes, OpenCV is imported as cv2 (reason here). So, we have imported all we need. Our script will take the path of the image as its first argument on the command line (read more about sys.argv). Your users (which is you, most of the time) are super lazy and want the luxury of simply dragging and dropping an image from the browser straight onto the terminal (doing so pastes the internet URL of the image onto the terminal). So, we will first check whether what we received is a URL or a local path. We will also check whether the path is valid and raise an error if it isn’t.
try:
    path = sys.argv[1]  # Get whatever the user provides on the terminal
except IndexError:  # If no path is provided
    print("Please input a path.\nUsage: python search.py path/to/file")
    sys.exit()  # Exit

# For internet URLs
if path.startswith('http:') or path.startswith('https:'):
    try:
        req = urllib.request.urlopen(path)  # Fetches the response
        # and returns a file-like object
        arr = np.asarray(bytearray(req.read()), dtype=np.uint8)
        # Images are matrices of unsigned 8-bit integers.
        # Reads the raw bytes from the response and puts them in a NumPy array
        image = cv2.imdecode(arr, -1)
        # 'Decode' the array to work as an image for use in OpenCV
    except Exception:
        print('Couldn\'t load the image from the given URL.')
        sys.exit()  # Exit
else:  # If the path is not a URL
    image = cv2.imread(path)
The urllib method will raise an error when it fails to load the image from the given path. OpenCV, however, simply returns None instead. So, we need to check whether we received None and, if that’s the case, exit.
if image is None:  # Check if the path is valid.
    print("""Image could not be loaded.
1. Make sure you typed in the path to the image correctly.
2. Make sure you have read permissions to the image file.""")
    sys.exit()
Okay, so now we have the image loaded into memory. To detect the faces in our image, we will use the CascadeClassifier from OpenCV, which in turn uses something called ‘Haar cascades’[1] to detect multiple objects of a given class. These cascades are encoded in XML files. We usually need to train Haar cascades ourselves, but for common things like faces, eyes, cats etc., there are pre-trained cascades available over here, on OpenCV’s GitHub repo. We will use the frontal-face ‘alt’ cascade for our purpose, as I found it was giving better predictions for the bounding boxes. Click here to download the cascade and keep it in an easily accessible path. We will need it for the next step.
We will now create a CascadeClassifier object, with the path to the cascade we want to use as the sole argument.
cascade = "./face_alt.xml" # Path to the cascade file we downloaded cascade = cv2.CascadeClassifier(cascade) # Load the CascadeClassifier
The CascadeClassifier object has a method called detectMultiScale that detects all the specified objects in a given image and returns the coordinates of their bounding boxes as a list of tuples in the format (x, y, w, h), where (x, y) are the coordinates of the top-left corner and (w, h) are the width and height of the bounding box respectively.
detected = cascade.detectMultiScale(image) # Detect faces
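If you are curious what detectMultiScale found, here is an optional sanity check (not part of the final script) that draws the returned boxes on a copy of the image using cv2.rectangle:

preview = image.copy()
for x, y, w, h in detected:  # Draw each bounding box in green, 2 px thick
    cv2.rectangle(preview, (x, y), (x + w, y + h), (0, 200, 0), 2)
cv2.imshow('Detections', preview)  # Debug-only preview window
cv2.waitKey(0)
cv2.destroyAllWindows()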
We will first check if any face was detected. If no face was detected, we will exit with an error message.
if len(detected) == 0:  # If no face is detected in the image.
    print("No face detected.")
    sys.exit()
If we indeed detect faces in the image, we will then crop out the faces and put them in a python list for later use.
faces = []
for x, y, w, h in detected:
    faces.append(image[y:y+h, x:x+w, :])  # Crop out individual faces
We will now create a copy of faces, pad the images appropriately (using np.pad) and shape them into squares. This will help us display the detected faces neatly. We will also add a small green quarter-circle at the bottom-left of each face and number each of the faces (just to add a good visual effect). To add a circle, we use cv2.circle, which takes as input the image, the center of the circle, the radius of the circle, the color of the circle (as a tuple of three integers; note that OpenCV orders the channels as blue, green and red, i.e. BGR) and an optional argument which stands for the thickness. Giving a thickness of -1 draws a filled circle.
cv2.putText is used to draw text on the image. This one’s a bit trickier. It takes as input the image, the string to be drawn, the coordinates of the bottom-left point where the text starts, the font (see available fonts here), the font scale, the font color (again as a BGR tuple), the thickness and the ‘line type’ (available line types with their description here).
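To get a feel for these two calls before using them for real, here is a tiny sketch on a hypothetical blank white square (the numbers are arbitrary, chosen to mimic what we do below; it reuses the imports from earlier):

canvas = np.full((128, 128, 3), 255, dtype=np.uint8)  # A white 128x128 square
cv2.circle(canvas, (5, 128), 32, (0, 200, 0), -1)  # Filled green quarter-circle at the bottom-left
cv2.putText(canvas, '0', (0, 128), cv2.FONT_HERSHEY_DUPLEX,
            0.9, (255, 255, 255), thickness=1, lineType=cv2.LINE_AA)

Now, the actual code for our script: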
faces_copy = faces.copy()
a = 128  # To resize all faces to a square of side a. Only for displaying.
for i, face in enumerate(faces_copy):
    faces_copy[i] = cv2.resize(face, (a, a))  # Resize faces
    faces_copy[i] = np.pad(  # Pad the faces with a white border
        faces_copy[i],
        ((2, 2), (2, 2), (0, 0)),
        mode='constant',
        constant_values=((255, 255), (255, 255), (0, 0))
    )
    cv2.circle(  # Draw a quarter-circle at the bottom-left of the image.
        faces_copy[i], (5, a), int(0.25*a), (0, 200, 0), -1
    )
    cv2.putText(  # Type the index of the face over the quarter-circle.
        faces_copy[i], str(i), (0, a), cv2.FONT_HERSHEY_DUPLEX, 0.007*a,
        color=(255, 255, 255), thickness=1, lineType=cv2.LINE_AA
    )
Now that we have all the faces neatly squared and padded, let’s stack them horizontally to display them to the user in a neat way. We will use np.hstack for this purpose; usage is pretty intuitive, as shown below. Next, we will put a bit of text at the top asking the user to click on the face they want to search for. To do this, we need ample space at the top. For the phrase “Click on the face you want to search for” with the current font configuration, it takes around 4a pixels of width (512 pixels, with a = 128 as chosen in an earlier block of code) to display the complete phrase without truncation. So, if the strip is narrower than that, we will add some padding on the sides to reach the desired width. After this, we will add some padding above and write our phrase there. Let’s do this one.
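As a quick illustration of how np.hstack behaves (with hypothetical all-black tiles): the heights must match and the widths simply add up.

tiles = [np.zeros((132, 132, 3), dtype=np.uint8) for _ in range(3)]  # Three padded-face-sized tiles
strip = np.hstack(tiles)  # Stack side by side along the width axis
print(strip.shape)  # (132, 396, 3)

Now the real thing: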
faces_copy = np.hstack(tuple(faces_copy))  # For creating a single strip

if faces_copy.shape[1] < 4 * a:
    pad = 4 * a - faces_copy.shape[1]  # Calculating required padding
    faces_copy = np.pad(
        faces_copy,
        ((0, 0), (pad // 2, pad // 2), (0, 0)),
        mode='constant',
        constant_values=((0, 0), (255, 255), (0, 0))
    )

faces_copy = np.pad(  # Padding above to write some text.
    faces_copy,
    ((a//2, 0), (0, 0), (0, 0)),
    mode='constant',
    constant_values=((255, 255), (0, 0), (0, 0))
)

cv2.putText(  # Writing some text on the top padded portion.
    faces_copy, 'Click on the face you want to search for.',
    (5, a // 4), cv2.FONT_HERSHEY_DUPLEX, 0.7, (0, 200, 0),
    lineType=cv2.LINE_AA
)
We will now create an OpenCV ‘window’, in which we will show the detected faces. The function used is cv2.namedWindow. Pretty intuitive, this one.
cv2.namedWindow('Choose the face')
Now, our faces are ready to be shown to the user in a neat format. The output window will look like this:

Now, we are going to implement the click-handler to let the user simply click on the face to search for (you are lazy, right? :3). There’s a method in OpenCV, cv2.setMouseCallback, that allows us to do this. It takes as input the name of the window to listen for mouse events on, and the function which will handle whatever happens after a mouse event occurs. The handler function needs to take 5 arguments: event, x, y, flags and params. A description[2] of each of them (a minimal standalone sketch follows the list):
- event: The event that took place (left mouse button pressed, left mouse button released, mouse movement, etc). OpenCV sends this to our function.
- x: The x-coordinate of the event.
- y: The y-coordinate of the event.
- flags: Any relevant flags passed by OpenCV.
- params: Any extra parameters supplied by OpenCV.
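Here’s that minimal standalone sketch of such a handler, assuming a hypothetical window named 'demo' and a blank canvas, just to show how the arguments arrive:

import cv2
import numpy as np

def on_mouse(event, x, y, flags, params):
    # Print the coordinates whenever the left mouse button is released
    if event == cv2.EVENT_LBUTTONUP:
        print("Left click released at ({}, {})".format(x, y))

canvas = np.zeros((200, 400, 3), dtype=np.uint8)  # A blank black image
cv2.namedWindow('demo')
cv2.setMouseCallback('demo', on_mouse)
cv2.imshow('demo', canvas)
cv2.waitKey(0)
cv2.destroyAllWindows()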
We only need to care about the first three arguments, ‘event’, ‘x’ and ‘y’; you can safely ignore the rest for now. There are a number of mouse events[3] OpenCV can catch. So, let’s create our very own click-handler function.
def handle_click(event, x, y, flags, params):
    """
    Records clicks on the image and lets the user choose one of the
    detected faces by simply pointing and clicking.
    """
    # Capture when the left click is released
    if event == cv2.EVENT_LBUTTONUP and y > a // 2:  # Ignore clicks on padding
        response = x // (faces_copy.shape[1] // len(faces))
        cv2.destroyAllWindows()
        cv2.imwrite('_search_.png', faces[response])
        try:
            Search()
        except KeyboardInterrupt:
            # Delete the generated image if the user stops the execution.
            print("\nTerminated execution. Cleaning up...")
            os.remove('_search_.png')
            sys.exit()
Let’s break it down into parts. We first check whether the cv2.EVENT_LBUTTONUP event has occurred. As is obvious from its name, this event is fired when the left click of the mouse is released. Note that we do not capture the event cv2.EVENT_LBUTTONDOWN (which corresponds to the left button being pressed). This way, the next step happens only when the left button is released, which also gives the user the freedom to move to a different face before releasing the left click. This is more intuitive. We also check that the user has clicked on the image and not on the top padding by mistake (by checking y > a // 2) and ignore any clicks on the padding.
Once we know the user has indeed clicked on one of the faces, we figure out which one by simply dividing the total width into the same number of parts as there are faces, and then checking which part the x-coordinate we got from the user lies in. We then use cv2.destroyAllWindows, which closes the generated window, select the user-chosen face from our cropped faces, save it to disk and call the Search() function, which we will implement next.
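To make the index arithmetic concrete, here is a worked example with hypothetical numbers (4 faces, each resized to 128 px and padded to 132 px wide):

strip_width = 4 * 132  # 528 px in total; wider than 4*a, so no extra side padding
n_faces = 4
x = 300  # x-coordinate of the user's click
index = x // (strip_width // n_faces)  # 300 // 132 = 2, i.e. the third face

With that out of the way, let’s implement Search().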
We will use requests.post to upload the image to the server and get the fetchUrl from the response. This is the URL with the search results. We will use webbrowser.open to open the URL in a new browser window/tab. Finally, we will print a thank-you message and clean up the generated _search_.png.
def Search():
    """
    Uploads the _search_.png file to Google and searches for it
    using Google Reverse Image Search.
    """
    filePath = '_search_.png'  # Don't change
    searchUrl = 'http://www.google.com/searchbyimage/upload'  # Don't change
    multipart = {
        'encoded_image': (filePath, open(filePath, 'rb')),
        'image_content': ''
    }
    print("Uploading image..")
    response = requests.post(searchUrl, files=multipart, allow_redirects=False)
    fetchUrl = response.headers['Location']
    webbrowser.open(fetchUrl)
    print("Thanks for using this tool! Please report any issues to github."
          "\nhttps://github.com/IAmSuyogJadhav/FaceSearch/issues")
    os.remove('_search_.png')  # Removing the generated file
Phew! We are finally done with the implementation part. Now, we just need to tell OpenCV to track mouse events on our created window and show the image in it. We will use cv2.waitKey(0) to tell OpenCV to keep the window open indefinitely. You are free to put any integer in the brackets: OpenCV will then wait that many milliseconds (note: milliseconds, not seconds) for a key press before moving on. Passing 0 keeps the window open indefinitely.
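For instance, a hypothetical preview that dismisses itself after five seconds would look like this:

cv2.imshow('Preview', image)
cv2.waitKey(5000)  # Wait up to 5000 ms (5 seconds) for a key press
cv2.destroyAllWindows()

For our tool, though, we want the window to stay up until the user clicks: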
cv2.setMouseCallback('Choose the face', handle_click)
cv2.imshow('Choose the face', faces_copy)
cv2.waitKey(0)
That’s all. Congratulations! You just created a nice application all by yourself that lets you automatically search for a person on the internet!
All of the above code has been put up on the blog’s GitHub repo, which also contains the source code for the rest of the articles on the blog. I have created an installation script for this project (it only supports Ubuntu right now) that will let you run it from your terminal. I have put the project up over here. You just need to clone the repo to your PC and run
bash install.sh
to install FaceSearch on your Ubuntu PC.
Example
Let’s test the application on an example image.

On the terminal:
anon@anon-pc:~/FaceSearch$ facesearch example/test.jpg
[ INFO:0] Initialize OpenCL runtime...
Uploading image..
Thanks for using this tool! Please report any issues to github.
https://github.com/IAmSuyogJadhav/FaceSearch/issues
anon@anon-pc:~/FaceSearch$ Created new window in existing browser session.
The output window:

Output in the browser:
That’s it for this article. If you see anything broken or not working as expected, please tell us; we will make sure it gets taken care of! Thanks for tagging along, and do subscribe to our handles on Facebook, Twitter and LinkedIn (links on the sidebar) to get notified about new posts. See you in the next one!
Edit 1:
- There was a minor error in the code for the search function. Fixed now. All thanks to Harshit for this one 🙂
- cv2.CascadeClassifier takes the path to the cascade file as its argument. Defined a new variable with the path to the cascade file to remove this plausible confusion.
Footnotes
- See this paper for more details: Viola and Jones, “Rapid object detection using a boosted cascade of simple features”.
- Borrowed from an awesome tutorial: “Capturing mouse click events with Python and OpenCV” on PyImageSearch.
- As per the complete list of events given on official OpenCV documentation page.
- FaceSearch on GitHub: FaceSearch.