cv2.dnn.blobFromImage

This function performs:

  • Mean subtraction
  • Scaling
  • Channel swapping (optional)

Mean subtraction helps combat illumination changes across the input images in our dataset.

Before we even begin training our deep neural network, we first compute the average pixel intensity across all images in the training set for each of the Red, Green, and Blue channels.

This gives us three values:

$\mu_R$, $\mu_G$, and $\mu_B$

The resulting values are typically stored as a 3-tuple consisting of the means of the Red, Green, and Blue channels, respectively.

When we are ready to pass an image through our network (whether for training or testing), we subtract the mean $\mu$ from each channel of the input image:

R = R - $\mu_R$

G = G - $\mu_G$

B = B - $\mu_B$

We may also have a scaling factor, $\sigma$. The value of $\sigma$ may be the standard deviation across the training set, which turns the mean subtraction into a full normalization:

R = (R - $\mu_R$) / $\sigma$

G = (G - $\mu_G$) / $\sigma$

B = (B - $\mu_B$) / $\sigma$
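The two steps above can be sketched with NumPy. The toy "training set" below is random data purely for illustration; with real images you would compute $\mu$ and $\sigma$ over your actual training set:

```python
import numpy as np

# Toy "training set": 10 RGB images of size 4x4 (random, purely illustrative).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(10, 4, 4, 3)).astype(np.float64)

# Per-channel means and a single standard deviation across the training set.
mu = images.mean(axis=(0, 1, 2))  # -> array([mu_R, mu_G, mu_B]), shape (3,)
sigma = images.std()

# Normalize one input image exactly as in the formulas above:
# each channel becomes (channel - mu_channel) / sigma.
image = images[0]
normalized = (image - mu) / sigma

print(mu.shape)          # (3,)
print(normalized.shape)  # (4, 4, 3)
```

Applied to the whole training set, this makes each channel zero-mean, which is the point of the exercise.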

Function signature:

blob = cv2.dnn.blobFromImage(image, scalefactor=1.0, size=(0, 0), mean=(0, 0, 0), swapRB=False)

Where:

  • scalefactor - we can optionally scale our images by some factor. This value defaults to 1.0 (no scaling)
  • size - spatial size that the Convolutional Neural Network expects
  • mean - our mean subtraction values
  • swapRB - OpenCV assumes images are in BGR channel order; however, mean values are often supplied in RGB order. To resolve this discrepancy we can swap the R and B channels of the image by setting this value to True. Note that swapRB defaults to False in OpenCV's signature.

cv2::dnn::Net Class Reference

This class allows creating and manipulating comprehensive artificial neural networks.

A neural network is represented as a directed acyclic graph (DAG), where vertices are Layer instances and edges specify the relationships between layer inputs and outputs.

Each network layer has a unique integer id and a unique string name inside its network. A LayerId can store either the layer name or the layer id.

This class supports reference counting of its instances, i.e., copies point to the same instance.

In [5]:
# load the input video and, for every frame, construct an input blob by
# resizing to the 300x300 the model expects and subtracting the channel means

import cv2
import numpy as np

# Caffe-based face detector (the file names below are the standard OpenCV
# face-detector files; substitute your own model files as needed)
net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                               "res10_300x300_ssd_iter_140000.caffemodel")
min_confidence = 0.5  # minimum probability required to keep a detection

cap = cv2.VideoCapture("SPIDER-MAN FAR FROM HOME - Official Teaser Trailer.mp4")

while True:
    # capture frame-by-frame; stop when the video ends
    ret, frame = cap.read()
    if not ret:
        break

    # resize the frame to a fixed 600x400 pixels for display
    frame1 = cv2.resize(frame, (600, 400))

    # construct the input blob and run a forward pass through the network
    blob = cv2.dnn.blobFromImage(cv2.resize(frame1, (300, 300)), 1.0,
                                 (300, 300), (104.0, 177.0, 123.0))
    net.setInput(blob)
    detections = net.forward()

    (h, w) = frame1.shape[:2]
    # loop over the detections
    for i in range(0, detections.shape[2]):
        # extract the confidence (probability) associated with the prediction
        confidence = detections[0, 0, i, 2]

        # filter out weak detections by ensuring the `confidence` is
        # greater than the minimum confidence
        if confidence > min_confidence:
            # compute the (x, y)-coordinates of the bounding box for the
            # object
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")

            # draw the bounding box of the face along with the associated
            # probability; (0, 0, 255) is red in BGR order
            text = "{:.2f}%".format(confidence * 100)
            y = startY - 10 if startY - 10 > 10 else startY + 10
            cv2.rectangle(frame1, (startX, startY), (endX, endY),
                (0, 0, 255), 2)
            cv2.putText(frame1, text, (startX, y),
                cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)

    # show the output frame
    cv2.imshow("Frame", frame1)
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

# do a bit of cleanup
cap.release()
cv2.destroyAllWindows()