cv2.dnn.blobFromImage
This function performs mean subtraction, scaling, and (optionally) channel swapping:
Mean subtraction is used to help combat illumination changes in the input images in our dataset.
Before we even begin training our deep neural network, we first compute the average pixel intensity across all images in the training set for each of the Red, Green, and Blue channels.
This implies that we end up with three variables:
$\mu_R$, $\mu_G$, and $\mu_B$
Typically the resulting values are a 3-tuple consisting of the mean of the Red, Green, and Blue channels, respectively.
When we are ready to pass an image through our network (whether for training or testing), we subtract the mean, $\mu$, from each input channel of the input image:
R = R - $\mu_R$
G = G - $\mu_G$
B = B - $\mu_B$
We may also have a scaling factor, $\sigma$. The value of $\sigma$ may be the standard deviation across the training set, thereby adding normalization:
R = (R - $\mu_R$) / $\sigma$
G = (G - $\mu_G$) / $\sigma$
B = (B - $\mu_B$) / $\sigma$
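The preprocessing above can be sketched directly in NumPy; the tiny random "training set" here is purely illustrative:

```python
import numpy as np

# a stand-in "training set": 8 images of 32x32 pixels, 3 channels (illustrative only)
rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(8, 32, 32, 3)).astype("float32")

# per-channel means and a single standard deviation across the whole training set
mu = train.mean(axis=(0, 1, 2))      # (mu_R, mu_G, mu_B) if channels are in RGB order
sigma = train.std()                  # scaling factor

# preprocess one input image exactly as in the formulas above
image = train[0]
normalized = (image - mu) / sigma    # R = (R - mu_R) / sigma, and likewise for G and B

print(mu.shape)  # (3,)
```

After this transformation each channel of the training set has zero mean, which is exactly what mean subtraction is meant to achieve.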
Function signature:
blob = cv2.dnn.blobFromImage(image, scalefactor, size, mean, swapRB)
Where:
image — the input image we want to preprocess before passing it through the network.
scalefactor — a multiplier applied to the image after mean subtraction, i.e. blob = (image - mean) * scalefactor; defaults to 1.0 (no scaling).
size — the spatial size the image is resized to: the size the network expects as input.
mean — the per-channel mean values to subtract, typically a 3-tuple for the Red, Green, and Blue channels.
swapRB — whether to swap the R and B channels; OpenCV stores images in BGR order, while mean values (and many networks) assume RGB order, so this defaults to False and is set to True when needed.
cv2::dnn::Net class reference: this class allows creating and manipulating comprehensive artificial neural networks.
A neural network is represented as a directed acyclic graph (DAG), where the vertices are Layer instances and the edges specify the relationships between layer inputs and outputs.
Each network layer has a unique integer id and a unique string name inside its network; LayerId can store either a layer name or a layer id.
This class supports reference counting of its instances, i.e., copies point to the same instance.
# import the necessary packages
import numpy as np
import cv2

# load the serialized face detector from disk; these prototxt/caffemodel
# paths are assumed -- substitute the files for your own detector
net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
	"res10_300x300_ssd_iter_140000.caffemodel")
min_confidence = 0.5

# load the input video and construct an input blob for every frame
# by resizing to a fixed 600x400 pixels and then normalizing it
cap = cv2.VideoCapture("SPIDER-MAN FAR FROM HOME - Official Teaser Trailer.mp4")
while True:
	# capture frame-by-frame; stop when the video runs out of frames
	ret, frame = cap.read()
	if not ret:
		break
	# resize the frame to a fixed 600x400 pixels
	frame1 = cv2.resize(frame, (600, 400))
	(h, w) = frame1.shape[:2]
	# the 3-tuple (104.0, 177.0, 123.0) holds the per-channel means to subtract
	blob = cv2.dnn.blobFromImage(cv2.resize(frame1, (300, 300)), 1.0,
		(300, 300), (104.0, 177.0, 123.0))
	net.setInput(blob)
	detections = net.forward()
	# loop over the detections
	for i in range(0, detections.shape[2]):
		# extract the confidence (probability) associated with the prediction
		confidence = detections[0, 0, i, 2]
		# filter out weak detections by ensuring the `confidence` is
		# greater than the minimum confidence
		if confidence > min_confidence:
			# compute the (x, y)-coordinates of the bounding box for the
			# object
			box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
			(startX, startY, endX, endY) = box.astype("int")
			# draw the bounding box of the face along with the associated
			# probability
			text = "{:.2f}%".format(confidence * 100)
			y = startY - 10 if startY - 10 > 10 else startY + 10
			# (0, 0, 255) is red in OpenCV's BGR order
			cv2.rectangle(frame1, (startX, startY), (endX, endY),
				(0, 0, 255), 2)
			cv2.putText(frame1, text, (startX, y),
				cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)
	# show the output frame
	cv2.imshow("Frame", frame1)
	key = cv2.waitKey(1) & 0xFF
	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break
# do a bit of cleanup
cap.release()
cv2.destroyAllWindows()