True meaning of precision and recall

To evaluate and compare the performance of classification algorithms, people tend to use precision and recall.


Basically, here are the meanings:

  1. Precision: the bigger the better. It measures how well the algorithm avoids false positives. Out of everything the classifier predicted as positive, precision is the ratio that is actually positive: TP / (TP + FP).
  2. Recall (i.e., sensitivity): how well the algorithm avoids false negatives. Out of everything that is actually positive in the test set, recall is the ratio the classifier found: TP / (TP + FN).

Some references

1. Google Course

2. Deep Learning: A Practitioner's Approach, Josh Patterson & Adam Gibson, O'Reilly, 2017



Another easy-to-grasp explanation is from Apple's documentation:

Precision and recall are actually two metrics, but they are often used together. Precision answers the question: out of the items that the classifier predicted to be true, how many are actually true? Recall answers the question: out of all the items that are true, how many are found to be true by the classifier?
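The two definitions can be sketched in a few lines of plain Python (the labels below are made-up toy data):

```python
# Toy ground-truth labels and classifier predictions (made-up data)
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]

# Count true positives, false positives, and false negatives
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)  # of everything predicted positive, how much is right
recall = tp / (tp + fn)     # of everything actually positive, how much was found

print(precision, recall)  # 0.75 0.75
```

scikit-learn's precision_score and recall_score compute the same numbers for you.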



Install GitLab on Mac using Docker

Following the official instructions from GitLab, pull the Docker image with this command:

docker pull gitlab/gitlab-ce

However, the guide to run that Docker image is for Linux and does not work on Mac. To make it work on Mac, the following command must be used (or modified according to your setup):

docker run --hostname gitlab.example.com --publish 443:443 --publish 80:80 --publish 2200:22 --name gitlab --restart always -v logs:/var/log/gitlab -v data:/var/opt/gitlab gitlab/gitlab-ce:latest

On the first run, access GitLab at http://localhost and create a new password for the root account. The default login is then root/&lt;your new password&gt;.

Good extension for scikit-learn

For me, mlxtend helps to visualize scikit-learn stuff in a nice way, for example, decision regions as in the snippet below (it can also plot confusion matrices):

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import itertools
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from mlxtend.classifier import EnsembleVoteClassifier
from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions

# Initializing Classifiers
clf1 = LogisticRegression(random_state=0)
clf2 = RandomForestClassifier(random_state=0)
clf3 = SVC(random_state=0, probability=True)
eclf = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3],
                              weights=[2, 1, 1], voting='soft')

# Loading some example data
X, y = iris_data()
X = X[:,[0, 2]]

# Plotting Decision Regions

gs = gridspec.GridSpec(2, 2)
fig = plt.figure(figsize=(10, 8))

labels = ['Logistic Regression',
          'Random Forest',
          'RBF kernel SVM',
          'Ensemble']

for clf, lab, grd in zip([clf1, clf2, clf3, eclf],
                         labels,
                         itertools.product([0, 1], repeat=2)):
    clf.fit(X, y)  # fit each classifier before plotting its decision regions
    ax = plt.subplot(gs[grd[0], grd[1]])
    fig = plot_decision_regions(X=X, y=y,
                                clf=clf, legend=2)
    plt.title(lab)

plt.show()

Install Caffe on Mac El Capitan

Notes from my installation

Tip #1: there is no need to run

brew install openblas

because vecLib is already a built-in alternative on Mac.

For the error "cblas.h not found": we have to tell cmake where the vecLib headers are:

cmake -DCMAKE_CXX_FLAGS=-I/Applications/ -DCPU_ONLY=1 ..

For the error "vecLib not found": open CMakeCache.txt and find the place to change the line to this


Update 24/05/2017

If you get this error in Caffe (PyCaffe):

Error: Segmentation fault: 11

check the cmake outputs again and make sure the Python interpreter and the Python dylib come from the same Python distribution (i.e., Homebrew, Anaconda, or Miniconda).
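A quick sanity check for this is a couple of lines of Python: run them with the interpreter you configured Caffe against, and compare both paths with what cmake printed.

```python
import sys
import sysconfig

# The interpreter binary and the directory holding the Python shared library;
# both paths should point into the same distribution
# (e.g. both under Homebrew's prefix, or both under anaconda/miniconda).
print("interpreter:", sys.executable)
print("libdir:     ", sysconfig.get_config_var("LIBDIR"))
```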

Cliqz browser – privacy

I wanted to know what kind of data is sent back to the Cliqz owners when customers use it, so I spent some time playing with it. I used mitmproxy, a free and open-source tool, to monitor all internet activity from Cliqz.

Here are some summaries:

  1. When you start Cliqz, it immediately connects to Cliqz servers to retrieve and send some data.
  2. When you search on Google, the request is sent to Cliqz at the same time.
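The observations above came from grouping the mitmproxy capture by hostname. That kind of summarization can be sketched in plain Python; the URLs below are made up for illustration and are not actual Cliqz endpoints.

```python
from urllib.parse import urlparse

def hosts_contacted(urls):
    """Collect the set of hostnames seen in a capture of request URLs."""
    return {urlparse(u).hostname for u in urls}

# Hypothetical capture from one browsing session (made-up URLs)
captured = [
    "https://search.example-cliqz-backend.com/results?q=test",
    "https://www.google.com/search?q=test",
]

# Which of the contacted hosts belong to Cliqz?
suspicious = {h for h in hosts_contacted(captured) if "cliqz" in h}
print(suspicious)  # {'search.example-cliqz-backend.com'}
```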

Given the data transparency statement from Cliqz, I can confirm that they do what they say. However, it is creepy to know that your online behavior is being watched in real time: whatever you search for is sent back to Cliqz right away!

I did the same test on Firefox but did not see a similar pattern. This does not mean that Firefox is better, only that my method above does not work on Firefox, or at least that they do it in a more implicit way.

So the question comes back to Google Chrome. How is it with Google, the big brother?


During the time I played with Cliqz and mitmproxy, I learnt about SSL pinning: a technique of embedding a certificate inside the application. By doing this, a web browser or application can detect a man-in-the-middle attack (this is how mitmproxy works: it installs its own certificate). This technique is used by Twitter and Google Search, but not by Bing. Bypassing it for research purposes is easy; I do not write the details here because Google has all the answers.