Tuesday, March 28, 2017

Install/Upgrade NVIDIA Driver in Ubuntu for CUDA SDK

Most Linux distributions come with the Nouveau (https://nouveau.freedesktop.org/wiki/) display driver configured. If you need to use the NVIDIA CUDA library in your application, for example OpenCV with GPU support, then you need to install the NVIDIA proprietary driver.

There is an NVIDIA-Linux-x86_64-375.39.run file from NVIDIA, but I have never got the driver updated properly with it. Even when the driver is updated, the GUI never comes up; I guess this has something to do with configuring Xorg for it.

The approach that works consistently on Ubuntu is given below.
On Arch Linux / Manjaro I tried https://wiki.archlinux.org/index.php/bumblebee but could not get the display configured properly.

Step 1 (Ubuntu 16.04)

Go to Additional Drivers and select the "Using NVIDIA driver" option. With this you will also get the nvidia-smi utility; use it to check the driver version.

Tue Mar 28 12:06:06 2017
+------------------------------------------------------+
| NVIDIA-SMI 340.102    Driver Version: 340.102        |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 720M     Off  | 0000:01:00.0     N/A |                  N/A |
| N/A   50C    P0    N/A /  N/A |    570MiB /  2047MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+

Driver version 340 is not suitable for running CUDA applications.

For example, assuming you have installed the NVIDIA CUDA SDK and did not update the driver when prompted, your CUDA programs are going to fail with the error:

h:~/Coding/opencv/build2/bin$ ./opencv_test_cudaarithm
terminate called after throwing an instance of 'cv::Exception'
  what():  /media/alex/LENOVO/Coding/opencv/sources/modules/core/src/cuda_info.cpp:85: error: (-217) CUDA driver version is insufficient for CUDA runtime version in function getDevice

Okay, now we need to update the driver. Please don't use any of the NVIDIA .run files to update the driver, as they usually leave your display in an unconfigured state.

Example: don't run $ sudo ./cuda_8.0.61_375.26_linux.run -driver -silent or NVIDIA-Linux-x86_64-xx.xx.run if you want a desktop or GUI.

Instead, install from the Ubuntu PPA repository:
sudo apt-add-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-375

After this, restart the machine. You may also need to install and run nvidia-modprobe.

Note: If you are running on a server, or if you don't need to configure the X server with the driver, you can use the NVIDIA .run script. This also applies to non-Ubuntu systems.

Step 2 (Download and Install NVIDIA CUDA SDK)

Note: I would suggest using a Docker container instead: https://hub.docker.com/r/alexcpn/nvidia-opencv/tags/ (the Dockerfile is in the readme), or the GitHub repository https://github.com/alexcpn/cuda_opencv.
You can build it for your architecture or drop me a comment.

Download it from https://developer.nvidia.com/cuda-downloads for your machine (for example, the cuda_8.0.61_375.26_linux.run file).

It will ask whether you want to install the updated CUDA driver. Do not do that. If you run the installer from the GUI and choose to install the driver, it will fail because the GUI has to be stopped first. For that, switch to a console (Ctrl + Alt + F1 or F2) and run sudo systemctl stop lightdm (killing the X server alone will just start it up again). After a successful run you will get something like:

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, replacing with the name of this run file:

Good, this is what we need. We don't want to install the driver.

Now let us run nvidia-smi and check whether the driver is updated.

alex@alex-Lenovo-G400s-Touch:~$ nvidia-smi
Tue Mar 28 12:44:30 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 720M     Off  | 0000:01:00.0     N/A |                  N/A |
| N/A   47C    P8    N/A /  N/A |     93MiB /  1985MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+

It is updated, and hopefully you are still in the GUI environment. Don't worry if Additional Drivers in Ubuntu shows that this driver is not in use.
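If you want to script this check (for example on a headless machine), here is a minimal Python sketch. It assumes the nvidia-smi output format shown above and uses the 361.00 minimum from the CUDA 8.0 installer warning; the function names are my own and are not part of any NVIDIA tool.

```python
import re
import subprocess

# Minimum driver for CUDA 8.0, taken from the installer warning ("at least 361.00").
MIN_DRIVER_FOR_CUDA_8 = (361, 0)


def parse_driver_version(smi_text):
    """Return the 'Driver Version:' field of nvidia-smi output as a tuple of ints, or None."""
    match = re.search(r"Driver Version:\s*(\d+(?:\.\d+)*)", smi_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def driver_ok(smi_text, minimum=MIN_DRIVER_FOR_CUDA_8):
    """True if the reported driver is at least `minimum` (tuples compare element-wise)."""
    version = parse_driver_version(smi_text)
    return version is not None and version >= minimum


def run_nvidia_smi():
    """Run nvidia-smi and return its stdout (raises FileNotFoundError if it is not installed)."""
    return subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True).stdout
```

On a machine with the driver installed, driver_ok(run_nvidia_smi()) should be False for the 340.102 output earlier in this post and True for 375.39. The subprocess call needs Python 3.7 or later.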

Assuming you have compiled OpenCV with CUDA, let us run some test programs and see if things are working.


Here is the CMakeCache.txt file, which you can reuse; please change the NVIDIA card architecture from Fermi to match your card. CMakeCache.txt

Sunday, February 05, 2017

Programming with Apache Spark and Cassandra (draft)

Putting together the knowledge gained so far, along with frequent questions that many may ask and that we have asked ourselves.
1.1 What is the need for using Spark?
Spark gives you horizontal scalability in a programmer-friendly way.
1.2 But what about other options?
There are other options as well. I have listed them below; the list describes and highlights Spark's place in the architecture.

Level of granularity of each option:
- Request level (usually HTTP requests): Works well for request-response type client-server protocols, and also in the context of microservices on the application program side. However, it is inadequate for scaling the processing inside the application programs.
- Task level, with task managers (Celery, other MQ-based): Helps to scale processing in the application program and takes care of task handling. However, the onus is on the developer to split the application logic into independent tasks. Usually only the simplest things are really split into tasks. An equally hard problem is combining the outputs.
- Application level / function level, with cluster computing (Apache Spark, Hadoop): Helps to scale processing in the application layer across nodes, and takes care of all the above. The onus is still on the developer to use it properly. However, if the few APIs* map, foreach, reduce and groupBy/partitionBy are used, the program can be written as if it is running on a single node, in a single thread. The system manages shared RAM across multiple nodes, shared cores, task scheduling, multithreading etc. *P.S. Spark has an extensive library for machine learning as well, which could be a gateway for future work.
- Function level: Helps to scale the processing inside a single node, typically with multithreading. This usually has to be done with care to avoid the threading-related problems that many programmers are unaware of.
- Function level / stack level, with green threads: For example greenlets in Python. Good for switching stacks in IO-bound applications such as socket servers. Not really parallel, but the wait time in one stack can be used by other stacks waiting to execute. Rather too specific for general-purpose usage.
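Greenlets come from a third-party package, so as a stand-in, here is a sketch of the same cooperative idea with Python's standard asyncio module (a swapped-in technique, not greenlets themselves). Three simulated IO waits of 0.2 s each overlap on a single thread, so the total is close to 0.2 s rather than 0.6 s; asyncio.sleep stands in for a real socket read, and fake_io_request is a hypothetical name.

```python
import asyncio
import time


async def fake_io_request(name, delay):
    # Stand-in for a socket read: awaiting hands control back to the event loop,
    # so other coroutines can run while this one is "waiting on IO".
    await asyncio.sleep(delay)
    return name


async def main():
    start = time.perf_counter()
    # gather runs the coroutines concurrently and keeps results in argument order.
    results = await asyncio.gather(
        fake_io_request("a", 0.2),
        fake_io_request("b", 0.2),
        fake_io_request("c", 0.2),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results, f"{elapsed:.2f}s")
```

As with greenlets, nothing runs in parallel here: it is one thread interleaving the waits, which is why this helps IO-bound code but not CPU-bound code.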

How stable are Apache Spark and Apache Cassandra?
Speaking from our limited experience running the prototype, all of the Spark and Cassandra JVMs survived 20 days of load runs, network problems and the application exceptions we threw at them, and that too in a low-end cloud lab. They look to be well written.
1.5 What is the most important thing to take care of when using Apache Cassandra?
Data modelling, and connected to it, the primary key and partition key design. It is important to design your primary key and partition key so that writes are distributed and reads are fast. This is explained well by the Cassandra expert here: http://www.planetcassandra.org/blog/the-most-important-thing-to-know-in-cassandra-data-modeling-the-primary-key/
Cassandra uses the hash of the partition key to identify the node on which to store the data. So choosing a partition key that distributes the load equally among nodes prevents write hotspots. An example can be seen on the performance run page.
P.S. There are a few trivial but important things, like writing the commit log and the data (SSTables) to different disk partitions. This link gives basic info about the write path.
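To see why the choice of partition key matters, here is a simplified Python sketch. It assumes placement by hash modulo node count; real Cassandra maps a token (Murmur3 by default) onto a token ring, so this only illustrates the idea. The 4-node cluster and the sensor/day rows are hypothetical. Keying only by day sends every write for that day to one node (a hotspot), while keying by (sensor_id, day) spreads the same writes across nodes.

```python
import hashlib
from collections import Counter

NODES = 4  # hypothetical cluster size


def node_for(partition_key, nodes=NODES):
    """Simplified placement: hash the partition key, take it modulo the node count.

    Real Cassandra places data by token ranges on a ring (Murmur3 by default);
    MD5 is used here only because it is deterministic across Python runs.
    """
    digest = hashlib.md5(repr(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nodes


def distribution(rows, key_fn):
    """Count how many rows land on each node for a given partition-key function."""
    return Counter(node_for(key_fn(row)) for row in rows)


# Hypothetical write batch: 1000 readings from 100 sensors, all on one day.
rows = [{"sensor_id": i % 100, "day": "2017-02-05", "value": i} for i in range(1000)]

by_day = distribution(rows, lambda r: r["day"])  # every row hits one node
by_sensor_day = distribution(rows, lambda r: (r["sensor_id"], r["day"]))  # spread out

if __name__ == "__main__":
    print("partition key = day:", dict(by_day))
    print("partition key = (sensor_id, day):", dict(by_sensor_day))
```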
1.6 What is the most important thing to take care of when using Apache Spark?
I have not come across a single most important thing as such, but here are a couple of pointers:
1.    Avoid doing any major work in the Spark driver; rdd.collect(), or the somewhat better rdd.toLocalIterator(), are not good ideas and don't scale. You soon get OOM errors.
2.    There is no way to share state such as counters between the driver and the workers, though in the code it may seem so. The only way is via accumulators, and the workers cannot read those.
3.    The way you partition the RDD may be important for performance, especially for operations like groupBy; I need to test and understand this better.
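As an illustration of why partitioning matters for groupBy, here is a plain-Python sketch of hash partitioning (no Spark involved; partition_of and the sample pairs are hypothetical). Once every record for a key sits in the same partition, each partition can be grouped on its own without moving data between partitions. As I understand it, this is why pre-partitioning by key (for example with partitionBy) can let a later groupBy on the same key avoid a shuffle.

```python
import zlib
from collections import defaultdict


def partition_of(key, num_partitions):
    # Deterministic stand-in for a hash partitioner: hash(key) mod N.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def hash_partition(pairs, num_partitions):
    """Place each (key, value) pair in the partition chosen by its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_of(key, num_partitions)].append((key, value))
    return partitions


def group_within_partitions(partitions):
    # Every record for a given key lives in exactly one partition, so each
    # partition is grouped independently, with no data exchanged between them.
    grouped = []
    for part in partitions:
        groups = defaultdict(list)
        for key, value in part:
            groups[key].append(value)
        grouped.append(dict(groups))
    return grouped


if __name__ == "__main__":
    pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
    print(group_within_partitions(hash_partition(pairs, 3)))
```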

Saturday, February 04, 2017

Compiling OpenCV with CUDA (GPU) using CMake (GUI)

I have a tendency to choose exactly the wrong thing every time I am given a choice; I have sometimes wondered why. A good thing about doing things almost wrong is that you get to learn about things. I have a feeling that doing things wrong, getting feedback and correcting is somehow fundamental to the way learning happens.
However, I hope this article will save you some time and some hair pulling. I started out with Windows and then Linux, but most things are common.

Before we start, just a very short introduction to the why part; trust me, just the bare essentials.

OpenCV operates on images, which in computers (at least the ones we have now) are stored as pixel matrices. Various algorithms that OpenCV provides, for example for object detection, do a lot of matrix operations. These operations are 'embarrassingly parallel' (data parallel) and could be sped up if executed on the GPU.

Now, NVIDIA GPUs have a parallel programming API called CUDA which can help in speeding up matrix multiplication, and OpenCV has support for it; to use it, however, you need to compile OpenCV with CUDA. CUDA is NVIDIA proprietary and works only with NVIDIA GPUs.

There is an open API, OpenCL, which should work with different types of GPU cards. However, it may not be as well tuned for a particular card. OpenCV has support for OpenCL too; however, for now we will use CUDA.

Finally, one more thing: CUDA uses BLAS libraries, and the CUDA SDK provided by NVIDIA has the cuBLAS libraries for that. Don't ask me why I chose to compile OpenBLAS for it; as I said before, CMake gives a lot of choices, and if you don't know as much as the above, you are sure to do some totally unnecessary but very instructive things.

Okay, now to the how.

On Windows

First check whether your PC or laptop has an NVIDIA card. The easiest way to do this is via the dxdiag Windows utility.

Now see if your card supports CUDA. There is a good utility, TechPowerUp GPU-Z, that will show this information along with other things like GPU load, which will help later to see whether programs are really using the GPU. Or you could check the NVIDIA website for your card and see if it is supported; I guess most cards are. Or you could check the very detailed Wikipedia page, which lists the various generations of the processors: https://en.wikipedia.org/wiki/CUDA

The next step is to download the CUDA SDK from NVIDIA: https://developer.nvidia.com/cuda-downloads. If you have a 64-bit system, download the 64-bit SDK. Choose the defaults and install it.

Then download the OpenCV source code from Git, and download the CMake tool. You also need to download MS Visual Studio Community edition for the C++ compiler.

The main thing for a correct compilation is choosing the right settings in CMake. First, these are the minimum WITH variables that need to be configured:

Miss a few or mess with a few and you will get a lot of errors.
I tried guessing, removed some, and got a lot of errors while running the program. WITH_CUDA is mandatory. If you need to see the video images in a GUI, make sure to select WIN32UI and FFMPEG. I am still not sure whether some of the others are needed or why they are there. Please don't be appalled; this is how I learn: I have no clue initially and I figure it out the hard way. It is something to do with being stupid. The reason I removed the defaults was to cut the compile time from the better half of a day to something more reasonable.

Then I found that the best way to reduce the compile time was to limit the architecture to the one I thought the GPU card supported. In my case, for the GeForce GT 720M card, the CUDA wiki page gave the architecture code name as Fermi and the compute capability as 2.1. That did not work, so I gave 2.0, and I found the compile time decreased considerably.

After that, click Configure and make sure you select the 64-bit Visual Studio compiler. Select 32-bit, or make some other mistake, and you will be led to a lot of configuration errors.

If you select the 64-bit compiler, CMake will automatically select the 64-bit libraries from the CUDA SDK. Otherwise it will try to take the 32-bit libraries and you may get a configuration error about BLAS.

With that you should be able to compile your OpenCV program. Note that when I used the default CUDA_ARCH_BIN setting, which goes all the way from 1 to 5, I got some linker errors:

Severity Code Description Project File Line Suppression State
Error LNK2019 unresolved external symbol __cudaRegisterLinkedBinary_54_tmpxft_000028d8_00000000_15_gpu_mat_compute_37_cpp1_ii_71482d89 referenced in function "void __cdecl __sti____cudaRegisterAll_54_tmpxft_000028d8_00000000_15_gpu_mat_compute_37_cpp1_ii_71482d89(void)" (?__sti____cudaRegisterAll_54_tmpxft_000028d8_00000000_15_gpu_mat_compute_37_cpp1_ii_71482d89@@YAXXZ) opencv_core D:\build\opencv2\modules\core\cuda_compile_generated_gpu_mat.cu.obj 1

For your program using the above-built OpenCV, most of the libraries given below are usually needed. If your build of OpenCV is correct, you should get this many DLLs in the output folder. If some are missing, try building them from Visual Studio.


If you get include errors see the link


Finally, check with GPU-Z whether running the program really uses the GPU.

Note: for building your own OpenCV solutions (1) with these libs, the following settings have to be added to your Visual Studio project:

(1) People detection example - https://gist.github.com/alexcpn/aeb8a4b8304639d8f91cc2fbc0c1c7df

Include Directories

C/C++ --> General --> Additional Include Directories

- D:\opencv\modules\calib3d\include;D:\opencv\modules\videoio\include;D:\opencv\modules\video\include;D:\opencv\modules\imgcodecs\include;D:\opencv\modules\cudaoptflow\include;D:\opencv\modules\cudastereo\include;D:\build\opencv4;d:\opencv\modules\core\include;D:\opencv\modules\cudawarping\include;D:\opencv\include;D:\opencv\modules\cudaobjdetect\include;D:\opencv\modules\cudaimgproc\include;D:\opencv\modules\imgproc\include;D:\opencv\modules\highgui\include;D:\opencv\modules\objdetect\include;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include

Note: opencv2/opencv_modules.hpp comes from the OpenCV build folder d:\build\opencv4\opencv2, not from the OpenCV git source, so d:\build\opencv4\ (the output directory specified in CMake) must be in the include path.

Linker-->Input--> Additional Dependencies --> opencv_calib3d320.lib;opencv_core320.lib;opencv_features2d320.lib;opencv_flann320.lib;opencv_highgui320.lib;opencv_imgcodecs320.lib;opencv_imgproc320.lib;opencv_ml320.lib;opencv_objdetect320.lib;opencv_shape320.lib;opencv_ts320.lib;opencv_video320.lib;opencv_videoio320.lib;opencv_cudaimgproc320.lib;opencv_cudaarithm320.lib;opencv_cudabgsegm320.lib;opencv_cudacodec320.lib;opencv_cudalegacy320.lib;opencv_cudaobjdetect320.lib;opencv_cudawarping320.lib;opencv_cudev320.lib;opencv_cudafilters320.lib;%(AdditionalDependencies)

Lib Directories : - D:\build\opencv4\lib\Release

Note: This is not needed on Linux, as the includes will already be in /usr/include. Here is a sample CMake file for Linux with OpenCV:

cmake_minimum_required(VERSION 2.8)
project( XXX )
find_package( OpenCV REQUIRED )
add_executable(XXX VideoTestHaar.cpp)
add_executable(YYY ColorTracker.cpp)
target_link_libraries( XXX ${OpenCV_LIBS} )
target_link_libraries( YYY  ${OpenCV_LIBS} )

Here is what I did to install the latest OpenCV on an x86 64-bit machine running Ubuntu.

Update: I have updated the Dockerfile and image at https://github.com/alexcpn/cuda_opencv/. Using this image will simplify your tasks.

    # The below should be done at the beginning; I did not do this and got some broken package errors, so I did it afterwards. You learn the hard way :)
    sudo apt-get -y update
    sudo apt-get -y upgrade
    sudo apt-get -y autoremove
    # Video codecs and other libs. Not all of these are listed on the OpenCV site; I got them from another blog.
    # I am not sure what the bare minimum is; it also depends on how you configure CMake for the OpenCV build.
    sudo apt-get install -y cmake pkg-config \
      zlib1g-dev ffmpeg libwebp-dev \
      libtbb2 libtbb-dev libjpeg-dev libpng-dev libtiff-dev libjasper-dev \
      libgtk2.0-dev libavcodec-dev libavformat-dev libswscale-dev
    # These extras do not seem to be needed:
    # qt5-default libtiff5-dev libopenexr-dev libgdal-dev libdc1394-22-dev libeigen3-dev

On Ubuntu (16.04), make sure that you install the NVIDIA driver for the card. Check the latest driver version for your card on the NVIDIA site, then add the relevant repository and install it. Please follow this: http://alexpunnen.blogspot.in/2017/03/installupgrade-nvidi-driver-in-ubuntu.html

sudo apt-add-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-3xx

sudo modprobe nvidia (I also ran this before restarting)

Check via the nvidia-smi command:

alex@alex-Lenovo-G400s-Touch:~$ nvidia-smi
Tue Feb 28 15:10:50 2017    
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GT 720M     Off  | 0000:01:00.0     N/A |                  N/A |
| N/A   51C    P0    N/A /  N/A |    271MiB /  1985MiB |     N/A      Default |
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|    0                  Not Supported                                         |

Install the CUDA samples, build them, and test via deviceQuery: http://xcat-docs.readthedocs.io/en/stable/advanced/gpu/nvidia/verify_cuda_install.html

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 720M"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    2.1
  Total amount of global memory:                 1985 MBytes (2081685504 bytes)
  ( 2) Multiprocessors, ( 48) CUDA Cores/MP:     96 CUDA Cores
  GPU Max Clock rate:                            1550 MHz (1.55 GHz)
  Memory Clock rate:                             900 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 131072 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GT 720M
Result = PASS
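The "CUDA Capability Major/Minor version number" line above is the value the CUDA_ARCH_BIN setting in the Windows section is about. Here is a small Python sketch (my own hypothetical helper, not part of the CUDA samples) that pulls it out of deviceQuery output and formats it as a CMake flag. Treat the result as a starting point only: on this GT 720M, 2.1 did not work for me and 2.0 did.

```python
import re


def compute_capability(device_query_text):
    """Extract 'major.minor' (e.g. '2.1') from deviceQuery output, or None if absent."""
    match = re.search(
        r"CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)",
        device_query_text,
    )
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def cmake_arch_flag(device_query_text):
    """Format the capability as a -DCUDA_ARCH_BIN=... argument for cmake, or None."""
    cc = compute_capability(device_query_text)
    return None if cc is None else f"-DCUDA_ARCH_BIN={cc}"


if __name__ == "__main__":
    sample = "  CUDA Capability Major/Minor version number:    2.1\n"
    print(cmake_arch_flag(sample))
```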

Monday, January 23, 2017

Best practices - Selenium WebDriver / Java

Intermittent failures in Selenium test cases?
After clicking the drill-down, web elements are sometimes not found, causing all summary table test cases to fail.
Common Root Causes
1. Prefer selection By.id or By.className, then By.cssSelector, and only if all else fails use By.xpath
Selection by CSS (By.cssSelector) should be preferred over XPath, as it is more stable and natively supported by the browser. XPath is an abstraction provided by Selenium and is not as performant. If XPath is used, make sure that you have hand-written it, that it is performant, and that it is not so generic that it has to brute-force search through the entire DOM to find your element.
Wherever possible, use By.id, else By.cssSelector, and only if there is no other option use XPath, after proper testing.
You can use the FireFinder plugin for Firefox (add Firebug first) to test your CSS, or your XPath if there is no way to select by CSS.
For example, this XPath for finding the drill-down element
//*[@id='scTableTest_Site-PLMN-PLMNMRBTS-255-sitecreation_netact1']/td/div/img [@src='/SiteCreation-Table-portlet/images/openDrillDown.png']
can be reduced to this more efficient CSS: #scTableTokyo-PLMN-PLMNMRBTS-400-sitecreation_netact1 > td > div > img + img
2. WebElement.click may not click if the element is not visible in the browser viewport
If an element is not visible in the browser viewport, clicking on it should not be possible and the test case should fail. Earlier versions of WebDriver used to do implicit scrolling; however, this is not consistent and is being debated, so it is better not to rely on it. One way to make sure the element is clicked is to use the Selenium feature of directly invoking JavaScript in the browser:
 ((JavascriptExecutor) driver).executeScript("arguments[0].click();", rowElement); 
 return ;

3. Design and model your Selenium Java Code
More often than not, no structure or design is applied to unit test classes. This may be okay for JUnit tests of Java classes, as the design of the Java class is reflected in the test cases. But when we write integration or GUI test cases with JUnit or TestNG using Selenium WebDriver, writing them like this leads to unmaintainable and very brittle code. The test code should be logically structured. This way there is no code duplication or code bloat, which otherwise keeps growing with IDs, CSS paths or XPaths everywhere.
Some good links - PageObject Pattern
4. Don't sleep for long; if you need to, sleep for a short time, wake up, check and retry (Retry pattern)
Understand and use Selenium implicit waits (common to the whole WebDriver instance) or explicit waits.
Example:
 // Waits up to 10 s for the element to become visible; rowCss is a placeholder for your own CSS locator
 WebElement rowElement = new WebDriverWait(driver, 10).until(ExpectedConditions.visibilityOfElementLocated(By.cssSelector(rowCss)));
Note: the wait-retry pattern is very important for stability. As a rule of thumb, all finds should be retried at least three times; this also depends on your test case and modelling context.
Implicit and explicit waits: check these links
In case you have to sleep, create a helper that sleeps for, say, 100 ms, checks, and sleeps again (a while loop with a retry count): 50 * 100 ms = 5 seconds at most, so that if the element appears earlier, time is saved.
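Here is the retry-with-short-sleeps helper described above, sketched in Python rather than Java (retry_until and its parameters are hypothetical names). It calls a check function up to 50 times, 100 ms apart, and returns as soon as the check gives a truthy result, so the worst case is about 5 seconds.

```python
import time


def retry_until(check, attempts=50, interval_s=0.1, retry_on=()):
    """Call check() up to `attempts` times, sleeping `interval_s` between tries.

    Returns the first truthy result, or None if every attempt failed.
    Exceptions listed in `retry_on` count as a failed attempt; others propagate.
    With the defaults the worst case is about 50 * 0.1 s = 5 s.
    """
    for attempt in range(attempts):
        try:
            result = check()
        except retry_on:
            result = None
        if result:
            return result
        if attempt < attempts - 1:  # no pointless sleep after the last try
            time.sleep(interval_s)
    return None
```

With the Python Selenium bindings this could be used as retry_until(lambda: driver.find_elements(By.ID, "some-id")), since find_elements returns an empty (falsy) list when nothing matches; "some-id" is a placeholder. Passing retry_on=(NoSuchElementException,) would let you use find_element instead.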

Python Profiling - Some hints

No time to compose this fully; here are some links which helped me with CPU profiling:

Python Profilers


 python -m cProfile -o profile_out ANR_4G_IRAT.py

 pacman -S kdesdk-kcachegrind


 runsnakerun http://wiki.wxpython.org/How%20to%20install%20wxPython

 pstats: deciphering cProfile output

 yappi https://code.google.com/archive/p/yappi/

 pyprof2calltree -k -i myscript.cprof

 performance tips python
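For a quick start without any external tools, here is a minimal sketch using the standard cProfile and pstats modules; busy_work is a hypothetical function standing in for your own code. It profiles one call and returns the top entries sorted by cumulative time.

```python
import cProfile
import io
import pstats


def busy_work(n):
    # Deliberately slow stand-in so that it shows up in the profile.
    return sum(i * i for i in range(n))


def profile_top(func, *args, limit=5):
    """Profile func(*args); return (result, text of the top entries by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_top(busy_work, 1_000_000)
    print(report)
```

For a whole script, python -m cProfile -o profile_out script.py writes a file that pstats.Stats("profile_out") loads in the same way, and that file is also what pyprof2calltree converts for KCachegrind.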
