Real-Time Linux for TrackCam

Revision as of 22:41, 20 June 2010 by JamesYeung (Talk | contribs)

Overview

Project by: James Yeung, Master in Electrical and Computer Engineering, 2010.
Last updated: June 11, 2010

The goal of this project was to install and set up an operating system with real-time capabilities to work with Photonfocus' TrackCam and SiliconSoftware's MicroEnable Frame Grabber. This is the continuation of earlier work done with the same hardware but on Windows instead.

So what is a real-time operating system? Its key characteristic is consistency: how little the time it takes to accept and complete an application's task varies from one run to the next. That variability is called jitter. A hard real-time operating system has less jitter than a soft real-time operating system. The chief design goal is not high throughput but a guarantee of either soft or hard real-time performance. An OS that usually meets a deadline is a soft real-time OS; one that meets its deadlines deterministically is a hard real-time OS.

Implementations of Real-Time Operating Systems

There are currently two main methods of implementing a real-time operating system: a micro-kernel approach and a scheduling approach. We will be using an operating system that implements the scheduling approach because it is the one that the makers of TrackCam support.

Micro-Kernel

In the micro-kernel approach, there is simply a very small, simple, real-time operating system underneath the main operating system. The main operating system becomes a task that runs only when there is no real-time task to run, and the micro-kernel pre-empts the main operating system whenever a real-time task needs the processor. RTAI and RTLinux (not to be confused with the linux-rt patch) are examples of such implementations.

Scheduling

In the scheduling approach, the operating system itself provides real-time scheduling policies. Under the first-in-first-out policy, once a task starts running it continues to run until it voluntarily yields the processor, blocks, or is preempted by a higher-priority real-time task. Another common policy uses a timeslice model, where tasks are allotted timeslices based on their priority and run until they exhaust their timeslice. The -rt Linux patch is an example of the scheduling implementation.

The scheduling in linux-rt has a total of 139 levels. The lower the level, the higher the priority. Levels 100 to 139 map to the -20 to 19 niceness levels. Nice is a program that lets you manually set the priority of a particular process, but it only gives you access to these 40 highest-numbered (lowest-priority) levels. To reach the lower levels (higher priorities), you need to call the function "sched_setscheduler", declared in the "sched.h" header (see example code below). Note that the sched_priority value passed to sched_setscheduler runs in the opposite direction: it ranges from 1 (lowest real-time priority) to 99 (highest).

Our Setup

Since SiliconSoftware provides drivers and support for the -rt Linux operating system, we will be using it for this project. Below is a list of hardware that we will be using.

Computer

  • CPU - Intel Core 2 Quad Q8400 (2.66 GHz)
  • RAM - Crucial 4 GB DDR2 800 (PC2 6400)
  • Motherboard - Supermicro MBD-C2SBE-O
  • Hard Drive - 500 GB Seagate Barracuda 7200.12, 7200 rpm
  • Video Card - EVGA 256-P2-N768-Fr GeForce 8600 GTS

Camera

  • PhotonFocus MV-D1024-TrackCam
  • SiliconSoftware MicroEnable III Frame Grabber

How To Set Up Linux-rt On openSUSE

Overview

For those who are not familiar with Linux, or with how an operating system works, here is a quick rundown of the basics. Linux is really just a kernel. A kernel is a piece of software that handles the interaction between hardware and applications. The different kinds of "Linux" out there, like Ubuntu, Fedora, openSUSE and Debian, all have basically the same kernel, with a few tweaks here and there. The main difference between them is the software and GUI on top of the kernel, which give each its distinct look and feel. We will be using openSUSE because it has been tested by SiliconSoftware with their MicroEnable Frame Grabber.

Instructions

  1. Use an openSUSE Live CD to install a fresh copy of openSUSE.
    • In this guide, we will be using a 64-bit version.
    • Follow the on-screen instructions and note the root password that you set.
    • When prompted about partition setup, be sure to use the “ext3” file system format for the root and home partitions. The version of the kernel that we will be building does not properly support “ext4” (or higher).
  2. Check which is the newest kernel version supported by linux-rt and by the other applications/drivers that you will be using.
  3. Download the vanilla kernel and the linux-rt patch.
  4. Unpack the packages.
    >> tar -xvjf linux-2.6.24.7.tar.bz2
    >> bunzip2 patch-2.6.24.7-rt17.bz2
  5. Make a symbolic link to the new directory.
    >> rm -f linux
    >> ln -fs linux-2.6.24.7 linux
  6. Copy the config file provided with the menable driver.
    >> cd linux
    >> cp /home/lims/Download/menable/menable_linuxdrv_3.9.10/2.6.24.7-rt17/CORE2_x86_64/.config .config
  7. Install the needed packages.
    >> zypper in make patch gcc gtk2 gtk2-devel libglade2 libglade2-devel glib2 glib2-devel mkinitrd
  8. Apply the linux-rt patch. (Note that the option -p1 ends in the digit one, not the letter l.)
    >> cat ../patch-2.6.24.7-rt17 | patch -p1
  9. Run make oldconfig.
    >> make oldconfig
    • When prompted, accept the default by pressing Enter.
  10. Edit the configuration through gconfig.
    >> make gconfig
    • Make sure the following sections are untouched:
      1. Processor type and features
        • Except that you need to set "High Resolution Timer Support" to "Yes".
      2. Bus options
      3. Kernel hacking
    • If you know which modules are needed for your hardware, enable them.
    • If you don't know which modules are needed for your hardware, compile the kernel anyway; the error messages from your install attempt will give you a better idea of which modules need to be enabled.
  11. Save and close gconfig.
  12. Change “getline” to “parseline” in scripts/unifdef.c (there are 3 occurrences of getline).
    • Type “vi scripts/unifdef.c” to view and edit the file.
    • Type “/getline” to search for “getline”.
    • Hit “i” to get into insert mode.
    • Change “getline” to “parseline”.
    • Hit the “Escape” key to get out of insert mode.
    • Search again until all occurrences of getline are changed.
    • Type “:wq” to save and quit.
  13. Change “=r” to “=q” in arch/x86/boot/boot.h.
    • Type “vi arch/x86/boot/boot.h”.
    • Type “:112” and press Enter to get to line 112.
    • Hit “i” to get into insert mode.
    • Change “=r” to “=q”.
    • Hit the “Escape” key to get out of insert mode.
    • Type “:wq” to save and quit.
  14. Compile and install the kernel. (-j 4 uses all 4 CPU cores; the build should take about 15 minutes.)
    >> make -j 4
    >> make -j 4 modules_install
    >> make -j 4 install
    • If you didn't enable all the required modules, the error messages here will hint at which ones you need.
  15. Make sure “fstab” has the correct paths.
    >> cd /etc
    >> vi fstab
    • The first couple of lines should look something like this:
      /dev/sda5 swap       swap     defaults            0 0
      /dev/sda6 /          ext3     acl,user_xattr      1 1
      /dev/sda7 /home      ext3     acl,user_xattr      1 2
    • If not, change the part after “/dev/” to “sda#”, where # is the number corresponding to “-part#”.
  16. Change the grub config file to match the changes in fstab.
    • Grub is the software that controls which OS to boot into during boot up.
    • The files that configure Grub are located in /boot/grub/.
    • You may also want to edit the menu.lst file to suit your needs.

How To Install MicroEnable Frame Grabber Drivers/Software

  • This guide assumes that you have downloaded and untarred the following files under /home/lims/Download/menable/
    • menable_linuxdrv_3.9.10.tar.bz2
    • siso-rt3-meIII-3.2.1-2.i586.rpm
    • siso-rt-basesystem-1.0.0-1.i586.rpm
  • We will be using menable_linuxdrv_3.9.10 with the 2.6.24.7-rt17 kernel.

Drivers

  1. Go into the root of the driver folder.
    >> cd /home/lims/Download/menable/menable_linuxdrv_3.9.10
  2. We need to copy the binary objects from the subdirectory matching our kernel and architecture.
    >> cp 2.6.24.7-rt17/CORE2_x86_64/* .
  3. Compile
    >> ./compile.sh
  4. Install the compiled driver
    >> insmod menable.ko
  5. Confirm the install
    >> dmesg | tail
    • The output should look like this:
      0000:00:19.0: eth0: Link is Up 100 Mbps Full Duplex, Flow Control: RX
      0000:00:19.0: eth0: 10/100 speed: disabling TSO
      ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
      martian source 255.255.255.255 from 129.105.69.13, on dev eth0
      ll header: ff:ff:ff:ff:ff:ff:d4:9a:20:d0:13:88:08:00
      eth0: no IPv6 routers present
      devkit-disks-da[3482]: segfault at 18 rip 41cb5e rsp 7fff78425850 error 4
      ACPI: PCI Interrupt 0000:11:00.0[A] -> GSI 20 (level, low) -> IRQ 20
      menable 0000:11:00.0: microEnable III card will be called menable0
      menable menable0: allocated dummy DMA area of 128 kiB
      

Software

  1. Go into the directory with the RPM packages.
    >> cd /home/lims/Download/menable
  2. Install the two rpm packages.
    >> rpm -i siso-rt-basesystem-1.0.0-1.i586.rpm
    >> rpm -i siso-rt3-meIII-3.2.1-2.i586.rpm
  3. Confirm the install
    • Use the following commands to see where the files should have been installed.
      >> rpm -qlp siso-rt-basesystem-1.0.0-1.i586.rpm
      >> rpm -qlp siso-rt3-meIII-3.2.1-2.i586.rpm
    • Go into the directories and see if they are there.

How To Install OpenCV on Linux

  1. Make sure the following packages are installed.
    • gcc (version 4.x)
    • cmake (version 2.6 or higher)
    • pkg-config
  2. Download module “FindOpenCV.cmake” and add it to cmake.
  3. Download OpenCV source code, configure, compile and install.
  4. (Optional) Go through the “Hello World” tutorial to make sure it works.

Benchmarking

This section was an attempt to see how "real-time" the operating system is.

Cyclictest

Cyclictest is a program available on the rt-wiki for testing the latency of commands on your operating system. All the code and instructions are available on their website.

http://rt.wiki.kernel.org/index.php/Cyclictest

Homemade Test

I found a "Hello World" example for real-time applications on the -rt Linux patch online. I then modified the code so that it runs under SCHED_FIFO at priority 1 and takes a snapshot of the timer every 10ms. (Note that for sched_setscheduler, 1 is the lowest real-time priority and 99 is the highest.)

To compile: g++ main.cpp -o main -lrt

#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <sys/time.h>
#include <sched.h>
#include <sys/mman.h>
#include <string.h>

#define MY_PRIORITY (1) /* we use 1 here; PREEMPT_RT uses 50
                            as the priority of kernel tasklets
                            and interrupt handlers by default */

#define MAX_SAFE_STACK (8*1024) /* The maximum stack size which is
                                   guaranteed safe to access without
                                   faulting */

#define NSEC_PER_SEC (1000000000) /* The number of nsecs per sec. */

void stack_prefault(void) {
    unsigned char dummy[MAX_SAFE_STACK];
    memset(&dummy, 0, MAX_SAFE_STACK);
    return;
}

int main(int argc, char* argv[]) {

    //int time[1002];
    //int rc[1002];

    struct timespec t;
    struct timespec ti[1002];
    struct timespec tt[1002];
    struct timespec tf[1002];
    struct sched_param param;
    int interval = 10000000; /* 10ms*/

    /* Declare ourself as a real time task */

    param.sched_priority = MY_PRIORITY;
    if(sched_setscheduler(0, SCHED_FIFO, &param) == -1) {
        perror("sched_setscheduler failed");
        exit(-1);
    }

    /* Lock memory */
 
    if(mlockall(MCL_CURRENT|MCL_FUTURE) == -1) {
        perror("mlockall failed");
        exit(-2);
    }

    /* Pre-fault our stack */
 
    stack_prefault();

    clock_gettime(CLOCK_MONOTONIC ,&t);
    /* start after one second */
    t.tv_sec++;
 
    int ii = 0;
    while(ii < 1002) {
    
        tt[ii].tv_nsec = t.tv_nsec; //target time stamp
        clock_gettime(CLOCK_MONOTONIC ,&ti[ii]); //time before releasing CPU

        /* release CPU until target time stamp is reached */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &t, NULL);
        
        /* do the stuff */
        clock_gettime(CLOCK_MONOTONIC ,&tf[ii]); //time right after waking up
        ii++;

        /* calculate next target */
        t.tv_nsec += interval;
        while (t.tv_nsec >= NSEC_PER_SEC) {
            t.tv_nsec -= NSEC_PER_SEC;
            t.tv_sec++;
        }
    }
    for(int ii = 1; ii < 1002; ii++){
        /* tv_nsec is a long, so print it with %ld */
        printf("%ld %ld %ld %ld\t%ld\n", ti[ii].tv_nsec, tt[ii].tv_nsec, tf[ii].tv_nsec, tf[ii].tv_nsec - tt[ii].tv_nsec, tf[ii].tv_nsec - tf[ii-1].tv_nsec);
    }
}

PThread Test

This is a test that uses multiple threads, where only one is set to a high priority and the rest run at normal priority. The high-priority thread also reads one analog signal from a data acquisition card and writes one analog signal to an output card. In this particular test I had four normal-priority threads, each continuously performing one of the basic math operations (+, -, *, /). This was then tested on a system with a quad-core CPU. The four normal-priority threads are there to ensure that there are more than four active tasks for the CPU to handle. In a true setup, a total of two threads may be enough for your needs. More details about the cards used in this test can be found here.

The output of the results will be in the form:
interval = nanoseconds per interrupt
duration = number of times to trigger
starttime = system time of first trigger
maxDelay = maximum delay in microseconds experienced in this setup (>99 shows up as 99)
bins[x] = number of occurrences with x microseconds of delay in this setup
earlytime[for the xth trigger] = actual wakeup time (t = target wakeup time) value = microseconds of delay
badtime[xth >30us delay] = actual wakeup time (t = target wakeup time) value = microseconds of delay in this setup

You can download a sample of results here.

To compile: gcc main.c -o main -lrt -lnidaqmx

#define _REENTRANT
#define NSEC_PER_SEC (1000000000) /* The number of nsecs per sec. */
#define MY_PRIORITY (2) /* we use 2 here; PREEMPT_RT uses 50
			as the priority of kernel tasklets
			and interrupt handlers by default */
#define MAX_SAFE_STACK (8*1024) /* The maximum stack size which is
				guaranteed safe to access without
				faulting */
#define DAQmxErrChk(functionCall) if(DAQmxFailed(error=functionCall)) goto Error; else
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <sched.h>
#include <sys/mman.h>
#include <string.h>
#include <NIDAQmx.h>

/* function prototypes */
void* rtTask1( );
void* Task2( );
void* Task3( );
void* Task4( );
void* Task5( );

void stack_prefault(void) {
	unsigned char dummy[MAX_SAFE_STACK];
	memset(&dummy, 0, MAX_SAFE_STACK);
	return;
}

/* mutex variables */
pthread_mutex_t printfLock = PTHREAD_MUTEX_INITIALIZER;

int end2 = 0;
int end3 = 0;
int end4 = 0;
int end5 = 0;
double temp = 0;

int interval = 100000;		//100us "interrupt" interval
int duration = 10000*60*60*1;	//10khz for 1 hour

/* global variables for input/output */
TaskHandle inputTaskHandle=0;
TaskHandle outputTaskHandle=0;
int32 error=0;
int32 read;
float64 data;
char errBuff[2048]={'\0'};

int main( void ){
	pthread_t thr1, thr2, thr3, thr4, thr5;
	struct timespec t[14];

	/*********************************************/
	// DAQmx Configure Code
	/*********************************************/
	DAQmxErrChk(DAQmxCreateTask("AnalogIn",&inputTaskHandle));
	DAQmxErrChk(DAQmxCreateTask("AnalogOut",&outputTaskHandle));
	DAQmxErrChk(DAQmxCreateAIVoltageChan(inputTaskHandle,"AnalogIn/ai0","TestChannel",DAQmx_Val_RSE,-10.0,10.0,DAQmx_Val_Volts,NULL));
	DAQmxErrChk(DAQmxCreateAOVoltageChan(outputTaskHandle,"AnalogOut/ao6","",-10.0,10.0,DAQmx_Val_Volts,""));

	/*********************************************/
	// DAQmx Start Code
	/*********************************************/
	DAQmxErrChk(DAQmxStartTask(inputTaskHandle));
	DAQmxErrChk(DAQmxStartTask(outputTaskHandle));

	clock_gettime(CLOCK_MONOTONIC ,&t[0]);
	pthread_create( &thr1, NULL, rtTask1, NULL );
	clock_gettime(CLOCK_MONOTONIC ,&t[1]);
	pthread_create( &thr2, NULL, Task2, NULL );
	clock_gettime(CLOCK_MONOTONIC ,&t[2]);
	pthread_create( &thr3, NULL, Task3, NULL );
	clock_gettime(CLOCK_MONOTONIC ,&t[3]);
	pthread_create( &thr4, NULL, Task4, NULL );
	clock_gettime(CLOCK_MONOTONIC ,&t[4]);
	pthread_create( &thr5, NULL, Task5, NULL );
	clock_gettime(CLOCK_MONOTONIC ,&t[5]);

	pthread_join( thr1, NULL );
	interval = 100000;		//100us "interrupt" interval
	duration = 10000*60*60*5;	//10khz for 5 hours
	clock_gettime(CLOCK_MONOTONIC ,&t[6]);
	pthread_create( &thr1, NULL, rtTask1, NULL );
	clock_gettime(CLOCK_MONOTONIC ,&t[7]);

	pthread_join( thr1, NULL );
	interval = 200000;		//200us "interrupt" interval
	duration = 5000*60*60*5;	//5khz for 5 hours
	clock_gettime(CLOCK_MONOTONIC ,&t[8]);
	pthread_create( &thr1, NULL, rtTask1, NULL );
	clock_gettime(CLOCK_MONOTONIC ,&t[9]);

	pthread_join( thr1, NULL );
	interval = 500000;		//500us "interrupt" interval
	duration = 2000*60*60*5;	//2khz for 5 hours
	clock_gettime(CLOCK_MONOTONIC ,&t[10]);
	pthread_create( &thr1, NULL, rtTask1, NULL );
	clock_gettime(CLOCK_MONOTONIC ,&t[11]);

	pthread_join( thr1, NULL );
	interval = 1000000;		//1ms "interrupt" interval
	duration = 1000*60*60*5;	//1khz for 5 hours
	clock_gettime(CLOCK_MONOTONIC ,&t[12]);
	pthread_create( &thr1, NULL, rtTask1, NULL );
	clock_gettime(CLOCK_MONOTONIC ,&t[13]);

	pthread_join( thr1, NULL );

	end2 = 1;
	end3 = 1;
	end4 = 1;
	end5 = 1;

	pthread_join( thr2, NULL );
	pthread_join( thr3, NULL );
	pthread_join( thr4, NULL );
	pthread_join( thr5, NULL );

	FILE *file;
	file = fopen("results","a+");

	int ii;
	for(ii = 1; ii < 14; ii++){
		/* tv_nsec is a long, so print it with %ld */
		fprintf(file, "Task%d %9ld %9ld\n", ii, t[ii].tv_nsec, t[ii].tv_nsec - t[ii-1].tv_nsec);
	}
	fprintf(file, "\n========================================\n");
	fclose(file);

	return 0;

Error:
	if( DAQmxFailed(error) ){
		DAQmxGetExtendedErrorInfo(errBuff,2048);
	}
	if( inputTaskHandle!=0 ){
		DAQmxStopTask(inputTaskHandle);
		DAQmxClearTask(inputTaskHandle);
	}
	if( outputTaskHandle!=0){
		DAQmxStopTask(outputTaskHandle);
		DAQmxClearTask(outputTaskHandle);
	}
	if( DAQmxFailed(error) ){
		printf("DAQmx Error: %s\n",errBuff);
	}
	return 0;
}

void* rtTask1(){

	int bins[100];
	int i, ii, jj;
	for( i = 0; i < 100; i++){
		bins[i] = 0;
	}
	struct timespec t;
	struct timespec timestamp;
	struct timespec starttime;
	struct timespec earlytime[100];
	struct timespec earlytime2[100];
	int earlyvalue[100];
	struct timespec badtime[10000];
	struct timespec badtime2[10000];
	int badvalue[10000];
	struct sched_param param;

	FILE *file;
	file = fopen("results","a+");

	/* Declare ourself as a real time task */
	param.sched_priority = MY_PRIORITY;
	if(sched_setscheduler(0, SCHED_FIFO, &param) == -1) {
		perror("sched_setscheduler failed");
		exit(-1);
	}

	/* Lock memory */
	if(mlockall(MCL_CURRENT|MCL_FUTURE) == -1) {
		perror("mlockall failed");
		exit(-2);
	}

	/* Pre-fault our stack */
	stack_prefault();

	clock_gettime(CLOCK_MONOTONIC,&t);
	/* start after one second */
	t.tv_sec++;
	starttime = t;
	jj = 0;

	for( i = 0; i < duration; ++i ) {
		/* wait until next shot */
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &t, NULL);
		clock_gettime(CLOCK_MONOTONIC ,&timestamp);
		if(timestamp.tv_sec == t.tv_sec){
			ii = (int)(((timestamp.tv_nsec - t.tv_nsec)/1000) - 1);
		}else{
			ii = (int)((((timestamp.tv_sec - t.tv_sec) * NSEC_PER_SEC - t.tv_nsec + timestamp.tv_nsec)/1000) - 1);
		}
		// store timestamps if latency > 30
		if(ii > 30){
			if(jj < 10000){
				badtime[jj] = timestamp;
				badtime2[jj] = t;
				badvalue[jj] = ii;
				jj++;
			}
		}
		// store timestamps for first 100 wakeups
		if(i < 100){
			earlytime[i] = timestamp;
			earlytime2[i] = t;
			earlyvalue[i] = ii;
		}
		// cap delay for bins[ii]
		if(ii > 99){
			ii = 99;
		}else if(ii < 0){
			ii = 0;
		}
		bins[ii]++;

		DAQmxErrChk(DAQmxReadAnalogScalarF64(inputTaskHandle,10.0,&data,NULL));
		DAQmxErrChk(DAQmxWriteAnalogScalarF64(outputTaskHandle,1,10.0,data,NULL));

		t.tv_nsec += interval;
		while (t.tv_nsec >= NSEC_PER_SEC) {
			t.tv_nsec -= NSEC_PER_SEC;
			t.tv_sec++;
		}
	}

	fprintf(file, "interval = %d\n", interval);
	fprintf(file, "duration = %d\n", duration);
	fprintf(file, "starttime = %lds\t%ldns\n", (long)starttime.tv_sec, starttime.tv_nsec);
	int maxDelay = 0;
	for( i = 0; i < 100; i++){
		if( bins[i] > 0 ){
			maxDelay = i;
		}
	}
	fprintf(file, "maxDelay = %d\n", maxDelay);
	for( i = 0; i < 100; i++){
		fprintf(file, "bins[%d] = %d\n", i, bins[i]);
	}
	for( i = 0; i < 100; i++){
		fprintf(file, "earlytime[%d] = %lds\t%ldns (t = %lds\t%ldns)\tvalue = %d\n", i, (long)earlytime[i].tv_sec, earlytime[i].tv_nsec, (long)earlytime2[i].tv_sec, earlytime2[i].tv_nsec, earlyvalue[i]);
	}
	for( i = 0; i < jj; i++){
		fprintf(file, "badtime[%d] = %lds\t%ldns (t = %lds\t%ldns)\tvalue = %d\n", i, (long)badtime[i].tv_sec, badtime[i].tv_nsec, (long)badtime2[i].tv_sec, badtime2[i].tv_nsec, badvalue[i]);
	}
	fprintf(file, "\n");
	fclose(file);
	return NULL;

Error:
	if( DAQmxFailed(error) ){
		DAQmxGetExtendedErrorInfo(errBuff,2048);
	}
	if( inputTaskHandle!=0 ){
		DAQmxStopTask(inputTaskHandle);
		DAQmxClearTask(inputTaskHandle);
	}
	if( outputTaskHandle!=0){
		DAQmxStopTask(outputTaskHandle);
		DAQmxClearTask(outputTaskHandle);
	}
	if( DAQmxFailed(error) ){
		printf("DAQmx Error: %s\n",errBuff);
	}
	return NULL;
}

void* Task2(){
	while( end2 == 0 ) {
		temp = temp + rand();
	}
	return NULL;
}

void* Task3(){
	while( end3 == 0 ) {
		temp = temp - rand();
	}
	return NULL;
}

void* Task4(){
	while( end4 == 0 ) {
		temp = temp * rand();
	}
	return NULL;
}

void* Task5(){
	while( end5 == 0 ) {	/* Task5 has its own stop flag */
		temp = temp / rand();
	}
	return NULL;
}

Results

To ensure that the operating system is truly "real-time", I ran the two tests (Cyclictest and Homemade Test) under two conditions: load and no load. In the no-load condition, only the test programs are running apart from system processes. In the load condition, I compile the source code of OpenCV and the Linux kernel at a niceness of -20 while the tests run.

As for the PThread Test, it was run under the no load condition.

Cyclictest

Using the Cyclictest program, I was able to get some very nice results.

Arguments:
t = number of threads to run
p = priority level to set
n = use clock_nanosleep
i = interval in microseconds
l = number of times to loop

Outputs:
T = thread index (thread ID in parentheses)
P = priority level
I = interval in microseconds
C = count (number of cycles performed; this number counts up during the test)
Min = minimum latency experienced, in microseconds
Act = current latency, in microseconds (this number changes during the test)
Avg = average latency experienced, in microseconds
Max = maximum latency experienced, in microseconds

  • No Load
    • Priority Level = 1
    • sudo cyclictest -t1 -p 1 -n -i 1000 -l 10000
      policy: fifo: loadavg: 1.14 0.46 0.18 1/291 5891          
      T: 0 ( 5791) P: 1 I:1000 C:  10000 Min:      1 Act:    1 Avg:    1 Max:     712
      
    • Priority Level = 2
    • sudo cyclictest -t1 -p 2 -n -i 1000 -l 10000
      policy: fifo: loadavg: 0.11 0.59 0.44 1/294 16748          
      T: 0 (16624) P: 2 I:1000 C:  10000 Min:      1 Act:    2 Avg:    1 Max:       5
      
    • Priority Level = 80
    • sudo cyclictest -t1 -p 80 -n -i 1000 -l 10000
      policy: fifo: loadavg: 0.19 0.66 0.46 1/293 16414           
      T: 0 (16289) P:80 I:1000 C:  10000 Min:      2 Act:    2 Avg:    2 Max:       8
      
  • Under Load
    • Priority Level = 1
    • sudo cyclictest -t1 -p 1 -n -i 1000 -l 10000
      policy: fifo: loadavg: 3.52 1.16 0.44 16/331 11374          
      T: 0 ( 8812) P: 1 I:1000 C:  10000 Min:      1 Act:    3 Avg:   10 Max:    3093
      
    • Priority Level = 2
    • sudo cyclictest -t1 -p 2 -n -i 1000 -l 10000
      policy: fifo: loadavg: 4.56 1.47 0.74 14/333 23801          
      T: 0 (22748) P: 2 I:1000 C:  10000 Min:      1 Act:    5 Avg:    6 Max:      17
      
    • Priority Level = 80
    • sudo cyclictest -t1 -p 80 -n -i 1000 -l 10000
      policy: fifo: loadavg: 5.40 2.38 1.15 12/332 28779          
      T: 0 (27402) P:80 I:1000 C:  10000 Min:      1 Act:    6 Avg:    5 Max:      14
      

The most important number in the results is the max: to assure hard real-time performance, the system should never have a latency greater than a certain threshold. That threshold is determined by the frequency of the task that you are trying to perform. For example, if you want to perform a task at 100kHz and the worst-case latency is 5us, then your task should take no more than 5us, since there are only 10us between consecutive tasks.

It is interesting to note that a priority level of 1 in Cyclictest gives a much higher max than a priority level of 2 or 80. I still don't have an explanation for this, but the phenomenon is also apparent in the Homemade Test and PThread Test.

Homemade Test

The results of this test can be summarized and better visualized in histograms. Below are two histograms from each of the two conditions. It is important to note that these tests were performed with priority level set at 1. The PThread test is a better measure of how real-time the system is.

Latencies between target time stamp and recorded time stamp under no-load conditions.
Intervals between consecutive time stamps under no-load conditions.
Latencies between target time stamp and recorded time stamp under load conditions.
Intervals between consecutive time stamps under load conditions.

PThread Test

The PThread Test was run at four different frequencies: 1kHz, 2kHz, 5kHz and 10kHz. Each frequency was tested for 5 hours. I decided to run the test with the priority level set at 2 because of the unexplained higher max latency at priority level 1 seen in Cyclictest. The graphs below show histograms of the latencies experienced, in microseconds. I only kept count of latencies of up to 100us because a latency of 100us or greater would be considered catastrophic. Furthermore, the maximum latencies experienced at the start of the test are generally high, so I decided to discard the first hour as warm-up time for the system. However, a one-second warm-up should be enough in practice, since the max latency is experienced within the first millisecond. The results from the test are shown below.

Linux-rt delay from wakeup target time at 1kHz.
Linux-rt delay from wakeup target time at 2kHz.
Linux-rt delay from wakeup target time at 5kHz.
Linux-rt delay from wakeup target time at 10kHz.

Plotting Data in Real-Time

Here's a version of the PThread Test code that reads the encoder counts at 10kHz for 1 minute and then at 5kHz for 1 minute. While it is reading, it outputs the motor angle at 10Hz. If you pipe this into the driveGnuPlotStreams.pl script, it will plot the motor angle in real-time at 10Hz.

Click here to download or for more details about the driveGnuPlotStreams.pl script.

To compile: gcc main.c -o main -lrt -lnidaqmx
To run:

sudo ./main | perl ./driveGnuPlotStreams.pl 1 1 50 25000 45000 500x300+0+0 'MotorAngle' 0

Note: You will need to have the script in the same directory.

#define _REENTRANT
#define NSEC_PER_SEC (1000000000) /* The number of nsecs per sec. */
#define MY_PRIORITY (2) /* we use 2 here; PREEMPT_RT uses 50
			as the priority of kernel tasklets
			and interrupt handlers by default */
#define MAX_SAFE_STACK (8*1024) /* The maximum stack size which is
				guaranteed safe to access without
				faulting */
#define DAQmxErrChk(functionCall) if(DAQmxFailed(error=functionCall)) goto Error; else
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <sched.h>
#include <sys/mman.h>
#include <string.h>
#include <NIDAQmx.h>
#include <math.h>

/* function prototypes */
void* rtTask1( );
void* Task2( );
void* Task3( );
void* Task4( );
void* Task5( );

void stack_prefault(void) {
	unsigned char dummy[MAX_SAFE_STACK];
	memset(&dummy, 0, MAX_SAFE_STACK);
	return;
}

/* mutex variables */
pthread_mutex_t printfLock = PTHREAD_MUTEX_INITIALIZER;

int end2 = 0;
int end3 = 0;
int end4 = 0;
int end5 = 0;
int aa = 0;
double temp = 0;

int interval = 100000;		//100us "interrupt" interval
int duration = 10000;		//10khz for 1 second

/* global variables for input/output */
TaskHandle encoderTaskHandle=0;
int32 error=0;
int32 read;
float64 motorAngle;
char errBuff[2048]={'\0'};

int main( void ){
	pthread_t thr1, thr2, thr3, thr4, thr5;
	struct timespec t[10];

	/*********************************************/
	// DAQmx Configure Code
	/*********************************************/
	DAQmxErrChk(DAQmxCreateTask("EncoderTask",&encoderTaskHandle));
	DAQmxErrChk(DAQmxCreateCIAngEncoderChan(encoderTaskHandle,"AnalogIn/ctr0","Counter",DAQmx_Val_X4,0,0.0,DAQmx_Val_AHighBHigh,DAQmx_Val_Degrees,600,36000.0,""));

	/*********************************************/
	// DAQmx Start Code
	/*********************************************/
	DAQmxErrChk(DAQmxStartTask(encoderTaskHandle));

	clock_gettime(CLOCK_MONOTONIC ,&t[0]);
	pthread_create( &thr1, NULL, rtTask1, NULL );
	clock_gettime(CLOCK_MONOTONIC ,&t[1]);
	pthread_create( &thr2, NULL, Task2, NULL );
	clock_gettime(CLOCK_MONOTONIC ,&t[2]);
	pthread_create( &thr3, NULL, Task3, NULL );
	clock_gettime(CLOCK_MONOTONIC ,&t[3]);
	pthread_create( &thr4, NULL, Task4, NULL );
	clock_gettime(CLOCK_MONOTONIC ,&t[4]);
	pthread_create( &thr5, NULL, Task5, NULL );
	clock_gettime(CLOCK_MONOTONIC ,&t[5]);

	pthread_join( thr1, NULL );
	interval = 100000;		//100us "interrupt" interval
	duration = 10000*60;		//10khz for 1 minute
	clock_gettime(CLOCK_MONOTONIC ,&t[6]);
	pthread_create( &thr1, NULL, rtTask1, NULL );
	clock_gettime(CLOCK_MONOTONIC ,&t[7]);

	pthread_join( thr1, NULL );
	interval = 200000;		//200us "interrupt" interval
	duration = 5000*60;		//5khz for 1 minute
	clock_gettime(CLOCK_MONOTONIC ,&t[8]);
	pthread_create( &thr1, NULL, rtTask1, NULL );
	clock_gettime(CLOCK_MONOTONIC ,&t[9]);

	pthread_join( thr1, NULL );

	end2 = 1;
	end3 = 1;
	end4 = 1;
	end5 = 1;

	pthread_join( thr2, NULL );
	pthread_join( thr3, NULL );
	pthread_join( thr4, NULL );
	pthread_join( thr5, NULL );

	FILE *file;
	file = fopen("encodertiming","a+");

	int ii;
	for(ii = 1; ii < 10; ii++){
		/* tv_nsec is a long, so print it with %ld */
		fprintf(file, "Task%d %9ld %9ld\n", ii, t[ii].tv_nsec, t[ii].tv_nsec - t[ii-1].tv_nsec);
	}
	fprintf(file, "\n========================================\n");
	fclose(file);

	return 0;

Error:
	if( DAQmxFailed(error) ){
		DAQmxGetExtendedErrorInfo(errBuff,2048);
	}
	if( encoderTaskHandle!=0 ){
		DAQmxStopTask(encoderTaskHandle);
		DAQmxClearTask(encoderTaskHandle);
	}
	if( DAQmxFailed(error) ){
		printf("DAQmx Error: %s\n",errBuff);
	}
	return 0;
}

void* rtTask1(){

	int bins[100];
	int i, ii, jj;
	for( i = 0; i < 100; i++){
		bins[i] = 0;
	}
	struct timespec t;
	struct timespec timestamp;
	struct timespec starttime;
	struct timespec earlytime[100];
	struct timespec earlytime2[100];
	int earlyvalue[100];
	struct timespec badtime[10000];
	struct timespec badtime2[10000];
	int badvalue[10000];
	struct sched_param param;

	FILE *file;
	file = fopen("encodertiming","a+");
	int plotInterval = NSEC_PER_SEC / interval / 10;

	/* Declare ourself as a real time task */
	param.sched_priority = MY_PRIORITY;
	if(sched_setscheduler(0, SCHED_FIFO, &param) == -1) {
		perror("sched_setscheduler failed");
		exit(-1);
	}

	/* Lock memory */
	if(mlockall(MCL_CURRENT|MCL_FUTURE) == -1) {
		perror("mlockall failed");
		exit(-2);
	}

	/* Pre-fault our stack */
	stack_prefault();

	clock_gettime(CLOCK_MONOTONIC,&t);
	/* start after one second */
	t.tv_sec++;
	starttime = t;
	jj = 0;

	for( i = 0; i < duration; ++i ) {
		/* wait until next shot */
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &t, NULL);
		clock_gettime(CLOCK_MONOTONIC ,&timestamp);
		if(timestamp.tv_sec == t.tv_sec){
			ii = (int)(((timestamp.tv_nsec - t.tv_nsec)/1000) - 1);
		}else{
			ii = (int)((((timestamp.tv_sec - t.tv_sec) * NSEC_PER_SEC - t.tv_nsec + timestamp.tv_nsec)/1000) - 1);
		}
		// store timestamps if latency > 30
		if(ii > 30){
			if(jj < 10000){
				badtime[jj] = timestamp;
				badtime2[jj] = t;
				badvalue[jj] = ii;
				jj++;
			}
		}
		// store timestamps for first 100 wakeups
		if(i < 100){
			earlytime[i] = timestamp;
			earlytime2[i] = t;
			earlyvalue[i] = ii;
		}
		// cap delay for bins[ii]
		if(ii > 99){
			ii = 99;
		}else if(ii < 0){
			ii = 0;
		}
		bins[ii]++;

		DAQmxErrChk(DAQmxReadCounterF64(encoderTaskHandle,1,10.0,&motorAngle,1,&read,NULL));

		if(i % plotInterval == 0){
			printf("0:%f\n",motorAngle);
			fflush(stdout);
		}

		t.tv_nsec += interval;
		while (t.tv_nsec >= NSEC_PER_SEC) {
			t.tv_nsec -= NSEC_PER_SEC;
			t.tv_sec++;
		}
	}

	fprintf(file, "interval = %d\n", interval);
	fprintf(file, "duration = %d\n", duration);
	fprintf(file, "starttime = %lds\t%ldns\n", (long)starttime.tv_sec, starttime.tv_nsec);
	int maxDelay = 0;
	for( i = 0; i < 100; i++){
		if( bins[i] > 0 ){
			maxDelay = i;
		}
	}
	fprintf(file, "maxDelay = %d\n", maxDelay);
	for( i = 0; i < 100; i++){
		fprintf(file, "bins[%d] = %d\n", i, bins[i]);
	}
	for( i = 0; i < 100; i++){
		fprintf(file, "earlytime[%d] = %lds\t%ldns (t = %lds\t%ldns)\tvalue = %d\n", i, (long)earlytime[i].tv_sec, earlytime[i].tv_nsec, (long)earlytime2[i].tv_sec, earlytime2[i].tv_nsec, earlyvalue[i]);
	}
	for( i = 0; i < jj; i++){
		fprintf(file, "badtime[%d] = %lds\t%ldns (t = %lds\t%ldns)\tvalue = %d\n", i, (long)badtime[i].tv_sec, badtime[i].tv_nsec, (long)badtime2[i].tv_sec, badtime2[i].tv_nsec, badvalue[i]);
	}
	fprintf(file, "\n");
	fclose(file);
	return NULL;

Error:
	if( DAQmxFailed(error) ){
		DAQmxGetExtendedErrorInfo(errBuff,2048);
	}
	if( encoderTaskHandle!=0 ){
		DAQmxStopTask(encoderTaskHandle);
		DAQmxClearTask(encoderTaskHandle);
	}
	if( DAQmxFailed(error) ){
		printf("DAQmx Error: %s\n",errBuff);
	}
	return NULL;
}

void* Task2(){
	while( end2 == 0 ) {
		temp = temp + rand();
	}
	return NULL;
}

void* Task3(){
	while( end3 == 0 ) {
		temp = temp - rand();
	}
	return NULL;
}

void* Task4(){
	while( end4 == 0 ) {
		temp = temp * rand();
	}
	return NULL;
}

void* Task5(){
	while( end5 == 0 ) {	/* Task5 has its own stop flag */
		temp = temp / rand();
	}
	return NULL;
}

Future Work

Now that we have an operating system with sub-100us latency, our next step is to port the code for the camera over to Linux. Once that is done, we would like to capture images at 1000fps or better and track the location of bright spots in the image.

Useful Links

Micro-Kernel Approach

Scheduling Approach

Wiki on -rt Linux

A "Hello World" Example
