

Entries in this category

VST SDK GUI Switch without
16-Point Fast Integer Sinc Interpolator.
16-to-8-bit first-order dither
3rd order Spline interpolation
5-point spline interpolation
Allocating aligned memory
Antialiased Lines
Automatic PDC system
Base-2 exp
Bit-Reversed Counting
Block/Loop Benchmarking
Branchless Clipping
Calculate notes (java)
Center separation in a stereo mixdown
Cheap pseudo-sinusoidal lfo
Clipping without branching
Constant-time exponent of 2 detector
Conversion and normalization of 16-bit sample to a floating point number
Conversions on a PowerPC
Copy-protection schemes
Cubic interpolation
Cure for malicious samples
Denormal DOUBLE variables, macro
Denormal numbers
Denormal numbers, the meta-text
Denormalization preventer
Denormalization preventer
Dither code
Dithering
Double to Int
Envelope Follower
Exponential curve for
Exponential parameter mapping
fast abs/neg/sign for 32bit floats
Fast binary log approximations
Fast cube root, square root, and reciprocal for x86/SSE CPUs.
fast exp() approximations
Fast exp2 approximation
Fast Float Random Numbers
Fast log2
fast power and root estimates for 32bit floats
Fast rounding functions in pascal
Fast sign for 32 bit floats
Float to int
Float to int (more intel asm)
Float to integer conversion
Float-to-int, converting an array of floats
Gaussian dithering
Gaussian random numbers
Hermite Interpolator (x86 ASM)
Hermite interpolation
Linear interpolation
Lock free fifo
Matlab Time Domain Impulse Response Inverter/Divider
MATLAB-Tools for SNDAN
MIDI note/frequency conversion
Millimeter to DB (faders...)
Motorola 56300 Disassembler
Noise Shaping Class
Nonblocking multiprocessor/multithread algorithms in C++
Piecewise quadratic approximate exponential function
please add it as a comment to the Denormalization preventer entry (no comments are allowed now) thanks
pow(x,4) approximation
rational tanh approximation
Reading the compressed WA! parts in gigasampler files
Real basic DSP with Matlab (+ GUI) ...
real value vs display value
Really fast x86 floating point sin/cos
Reasonably accurate/fastish tanh approximation
resampling
Saturation
Sin(x) Approximation (with SSE code)
Sin, Cos, Tan approximation

VST SDK GUI Switch without

References : Posted by quintosardo[AT]yahoo[DOT]it

Notes :
In the VST GUI, an on-value is represented by 1.0 and off by 0.0.

Code :
Say you have two signals you want to switch between when the user changes a switch.
You could do:

if(fSwitch == 0.f) //fSwitch is either 0.0 or 1.0
    output = input1
else
    output = input2

However, you can avoid the branch by doing:

output = input1 * (1.f - fSwitch) + input2 * fSwitch

Which would be like a quick mix. You could make the change clickless by adding a simple one-pole filter:

smooth = filter(fSwitch)
output = input1 * (1.f - smooth) + input2 * smooth
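
A minimal sketch of that smoothing stage (the filter itself isn't given in the original post, so the 0.999 coefficient and the function name are arbitrary illustrative choices):

```cpp
#include <cassert>

// One-pole smoothing of the switch value, as suggested above.
// The 0.999 coefficient is an arbitrary choice; values closer
// to 0 settle faster.
static float smooth = 0.f; // filter state, kept across calls

float process_switch(float input1, float input2, float fSwitch)
{
    smooth = fSwitch + 0.999f * (smooth - fSwitch); // one-pole lowpass toward fSwitch
    return input1 * (1.f - smooth) + input2 * smooth;
}
```

After enough samples the crossfade settles on the selected input, with no step discontinuity along the way.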


4 comment(s) | add a comment | nofrills version


16-Point Fast Integer Sinc Interpolator.

References : Posted by mumart[at]gmail[dot]com

Notes :
This is designed for fast upsampling with good quality using only a 32-bit accumulator. Sound quality is very good. Conceptually it resamples the input signal at 32768 times the original rate and performs nearest-neighbour selection to get the requested sample rate. As a result, downsampling will alias.

The provided Sinc table is Blackman-Harris windowed with a slight lowpass. The table entries are 16-bit and are 16x linear-oversampled. It should be pretty easy to figure out how to make your own table for it.

Code provided is in Java. Converting to C/MMX etc. should be pretty trivial.

Remember the interpolator requires a number of samples before and after the sample to be interpolated, so you can't resample the whole of a passed input buffer in one go.

Have fun,
Martin


Code :

public class SincResampler {
    private final int FP_SHIFT = 15;
    private final int FP_ONE = 1 << FP_SHIFT;
    private final int FP_MASK = FP_ONE - 1;

    private final int POINT_SHIFT = 4; // 16 points
    private final int OVER_SHIFT = 4;  // 16x oversampling

    private final short[] table = {
         0, -7,  27, -71,  142, -227,  299,  32439,   299,  -227,  142,  -71,  27,  -7,  0,  0,
         0,  0,  -5,  36, -142,  450, -1439, 32224,  2302,  -974,  455, -190,  64, -15,  2,  0,
         0,  6, -33, 128, -391, 1042, -2894, 31584,  4540, -1765,  786, -318, 105, -25,  3,  0,
         0, 10, -55, 204, -597, 1533, -4056, 30535,  6977, -2573, 1121, -449, 148, -36,  5,  0,
        -1, 13, -71, 261, -757, 1916, -4922, 29105,  9568, -3366, 1448, -578, 191, -47,  7,  0,
        -1, 15, -81, 300, -870, 2185, -5498, 27328, 12263, -4109, 1749, -698, 232, -58,  9,  0,
        -1, 15, -86, 322, -936, 2343, -5800, 25249, 15006, -4765, 2011, -802, 269, -68, 10,  0,
        -1, 15, -87, 328, -957, 2394, -5849, 22920, 17738, -5298, 2215, -885, 299, -77, 12,  0,
         0, 14, -83, 319, -938, 2347, -5671, 20396, 20396, -5671, 2347, -938, 319, -83, 14,  0,
         0, 12, -77, 299, -885, 2215, -5298, 17738, 22920, -5849, 2394, -957, 328, -87, 15, -1,
         0, 10, -68, 269, -802, 2011, -4765, 15006, 25249, -5800, 2343, -936, 322, -86, 15, -1,
         0,  9, -58, 232, -698, 1749, -4109, 12263, 27328, -5498, 2185, -870, 300, -81, 15, -1,
         0,  7, -47, 191, -578, 1448, -3366,  9568, 29105, -4922, 1916, -757, 261, -71, 13, -1,
         0,  5, -36, 148, -449, 1121, -2573,  6977, 30535, -4056, 1533, -597, 204, -55, 10,  0,
         0,  3, -25, 105, -318,  786, -1765,  4540, 31584, -2894, 1042, -391, 128, -33,  6,  0,
         0,  2, -15,  64, -190,  455,  -974,  2302, 32224, -1439,  450, -142,  36,  -5,  0,  0,
         0,  0,  -7,  27,  -71,  142,  -227,   299, 32439,   299, -227,  142, -71,  27, -7,  0
    };

    /*
    private final int POINT_SHIFT = 1; // 2 points
    private final int OVER_SHIFT = 0; // 1x oversampling
    private final short[] table = {
        32767,     0,
        0    , 32767
    };
    */

    private final int POINTS = 1 << POINT_SHIFT;
    private final int INTERP_SHIFT = FP_SHIFT - OVER_SHIFT;
    private final int INTERP_BITMASK = ( 1 << INTERP_SHIFT ) - 1;

    /*
        input - array of input samples
        inputPos - sample position ( must be at least POINTS/2 - 1, i.e. 7 )
        inputFrac - fractional sample position ( 0 <= inputFrac < FP_ONE )
        step - number of input samples per output sample * FP_ONE
        lAmp - left output amplitude ( 1.0 = FP_ONE )
        lBuf - left output buffer
        rAmp - right output amplitude ( 1.0 = FP_ONE )
        rBuf - right output buffer
        pos - offset into output buffers
        count - number of output samples to produce
    */
    public void resample( short[] input, int inputPos, int inputFrac, int step,
            int lAmp, int[] lBuf, int rAmp, int[] rBuf, int pos, int count ) {
        for( int p = 0; p < count; p++ ) {
            int tabidx1 = ( inputFrac >> INTERP_SHIFT ) << POINT_SHIFT;
            int tabidx2 = tabidx1 + POINTS;
            int bufidx = inputPos - POINTS/2 + 1;
            int a1 = 0, a2 = 0;
            for( int t = 0; t < POINTS; t++ ) {
                a1 += table[ tabidx1++ ] * input[ bufidx ] >> 15;
                a2 += table[ tabidx2++ ] * input[ bufidx ] >> 15;
                bufidx++;
            }
            int out = a1 + ( ( a2 - a1 ) * ( inputFrac & INTERP_BITMASK ) >> INTERP_SHIFT );
            lBuf[ pos ] += out * lAmp >> FP_SHIFT;
            rBuf[ pos ] += out * rAmp >> FP_SHIFT;
            pos++;
            inputFrac += step;
            inputPos += inputFrac >> FP_SHIFT;
            inputFrac &= FP_MASK;
        }
    }
}
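
For reference, the fixed-point `step` argument can be derived from the input and output sample rates like this (a trivial helper sketch; the function name is my own, not part of the original post):

```cpp
#include <cassert>

// Computes the 'step' argument for resample() above: input samples per
// output sample, scaled by FP_ONE (1 << 15).
int resample_step(long long inputRate, long long outputRate)
{
    return (int)((inputRate << 15) / outputRate);
}
```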



no comments on this item | add a comment | nofrills version


16-to-8-bit first-order dither

Type : First order error feedforward dithering code
References : Posted by Jon Watte

Notes :
This is about as simple a dithering algorithm as you can implement, but it's likely to sound better than just truncating to N bits.

Note that you might not want to carry the full error forward indefinitely. The worst performance hit probably comes from the saturation conditionals, which can be avoided with saturating instructions on many DSPs, with integer SIMD instructions, or with CMOV.

Last, if sound quality is paramount (such as when going from > 16 bits to 16 bits) you probably want to use a higher-order dither function found elsewhere on this site.


Code :
// This code will down-convert and dither a 16-bit signed short
// mono signal into an 8-bit unsigned char signal, using a first
// order forward-feeding error term dither.

#define uchar unsigned char

void dither_one_channel_16_to_8( short * input, uchar * output, int count, int * memory )
{
  int m = *memory;
  while( count-- > 0 ) {
    int i = *input++;
    i += m;
    int j = i + 32768 - 128;
    uchar o;
    if( j < 0 ) {
      o = 0;
    }
    else if( j > 65535 ) {
      o = 255;
    }
    else {
      o = (uchar)((j>>8)&0xff);
    }
    m = (j & 0xFF) - 128; /* residual lost by truncation; the original expression always evaluated to 0 */
    *output++ = o;
  }
  *memory = m;
}
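
The saturation conditionals mentioned in the notes can be replaced by a branchless clamp. This is a generic sketch, not code from the original post; the comparisons typically compile to setcc/cmov-style code rather than jumps:

```cpp
#include <cassert>

// Branchless clamp of j to [0, 65535], replacing the if/else chain above.
int clamp_16bit_unsigned(int j)
{
    int lo = -(int)(j < 0);          // all-ones mask when j < 0
    j &= ~lo;                        // j = max(j, 0)
    int hi = -(int)(j > 65535);      // all-ones mask when j > 65535
    j = (j & ~hi) | (65535 & hi);    // j = min(j, 65535)
    return j;
}
```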


no comments on this item | add a comment | nofrills version


3rd order Spline interpolation

References : Posted by Dave from Muon Software, originally from Josh Scholar

Notes :
(from Joshua Scholar, about spline interpolation in general...)
According to sampling theory, a perfect interpolation can be found by replacing each sample with a sinc function centered on that sample, ringing at your target Nyquist frequency, and at each target point summing the contributions from the sinc functions of every single source sample.
The sinc function's ringing dies away very slowly, so each target sample has contributions from a large neighborhood of source samples. Luckily, by definition the sinc function is bandwidth limited, so once we have a source that is prefiltered for our target Nyquist frequency and reasonably oversampled relative to it, ordinary interpolation techniques are quite fruitful, even though they would be pretty useless if we hadn't oversampled.

We want an interpolation routine that at very least has the following characteristics:

1. Obviously it's continuous. But since finite differencing a signal (I don't really know about true differentiation) is equivalent to a low-frequency attenuator that drops only about 6 dB per octave, continuity in the higher derivatives is important too.

2. It has to be stiff enough to find peaks when our oversampling missed them. This is where the combination of the sinc function's limited bandwidth and oversampling, which makes interpolation possible at all, comes into play.

I've read some papers on splines, but most material on splines relates to graphics and uses a control-point description that is completely irrelevant to our sort of interpolation. In reading this stuff I quickly came to the conclusion that splines:

1. Are just piecewise functions made of polynomials designed to have some higher order continuity at the transition points.

2. Splines are highly arbitrary, because you can choose arbitrary derivatives (to any order) at each transition. Of course the more you specify the higher order the polynomials will be.

3. I already know enough about polynomials to construct any sort of spline. A polynomial through 'n' points with a derivative specified at 'm[1]' points and second derivatives specified at 'm[2]' points etc. will be a polynomial of the order n-1+m[1]+m[2]...

A way to construct third-order splines (which admittedly doesn't help you construct higher-order splines) is to linearly interpolate between two parabolas. At each point (the points are called knots) you have a parabola going through that point, the previous point, and the next point. Between each pair of knots you linearly interpolate between their two parabolas. This may help you imagine splines.
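
That "blend two parabolas" construction can be sketched directly (illustrative only, with names of my own choosing; the entry's ThirdInterp code was derived differently):

```cpp
#include <cassert>
#include <cmath>

// Cubic interpolation over [0,1] between knots y0 and y1: fit a parabola
// at each inner knot through it and its two neighbours, then linearly
// cross-fade between the two parabolas.
double parabola_blend(double ym1, double y0, double y1, double y2, double t)
{
    // parabola through (ym1, y0, y1), centered on the left knot
    double pL = y0 + 0.5 * t * (y1 - ym1) + 0.5 * t * t * (y1 - 2.0 * y0 + ym1);
    // parabola through (y0, y1, y2), centered on the right knot
    double u  = t - 1.0;
    double pR = y1 + 0.5 * u * (y2 - y0) + 0.5 * u * u * (y2 - 2.0 * y1 + y0);
    return (1.0 - t) * pL + t * pR; // linear cross-fade yields a cubic
}
```

At t = 0 and t = 1 the blend passes exactly through the knots, and linear data is reproduced exactly.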

As a starting point I used a polynomial through 5 points for each knot, and used MuPAD (a free Mathematica-like program) to derive a polynomial going through two points (knots) where at each point it has the same first two derivatives as the 4th-order polynomial through the surrounding 5 points. My intuition was that basing it on polynomials through 3 points wouldn't be enough of a neighborhood to get good continuity. When I tested it, I found that not only did basing it on 5-point polynomials do much better than basing it on 3-point ones, but 7-point ones did nearly as badly as 3-point ones. 5 points seems to be a sweet spot.

However, I could have set the derivatives to nearly arbitrary values; basing them on polynomials through the surrounding points was just a guess.

I've read that the math of sampling theory has an interpretation different from the sinc-function one: you could upsample by fitting a polynomial through every point, of the same order as the number of points, and this would give you the same answer as sinc-function interpolation (though it only converges perfectly with an infinite number of points). Your head is probably spinning right now; the only point of mentioning this is that perfect interpolation is exactly as stiff as a polynomial through the target points of the same order as the number of target points.


Code :
//interpolates between L0 and H0 taking the previous (L1) and next (H1)
points into account
inline float ThirdInterp(const float x,const float L1,const float L0,const
float H0,const float H1)
{
    return
    L0 +
    .5f*
    x*(H0-L1 +
       x*(H0 + L0*(-2) + L1 +
          x*( (H0 - L0)*9 + (L1 - H1)*3 +
             x*((L0 - H0)*15 + (H1 -  L1)*5 +
                x*((H0 - L0)*6 + (L1 - H1)*2 )))));
}


6 comment(s) | add a comment | nofrills version


5-point spline interpolation

Type : interpolation
References : Joshua Scholar, posted by David Waugh
Code :
//nMask = sizeofwavetable-1 where sizeofwavetable is a power of two.
double interpolate(double* wavetable, int nMask, double location)
{
    /* 5-point spline*/

    int nearest_sample = (int) location;
    double x = location - (double) nearest_sample;

    double p0=wavetable[(nearest_sample-2)&nMask];
    double p1=wavetable[(nearest_sample-1)&nMask];
    double p2=wavetable[nearest_sample&nMask];
    double p3=wavetable[(nearest_sample+1)&nMask];
    double p4=wavetable[(nearest_sample+2)&nMask];
    double p5=wavetable[(nearest_sample+3)&nMask];

    return p2 + 0.04166666666*x*((p3-p1)*16.0+(p0-p4)*2.0
    + x *((p3+p1)*16.0-p0-p2*30.0- p4
    + x *(p3*66.0-p2*70.0-p4*33.0+p1*39.0+ p5*7.0- p0*9.0
    + x *( p2*126.0-p3*124.0+p4*61.0-p1*64.0- p5*12.0+p0*13.0
    + x *((p3-p2)*50.0+(p1-p4)*25.0+(p5-p0)*5.0)))));
}


4 comment(s) | add a comment | nofrills version


Allocating aligned memory

Type : memory allocation
References : Posted by Benno Senoner

Notes :
We waste up to align_size + sizeof(int) bytes per allocation.
We store the aligned_ptr - unaligned_ptr delta in an int located just before the aligned area.
This is needed by the free() routine, since we must free the whole allocation, not only the aligned area.
You therefore have to use aligned_free() to free memory allocated with aligned_malloc()!


Code :
/* align_size has to be a power of two !! */
void *aligned_malloc(size_t size, size_t align_size) {

  char *ptr,*ptr2,*aligned_ptr;
  int align_mask = align_size - 1;

  ptr=(char *)malloc(size + align_size + sizeof(int));
  if(ptr==NULL) return(NULL);

  ptr2 = ptr + sizeof(int);
  aligned_ptr = ptr2 + (align_size - ((size_t)ptr2 & align_mask));


  ptr2 = aligned_ptr - sizeof(int);
  *((int *)ptr2)=(int)(aligned_ptr - ptr);

  return(aligned_ptr);
}

void aligned_free(void *ptr) {

  int *ptr2=(int *)ptr - 1;
  free((char *)ptr - *ptr2); /* cast first: arithmetic on void* is non-standard */
}
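
The core alignment arithmetic can be checked in isolation (the addresses and the helper name below are hypothetical, for illustration only):

```cpp
#include <cassert>
#include <cstddef>

// The alignment step used above. Note it always advances by 1..align_size
// bytes, which is why aligned_malloc over-allocates by align_size.
size_t align_up_past(size_t addr, size_t align_size) // align_size: power of two
{
    size_t mask = align_size - 1;
    return addr + (align_size - (addr & mask));
}
```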


no comments on this item | add a comment | nofrills version


Antialiased Lines

Type : A slow, ugly, and unoptimized but short method to perform antialiased lines in a framebuffer
References : Posted by arguru[AT]smartelectronix[DOT]com

Notes :
Simple code to perform antialiased lines in a 32-bit RGBA (1 byte/component) framebuffer.

pframebuffer <- unsigned char* to framebuffer bytes (important: Y-flipped line order! [as in the way Win32 CreateDIBSection works...])

client_height=framebuffer height in lines
client_width=framebuffer width in pixels (not in bytes)

This doesn't perform any clipping check, so it fails if coordinates are set out of bounds.



Code :
//
// By Arguru
//
void PutTransPixel(int const x,int const y,UCHAR const r,UCHAR const g,UCHAR const b,UCHAR const a)
{
    unsigned char* ppix=pframebuffer+(x+(client_height-(y+1))*client_width)*4;
    ppix[0]=((a*b)+(255-a)*ppix[0])/256;
    ppix[1]=((a*g)+(255-a)*ppix[1])/256;
    ppix[2]=((a*r)+(255-a)*ppix[2])/256;
}

void LineAntialiased(int const x1,int const y1,int const x2,int const y2,UCHAR const r,UCHAR const g,UCHAR const b)
{
    // some useful constants first
    double const dw=x2-x1;
    double const dh=y2-y1;
    double const slx=dh/dw;
    double const sly=dw/dh;

    // determine wichever raster scanning behaviour to use
    if(fabs(slx)<1.0)
    {
        // x scan
        int tx1=x1;
        int tx2=x2;
        double raster=y1;

        if(x1>x2)
        {
            tx1=x2;
            tx2=x1;
            raster=y2;
        }

        for(int x=tx1;x<=tx2;x++)
        {
            int const ri=int(raster);

            double const in_y0=1.0-(raster-ri);
            double const in_y1=1.0-(ri+1-raster);

            PutTransPixel(x,ri+0,r,g,b,in_y0*255.0);
            PutTransPixel(x,ri+1,r,g,b,in_y1*255.0);

            raster+=slx;
        }
    }
    else
    {
        // y scan
        int ty1=y1;
        int ty2=y2;
        double raster=x1;

        if(y1>y2)
        {
            ty1=y2;
            ty2=y1;
            raster=x2;
        }

        for(int y=ty1;y<=ty2;y++)
        {
            int const ri=int(raster);

            double const in_x0=1.0-(raster-ri);
            double const in_x1=1.0-(ri+1-raster);

            PutTransPixel(ri+0,y,r,g,b,in_x0*255.0);
            PutTransPixel(ri+1,y,r,g,b,in_x1*255.0);

            raster+=sly;
        }
    }
}


7 comment(s) | add a comment | nofrills version


Automatic PDC system

Type : the type that actually works, completely
References : Posted by Tebello Thejane
Linked file : pdc.pdf

Notes :
No, people, implementing PDC is actually not as difficult as you might think it is.

This paper presents a solution to the problem of latency inherent in audio effects processors, and the two appendices give examples of the method being applied on Cubase SX (with an example which its native half-baked PDC fails to solve properly) as well as a convoluted example in FL Studio (taking advantage of the flexible routing capabilities introduced in version 6 of the software). All that's necessary to understand it is a grasp of basic algebra and an intermediate understanding of how music production software works (no need to understand the Laplace transform, linear processes, sigma and integral notation... YAY!).

Please do send me any feedback (kudos, errata, flames, job offers, questions, comments) you might have - my email address is included in the paper - or simply use musicdsp.org's own commenting system.

Tebello Thejane.


Code :
(I have sent the PDF to Bram as he suggested)
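
The linked paper isn't reproduced here; as a hedged illustration of the basic idea behind any PDC scheme (not necessarily Thejane's specific method), each parallel path is delayed by the difference between the longest path's latency and its own:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Generic latency alignment: given each parallel path's latency in samples,
// return the compensating delay to insert on that path.
std::vector<int> compensation_delays(const std::vector<int>& pathLatency)
{
    int maxLat = *std::max_element(pathLatency.begin(), pathLatency.end());
    std::vector<int> delays;
    for (int lat : pathLatency)
        delays.push_back(maxLat - lat); // pad shorter paths up to the longest
    return delays;
}
```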

3 comment(s) | add a comment | nofrills version


Base-2 exp

References : Posted by Laurent de Soras

Notes :
Linear approximation between two integer values of val. Uses 32-bit integers. Not very efficient, but faster than exp().
This code was designed for x86 (little endian), but could be adapted for big-endian processors: Laurent thinks you just have to replace the (*(1 + (int *) &ret)) expressions with (*(int *) &ret). However, he didn't test it.


Code :
inline double fast_exp2 (const double val)
{
   int    e;
   double ret;

   if (val >= 0)
   {
      e = int (val);
      ret = val - (e - 1);
      ((*(1 + (int *) &ret)) &= ~(2047 << 20)) += (e + 1023) << 20;
   }
   else
   {
      e = int (val + 1023);
      ret = val - (e - 1024);
      ((*(1 + (int *) &ret)) &= ~(2047 << 20)) += e << 20;
   }
   return (ret);
}


1 comment(s) | add a comment | nofrills version


Bit-Reversed Counting

References : Posted by mailbjl[AT]yahoo[DOT]com

Notes :
Bit-reversed ordering comes up frequently in FFT implementations. Here is a non-branching algorithm (given in C) that steps the variable "s" through the bit-reversed values of 0 to N-1, where N is a power of 2.

Code :
int r = 0;        // counter
int s = 0;        // bit-reversal of r/2
int N = 256;      // N can be any power of 2
int N2 = N << 1;  // N<<1 == N*2

do {
  printf("%u ", s);
  r += 2;
  s ^= N - (N / (r&-r));
}
while (r < N2);
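
Wrapping the counter as a function makes it easy to check against a naive per-bit reversal (a verification sketch, not part of the original post):

```cpp
#include <cassert>
#include <vector>

// The counter above, collected into a vector. N must be a power of two.
std::vector<int> bit_reversed_order(int N)
{
    std::vector<int> out;
    int r = 0, s = 0, N2 = N << 1;
    do {
        out.push_back(s);
        r += 2;
        s ^= N - (N / (r & -r)); // non-branching bit-reversed increment
    } while (r < N2);
    return out;
}

// Straightforward reversal of the low 'bits' bits of x, for comparison.
int naive_reverse(int x, int bits)
{
    int y = 0;
    for (int b = 0; b < bits; ++b) { y = (y << 1) | (x & 1); x >>= 1; }
    return y;
}
```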


3 comment(s) | add a comment | nofrills version


Block/Loop Benchmarking

Type : Benchmarking Tool
References : Posted by arguru[AT]smartelectronix[DOT]com

Notes :
Requires CPU with RDTSC support

Code :
// Block-Process Benchmarking Code using rdtsc
// useful for measuring DSP block processing
// (based on Intel papers)
// 64-bit precision
// VeryUglyCode(tm) by Arguru

// globals
UINT time,time_low,time_high;

// call this just before enter your loop or whatever
void bpb_start()
{
    // read time stamp to EAX
    __asm rdtsc;
    __asm mov time_low,eax;
    __asm mov time_high,edx;
}

// call the following function just after your loop
// returns average cycles wasted per sample
UINT bpb_finish(UINT const num_samples)
{
    __asm rdtsc
    __asm sub eax,time_low;
    __asm sub edx,time_high;
    __asm div num_samples;
    __asm mov time,eax;
    return time;
}
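
Where RDTSC or MSVC inline asm isn't available, a portable (if coarser) equivalent can be built on std::chrono. This sketch reports wall-clock nanoseconds per sample rather than cycles; the names mirror the originals but are my own:

```cpp
#include <cassert>
#include <chrono>

static std::chrono::steady_clock::time_point bpb_t0;

// call this just before entering your loop
void bpb_start_portable()
{
    bpb_t0 = std::chrono::steady_clock::now();
}

// call just after your loop; returns average nanoseconds per sample
double bpb_finish_portable(unsigned num_samples)
{
    std::chrono::duration<double, std::nano> dt =
        std::chrono::steady_clock::now() - bpb_t0;
    return dt.count() / num_samples;
}
```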


2 comment(s) | add a comment | nofrills version


Branchless Clipping

Type : Clipping at 0dB, with none of the usual 'if..then..'
References : Posted by musicdsp[AT]dsparsons[DOT]co[DOT]uk

Notes :
I was working on something that I wanted to ensure that the signal never went above 0dB, and a branchless solution occurred to me.

It works by playing with the structure of the single type, shifting the sign bit down to make a new multiplicand.

calling MaxZerodB(mydBSample) will ensure that it will never stray over 0dB.
By playing with signs or adding/removing offsets, this offers a complete branchless limiting solution, whether the values are in dB or not (after all, they're all just numbers...).

eg:
Limit to <=0 : sample:=MaxZerodB(sample);
Limit to <=3 : sample:=MaxZerodB(sample-3)+3;
Limit to <=-4 : sample:=MaxZerodB(sample+4)-4;

Limit to >=0 : sample:=-MaxZerodB(-sample);
Limit to >=2 : sample:=-MaxZerodB(-sample+2)+2;
Limit to >=-1.5: sample:=-MaxZerodB(-sample-1.5)-1.5;

Whether it actually saves any CPU cycles remains to be seen, but it was an interesting diversion for half an hour :)

[Translating from Pascal to other languages shouldn't be too hard, and for doubles you'll need to fiddle with it a bit :)]


Code :
function MaxZerodB(dBin:single):single;
var tmp:longint;
begin
     //given that leftmost bit of a longint indicates the negative,
     //  if we shift that down to bit0, and multiply dBin by that
     //  it will return dBin, or zero :)
     tmp:=(longint((@dBin)^) and $80000000) shr 31;
     result:=dBin*tmp;
end;
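
A possible C++ translation of the Pascal above (the function name is my own; memcpy is used for the bit access to stay within defined behavior):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Shift the float's sign bit down to bit 0 and use it as a 0/1 multiplier,
// so positive (above-0dB) values become 0 and negative ones pass through.
float max_zero_db(float dBin)
{
    std::uint32_t bits;
    std::memcpy(&bits, &dBin, sizeof bits); // well-defined type punning
    return dBin * (float)(bits >> 31);      // 1 when negative, 0 otherwise
}
```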


6 comment(s) | add a comment | nofrills version


Calculate notes (java)

Type : Java class for calculating notes with different in params
References : Posted by larsby[AT]elak[DOT]org
Linked file : Frequency.java

Notes :
Converts between string notes and frequencies and back. I vaguely remember writing bits of it, and I got it off the net somewhere, so don't ask me

- Larsby
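
Since the linked Frequency.java isn't included here, the following is a hedged C++ sketch of the standard equal-temperament conversions such a class would implement (MIDI note 69 = A4 = 440 Hz; the function names are my own):

```cpp
#include <cassert>
#include <cmath>

double note_to_freq(int note)
{
    return 440.0 * std::pow(2.0, (note - 69) / 12.0);
}

int freq_to_note(double freq)
{
    return (int)std::lround(69.0 + 12.0 * std::log2(freq / 440.0));
}
```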




no comments on this item | add a comment | nofrills version


Center separation in a stereo mixdown

References : Posted by Thiburce BELAVENTURE

Notes :
One year ago, I found a little trick to isolate or remove the center of a stereo mixdown.

My method uses the time-frequency representation (FFT). I use a min function between the left and right channels (for each bin) to create the pseudo-center, apply a phase correction, and subtract this signal from the left and right signals.

Then we can remix them, after treatments (or without), to produce a stereo output signal.

This algorithm (I call it "TBIsolator") is not perfect, but the result is very nice, better than the phase technique (L subtract R...). I know it is not mathematically correct, but an exact estimate of the center is very hard to obtain, so as estimates go it is not so bad (just listen to the result and see).

My implementation uses a 4096-point FFT with the overlap-add method (factor 2). With a smaller FFT the sound gets dirtier, and with a 16384-point FFT the center keeps too much high frequency (I haven't explored why this happens).

I just post the TBIsolator code (see FFTReal on this site for the FFT engine).

pIns and pOuts buffers use the representation of the FFTReal class (0 to N/2-1: real parts, N/2 to N-1: imaginary parts).

Have fun with the TBIsolator algorithm! I hope you enjoy it, and if you enhance it, contact me (it's my baby...).

P.S.: the following function is not optimized.


Code :
/* ============================================================= */
/* nFFTSize must be a power of 2                                 */
/* ============================================================= */
/* Usage examples:                                               */
/* - suppress the center:  fAmpL = 1.f, fAmpC = 0.f, fAmpR = 1.f */
/* - keep only the center: fAmpL = 0.f, fAmpC = 1.f, fAmpR = 0.f */
/* ============================================================= */

void processTBIsolator(float *pIns[2], float *pOuts[2], long nFFTSize, float fAmpL, float fAmpC, float fAmpR)
{
    float fModL, fModR;
    float fRealL, fRealC, fRealR;
    float fImagL, fImagC, fImagR;
    double u;

    for ( long i = 0, j = nFFTSize / 2; i < nFFTSize / 2; i++, j++ )
    {
        fModL = pIns[0][i] * pIns[0][i] + pIns[0][j] * pIns[0][j];
        fModR = pIns[1][i] * pIns[1][i] + pIns[1][j] * pIns[1][j];

        // min on complex numbers: take the channel with the smaller modulus
        // (the original post read from fRealL/fRealR before they were set,
        // and never incremented j)
        if ( fModL > fModR )
        {
            fRealC = pIns[1][i];
            fImagC = pIns[1][j];
        }
        else
        {
            fRealC = pIns[0][i];
            fImagC = pIns[0][j];
        }

        // phase correction...
        u = fabs(atan2(pIns[0][j], pIns[0][i]) - atan2(pIns[1][j], pIns[1][i])) / 3.141592653589;

        if ( u >= 1 ) u -= 1.;

        u = pow(1 - u*u*u, 24);

        fRealC *= (float) u;
        fImagC *= (float) u;

        // center extraction...
        fRealL = pIns[0][i] - fRealC;
        fImagL = pIns[0][j] - fImagC;

        fRealR = pIns[1][i] - fRealC;
        fImagR = pIns[1][j] - fImagC;

        // You can do some treatments here...

        pOuts[0][i] = fRealL * fAmpL + fRealC * fAmpC;
        pOuts[0][j] = fImagL * fAmpL + fImagC * fAmpC;

        pOuts[1][i] = fRealR * fAmpR + fRealC * fAmpC;
        pOuts[1][j] = fImagR * fAmpR + fImagC * fAmpC;
    }
}


1 comment(s) | add a comment | nofrills version




Cheap pseudo-sinusoidal lfo

References : Posted by fumminger[AT]umminger[DOT]com

Notes :
Although the code is written in standard C++, this algorithm is really better suited to DSPs, where one can take advantage of multiply-accumulate instructions and where the required phase accumulator can be easily implemented by masking a counter.

It provides a pretty cheap, roughly sinusoidal waveform that is good enough for an LFO.


Code :
// x should be between -1.0 and 1.0
inline
double pseudo_sine(double x)
{
    // Compute 2*(x^2-1.0)^2-1.0
    x *= x;
    x -= 1.0;
    x *= x;
    // The following lines modify the range to lie between -1.0 and 1.0.
    // If a range of between 0.0 and 1.0 is acceptable or preferable
    // (as in a modulated delay line) then you can save some cycles.
    x *= 2.0;
    x -= 1.0;
    return x; // the original post was missing this return statement
}
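
As the notes suggest, the waveform can be driven from a masked integer phase accumulator. A sketch (the 16-bit phase width is an arbitrary illustrative choice, and pseudo_sine is restated here with an explicit return):

```cpp
#include <cassert>

double pseudo_sine(double x) // x in [-1, 1]
{
    x *= x;
    x -= 1.0;
    x *= x;               // (x^2 - 1)^2, in [0, 1]
    return 2.0 * x - 1.0; // rescale to [-1, 1]
}

// One LFO tick: advance the phase (wrapping by masking the counter),
// map it to [-1, 1), and evaluate the waveform.
double lfo_tick(unsigned &phase, unsigned increment)
{
    phase = (phase + increment) & 0xFFFFu;
    return pseudo_sine(phase * (2.0 / 65536.0) - 1.0);
}
```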


2 comment(s) | add a comment | nofrills version


Clipping without branching

Type : Min, max and clip
References : Posted by Laurent de Soras

Notes :
It may reduce accuracy for small numbers. I.e. if you clip to [-1; 1], fractional part of the result will be quantized to 23 bits (or more, depending on the bit depth of the temporary results). Thus, 1e-20 will be rounded to 0. The other (positive) side effect is the denormal number elimination.

Code :
float max (float x, float a)
{
   x -= a;
   x += fabs (x);
   x *= 0.5;
   x += a;
   return (x);
}

float min (float x, float b)
{
   x = b - x;
   x += fabs (x);
   x *= 0.5;
   x = b - x;
   return (x);
}

float clip (float x, float a, float b)
{
   float x1 = fabs (x - a);
   float x2 = fabs (x - b);
   x = x1 + (a + b);
   x -= x2;
   x *= 0.5;
   return (x);
}
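A quick way to sanity-check the branchless formula is to restate it in plain C (fabsf variant, my naming) and compare against ordinary clamping:

```c
#include <assert.h>
#include <math.h>

/* Branchless clip restated: 0.5 * (|x-a| - |x-b| + a + b).
   a is the lower bound, b the upper bound (a <= b). */
static float clip_nobranch(float x, float a, float b)
{
    float x1 = fabsf(x - a);
    float x2 = fabsf(x - b);
    return 0.5f * (x1 - x2 + a + b);
}
```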


6 comment(s) | add a comment | nofrills version


Constant-time exponent of 2 detector

References : Posted by Brent Lehman (mailbjl[AT]yahoo.com)

Notes :
In your common FFT program, you want to make sure that the frame you're working with has a size that is a power of 2. This tells you in just a few operations. Granted, you won't be using this algorithm inside a loop, so the savings aren't that great, but every little hack helps ;)

Code :
// Quit if size isn't a power of 2
if ((-size ^ size) & size) return;

// If size is an unsigned int, the above might not compile.
// You'd want to use this instead:
if (((~size + 1) ^ size) & size) return;
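For reference, the same test is more commonly written as `size & (size - 1)`; a small hedged check (my helper names) that the two agree:

```c
#include <assert.h>

/* The posted test: (-size ^ size) & size is zero iff at most one bit is set.
   ~size + 1 is -size written for unsigned types. */
static int posted_is_pow2(unsigned size)
{
    return (((~size + 1u) ^ size) & size) == 0u;
}

/* The classic idiom: clearing the lowest set bit leaves zero. */
static int classic_is_pow2(unsigned size)
{
    return (size & (size - 1u)) == 0u;
}
```

Note that both variants also accept size == 0, so an FFT guard should additionally reject zero.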


1 comment(s) | add a comment | nofrills version


Conversion and normalization of 16-bit sample to a floating point number

References : Posted by George Yohng
Code :
float out;
signed short in;

// This code does the same as
//   out = ((float)in)*(1.0f/32768.0f);
//
// Depending on the architecture and compiler settings
// it might be faster, though it is always advisable
// to benchmark it against the straightforward version.
// The cast to unsigned short keeps the upper bits of the
// promoted value clear, which the XOR trick relies on
// (with a sign-extended negative value the XOR would
// corrupt the exponent bits).

((unsigned &)out) = 0x43818000 ^ (unsigned short)in;
out -= 259.0f;
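Exhaustively verifying the XOR trick against the plain multiply is cheap; a hedged, standalone harness (using memcpy instead of the reference cast to sidestep aliasing issues):

```c
#include <assert.h>
#include <string.h>

/* 0x43818000 is the bit pattern of 259.0f; XORing the sample's 16 bits
   into the low mantissa produces exactly 259 + in/32768, so subtracting
   259.0f recovers in/32768 with no rounding error. */
static float xor_convert(short in)
{
    unsigned bits = 0x43818000u ^ (unsigned short)in; /* low 16 bits only */
    float out;
    memcpy(&out, &bits, sizeof out);
    return out - 259.0f;
}
```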


2 comment(s) | add a comment | nofrills version


Conversions on a PowerPC

Type : motorola ASM conversions
References : Posted by James McCartney
Code :
double ftod(float x) { return (double)x;
00000000: 4E800020  blr
    // blr == return from subroutine, i.e. this function is a noop

float dtof(double x) { return (float)x;
00000000: FC200818  frsp       fp1,fp1
00000004: 4E800020  blr

int ftoi(float x) { return (int)x;
00000000: FC00081E  fctiwz     fp0,fp1
00000004: D801FFF0  stfd       fp0,-16(SP)
00000008: 8061FFF4  lwz        r3,-12(SP)
0000000C: 4E800020  blr

int dtoi(double x) { return (int)x;
00000000: FC00081E  fctiwz     fp0,fp1
00000004: D801FFF0  stfd       fp0,-16(SP)
00000008: 8061FFF4  lwz        r3,-12(SP)
0000000C: 4E800020  blr

double itod(int x) { return (double)x;
00000000: C8220000  lfd        fp1,@1558(RTOC)
00000004: 6C608000  xoris      r0,r3,$8000
00000008: 9001FFF4  stw        r0,-12(SP)
0000000C: 3C004330  lis        r0,17200
00000010: 9001FFF0  stw        r0,-16(SP)
00000014: C801FFF0  lfd        fp0,-16(SP)
00000018: FC200828  fsub       fp1,fp0,fp1
0000001C: 4E800020  blr

float itof(int x) { return (float)x;
00000000: C8220000  lfd        fp1,@1558(RTOC)
00000004: 6C608000  xoris      r0,r3,$8000
00000008: 9001FFF4  stw        r0,-12(SP)
0000000C: 3C004330  lis        r0,17200
00000010: 9001FFF0  stw        r0,-16(SP)
00000014: C801FFF0  lfd        fp0,-16(SP)
00000018: EC200828  fsubs      fp1,fp0,fp1
0000001C: 4E800020  blr


no comments on this item | add a comment | nofrills version


Copy-protection schemes

References : Posted by Moyer, Andy

Notes :
This post of Andy sums up everything there is to know about copy-protection schemes:

"Build a great product and release improvements regularly so that people will
be willing to spend the money on it, thus causing anything that is cracked
to be outdated quickly. Build a strong relationship with your customers,
because if they've already paid for one of your products, and were
satisfied, chances are, they will be more likely to buy another one of your
products. Make your copy protection good enough so that somebody can't just
do a search in Google and enter in a published serial number, but don't make
registered users jump through flaming hoops to be able to use the product.
Also use various approaches to copy protection within a release, and vary
those approaches over multiple releases so that a hacker that cracked your
app's version 1.0 can't just run a recorded macro in a text editor to crack
your version 2.0 software [this being simplified]."




1 comment(s) | add a comment | nofrills version


Cubic interpolation

Type : interpolation
References : Posted by Olli Niemitalo
Linked file : other001.gif

Notes :
(see linked file)
finpos is the fractional part of the sample position, inpos the integer part.


Code :
xm1 = x [inpos - 1];
x0  = x [inpos + 0];
x1  = x [inpos + 1];
x2  = x [inpos + 2];
a = (3 * (x0-x1) - xm1 + x2) / 2;
b = 2*x1 + xm1 - (5*x0 + x2) / 2;
c = (x1 - xm1) / 2;
y [outpos] = (((a * finpos) + b) * finpos + c) * finpos + x0;
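Wrapped into a self-contained function (my signature; the original leaves buffer handling to the caller), the snippet reproduces straight lines exactly, which makes a handy sanity check:

```c
#include <assert.h>

/* 4-point, 3rd-order interpolation of x[] at position inpos + finpos.
   Requires valid samples at inpos-1 .. inpos+2. */
static float cubic_interp(const float *x, int inpos, float finpos)
{
    float xm1 = x[inpos - 1];
    float x0  = x[inpos + 0];
    float x1  = x[inpos + 1];
    float x2  = x[inpos + 2];
    float a = (3.0f * (x0 - x1) - xm1 + x2) * 0.5f;
    float b = 2.0f * x1 + xm1 - (5.0f * x0 + x2) * 0.5f;
    float c = (x1 - xm1) * 0.5f;
    return ((a * finpos + b) * finpos + c) * finpos + x0;
}
```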


no comments on this item | add a comment | nofrills version


Cure for malicious samples

Type : Filters Denormals, NaNs, Infinities
References : Posted by urs[AT]u-he[DOT]com

Notes :
A lot of incidents can happen while processing samples. A nasty one is denormalization, which makes CPUs consume insanely many cycles for the simplest instructions.

But even worse, if you have NaNs or Infinities inside recursive structures, maybe due to division by zero, all subsequent samples that are multiplied with these values get "infected" and become NaN or Infinity. Your sound makes BLIPPP and that was it, silence from the speakers.

Thus I've written a small function that sets all of these cases to 0.0f.

You'll notice that I treat the buffer of floats as unsigned integers. And I avoid branches by using comparison results as 0 or 1.

When compiled with GCC, this function should not create any "hidden" branches, but you should verify the assembly code anyway. Sometimes some parentheses do the trick...

;) Urs


Code :
#ifndef UInt32
#define UInt32 unsigned int
#endif

void erase_All_NaNs_Infinities_And_Denormals( float* inSamples, int inNumberOfSamples )
{
    UInt32* inArrayOfFloats = (UInt32*) inSamples;

    for ( int i = 0; i < inNumberOfSamples; i++ )
    {
        UInt32 sample = *inArrayOfFloats;
        UInt32 exponent = sample & 0x7F800000;
        
            // exponent < 0x7F800000 is 0 if NaN or Infinity, otherwise 1
            // exponent > 0 is 0 if denormalized, otherwise 1
            
        int aNaN = exponent < 0x7F800000;
        int aDen = exponent > 0;
            
        *inArrayOfFloats++ = sample * ( aNaN & aDen );
    
    }
}
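The same exponent test can be checked on single values in portable C (memcpy instead of pointer punning; the helper name is mine):

```c
#include <assert.h>
#include <math.h>
#include <string.h>

/* Zero out NaNs, infinities and denormals: a float survives only if its
   exponent field is neither all-ones (NaN/Inf) nor all-zero (denormal/zero). */
static float scrub(float f)
{
    unsigned bits, exponent;
    memcpy(&bits, &f, sizeof bits);
    exponent = bits & 0x7F800000u;
    bits *= (unsigned)((exponent < 0x7F800000u) & (exponent > 0u));
    memcpy(&f, &bits, sizeof f);
    return f;
}
```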


5 comment(s) | add a comment | nofrills version


Denormal DOUBLE variables, macro

References : Posted by Jon Watte

Notes :
Use this macro if you want to find denormal numbers and you're using doubles...

Code :
#if PLATFORM_IS_BIG_ENDIAN
#define INDEX 0
#else
#define INDEX 1
#endif
inline bool is_denormal( double const & d ) {
  assert( sizeof( d ) == 2*sizeof( int ) );
  int l = ((int *)&d)[INDEX];
  // denormals have an all-zero exponent field (bits 30..20 of the high word)
  return (l&0x7ff00000) == 0;
}


1 comment(s) | add a comment | nofrills version


Denormal numbers

References : Compiled by Merlijn Blaauw
Linked file : other001.txt

Notes :
This text describes some ways to avoid denormalisation. Denormalisation happens when FPUs go mad processing very small numbers.



1 comment(s) | add a comment | nofrills version


Denormal numbers, the meta-text

References : Laurent de Soras
Linked file : denormal.pdf

Notes :
This very interesting paper, written by Laurent de Soras (www.ohmforce.com), has everything you ever wanted to know about denormal numbers! And it obviously describes how you can get rid of them too!

(see linked file)




no comments on this item | add a comment | nofrills version


Denormalization preventer

References : Posted by gol

Notes :
A fast sweeper for tiny numbers, using integer math. Only for 32-bit floats. Den_Thres is your 32-bit (normalized) float threshold - something small enough, yet big enough to prevent future denormalization.

EAX=input buffer
EDX=length
(adapt to your compiler)


Code :
    MOV   ECX,EDX
    LEA   EDI,[EAX+4*ECX]
    NEG   ECX
    MOV   EDX,Den_Thres
    SHL   EDX,1
    XOR   ESI,ESI

    @Loop:
    MOV   EAX,[EDI+4*ECX]
    LEA   EBX,[EAX*2]
    CMP   EBX,EDX
    CMOVB EAX,ESI
    MOV   [EDI+4*ECX],EAX

    INC   ECX
    JNZ   @Loop


no comments on this item | add a comment | nofrills version


Denormalization preventer

References : Posted by didid[AT]skynet[DOT]be

Notes :
Because I still see people adding noise or offset to their signal to avoid slow denormalization, here's a piece of code to zero out (near) tiny numbers instead.

Why is zeroing out better? Because a fully silent signal is better than a small offset or noise: a host or effect can detect silent signals and safely choose not to process them.
Plus, adding an offset or noise removes huge packets of denormalization, but still leaves some behind.
Also, truncating to zero is exactly what the DAZ (Denormals Are Zero) SSE flag does.

This code uses integer comparison, and a CMOV, so you need a Pentium Pro or higher.
There's no need for an SSE version, as if you have SSE code you're probably already using the DAZ flag instead (but I advise plugins not to mess with the SSE flags, as the host is likely to have DAZ switched on already). This is for FPU code. Should work much faster than crap FPU comparison.

Den_Thres is your threshold; it cannot be a denormal itself (that would be pointless). The function is Delphi; if you want to adapt it, just make sure EAX is the buffer and EDX is the length (Delphi register calling convention - it's not the same in C++).


Code :
const  Den_Thres:Single=1/$1000000;

procedure PrevFPUDen_Buffer(Buffer:Pointer;Length:Integer);
asm
    PUSH  ESI
    PUSH  EDI
    PUSH  EBX

    MOV   ECX,EDX
    LEA   EDI,[EAX+4*ECX]
    NEG   ECX
    MOV   EDX,Den_Thres
    SHL   EDX,1
    XOR   ESI,ESI

    @Loop:
    MOV   EAX,[EDI+4*ECX]
    LEA   EBX,[EAX*2]
    CMP   EBX,EDX
    CMOVB EAX,ESI
    MOV   [EDI+4*ECX],EAX

    INC   ECX
    JNZ   @Loop

    POP   EBX
    POP   EDI
    POP   ESI
End;


4 comment(s) | add a comment | nofrills version


Dither code

Type : Dither with noise-shaping
References : Posted by Paul Kellett

Notes :
This is a simple implementation of highpass triangular-PDF dither (a good general-purpose dither) with optional 2nd-order noise shaping (which lowers the noise floor by 11dB below 0.1 Fs).
The code assumes input data is in the range +1 to -1 and doesn't check for overloads!

To save time when generating dither for multiple channels you can re-use lower bits of a previous random number instead of calling rand() again. e.g. r3=(r1 & 0x7F)<<8;


Code :
  int   r1, r2;                //rectangular-PDF random numbers
  float s1, s2;                //error feedback buffers
  float s = 0.5f;              //set to 0.0f for no noise shaping
  float w = pow(2.0,bits-1);   //word length (usually bits=16)
  float wi= 1.0f/w;
  float d = wi / RAND_MAX;     //dither amplitude (2 lsb)
  float o = wi * 0.5f;         //remove dc offset
  float in, tmp;
  int   out;

//for each sample...

  r2=r1;                               //can make HP-TRI dither by
  r1=rand();                           //subtracting previous rand()

  in += s * (s1 + s1 - s2);            //error feedback
  tmp = in + o + d * (float)(r1 - r2); //dc offset and dither

  out = (int)(w * tmp);                //truncate downwards
  if(tmp<0.0f) out--;                  //this is faster than floor()

  s2 = s1;
  s1 = in - wi * (float)out;           //error


no comments on this item | add a comment | nofrills version


Dithering

References : Paul Kellett
Linked file : nsdither.txt

Notes :
(see linked file)



no comments on this item | add a comment | nofrills version


Double to Int

Type : pointer cast (round to zero, or 'truncate')
References : Posted by many people, implementation by Andy M00cho

Notes :
-Platform independent, literally. If you have IEEE FP numbers, this will work, as long as you're not expecting a signed integer back larger than 16 bits :)
-Will only work correctly for FP numbers within the range [-32768.0, 32767.0]
-The FPU must be in double-precision mode


Code :
typedef double lreal;
typedef float  real;
typedef unsigned long uint32;
typedef long int32;

   //2^36 * 1.5, (52-_shiftamt=36) uses limited precision to floor
   //16.16 fixed point representation

const lreal _double2fixmagic = 68719476736.0*1.5;
const int32 _shiftamt        = 16;

#if BigEndian_
        #define iexp_                           0
        #define iman_                           1
#else
        #define iexp_                           1
        #define iman_                           0
#endif //BigEndian_

// Real2Int
inline int32 Real2Int(lreal val)
{
   val= val + _double2fixmagic;
   return ((int32*)&val)[iman_] >> _shiftamt;
}

// Real2Int
inline int32 Real2Int(real val)
{
   return Real2Int ((lreal)val);
}

For the x86 assembler freaks here's the assembler equivalent:

__double2fixmagic    dd 000000000h,042380000h

fld    AFloatingPoint Number
fadd   QWORD PTR __double2fixmagic
fstp   TEMP
movsx  eax,TEMP+2


2 comment(s) | add a comment | nofrills version


Envelope Follower

References : Posted by ers
Code :
#define V_ENVELOPE_FOLLOWER_NUM_POINTS    2000
class vEnvelopeFollower
{
    public:
        vEnvelopeFollower();
        virtual ~vEnvelopeFollower();
        inline void Calculate(float *b)
        {
            envelopeVal -= *buff;
            if (*b < 0)
                envelopeVal += *buff = -*b;
            else
                envelopeVal += *buff = *b;
            if (buff++ == bufferEnd)
                buff = buffer;
        }
        void SetBufferSize(float value);
        float GetControlValue(){return envelopeVal / (float)bufferSize;}

    private:
        float    buffer[V_ENVELOPE_FOLLOWER_NUM_POINTS];
        float    *bufferEnd, *buff, envelopeVal;
        int      bufferSize;
        float    val;
};

vEnvelopeFollower::vEnvelopeFollower()
{
    bufferEnd = buffer + V_ENVELOPE_FOLLOWER_NUM_POINTS-1;
    buff = buffer;
    val = 0;
    float *b = buffer;
    do
    {
        *b++ = 0;
    }while (b <= bufferEnd);
    bufferSize = V_ENVELOPE_FOLLOWER_NUM_POINTS;
    envelopeVal= 0;
}

vEnvelopeFollower::~vEnvelopeFollower()
{

}

void vEnvelopeFollower::SetBufferSize(float value)
{

    bufferEnd = buffer + (bufferSize = 100 + (int)(value * ((float)V_ENVELOPE_FOLLOWER_NUM_POINTS-102)));
    buff = buffer;
    float val =  envelopeVal / bufferSize;
    do
    {
        *buff++ = val;
    }while (buff <= bufferEnd);
    buff = buffer;
}


1 comment(s) | add a comment | nofrills version


Exponential curve for

Type : Exponential curve
References : Posted by neolit123 [at] gmail [dot] com

Notes :
When you design a frequency control for your filters you may need an exponential curve to give the lower frequencies more resolution.

F=20-20000hz
x=0-100%

Case (control middle point):
x=50%
F=1135hz

Plotted diagram with 5 points:
http://img151.imageshack.us/img151/9938/expplotur3.jpg


Code :
//tweak - 14.15005 to change middle point and F(max)
F = 19+floor(pow(4,x/14.15005))+x*20;



3 comment(s) | add a comment | nofrills version


Exponential parameter mapping

References : Posted by Russell Borogove

Notes :
Use this if you want to do an exponential map of a parameter (mParam) to a range (mMin - mMax).
Output is in mData...


Code :
float logmax = log10f( mMax );
float logmin = log10f( mMin );
float logdata = (mParam * (logmax-logmin)) + logmin;

mData = powf( 10.0f, logdata );
if (mData < mMin)
{
  mData = mMin;
}
if (mData > mMax)
{
  mData = mMax;
}


4 comment(s) | add a comment | nofrills version


fast abs/neg/sign for 32bit floats

Type : floating point functions
References : Posted by tobybear[AT]web[DOT]de

Notes :
Haven't seen this elsewhere, probably because it is too obvious? Anyway, these functions are intended for 32-bit floating point numbers only and should work a bit faster than the regular ones.

fastabs() gives you the absolute value of a float
fastneg() gives you the negative number (faster than multiplying with -1)
fastsgn() gives back +1 for 0 or positive numbers, -1 for negative numbers

Comments are welcome (tobybear[AT]web[DOT]de)

Cheers

Toby (www.tobybear.de)


Code :
// C/C++ code:
float fastabs(float f)
{int i=((*(int*)&f)&0x7fffffff);return (*(float*)&i);}

float fastneg(float f)
{int i=((*(int*)&f)^0x80000000);return (*(float*)&i);}

int fastsgn(float f)
{return 1+(((*(int*)&f)>>31)<<1);}

//Delphi/Pascal code:
function fastabs(f:single):single;
var i:longint;
begin i:=longint((@f)^) and $7FFFFFFF;result:=single((@i)^) end;

function fastneg(f:single):single;
var i:longint;
begin i:=longint((@f)^) xor $80000000;result:=single((@i)^) end;

function fastsgn(f:single):longint;
{shr is a logical shift in Pascal (unlike C's >> on signed ints), hence the minus}
begin result:=1-((longint((@f)^) shr 31)shl 1) end;



6 comment(s) | add a comment | nofrills version


Fast binary log approximations

Type : C code
References : Posted by musicdsp.org[AT]mindcontrol.org

Notes :
This code uses IEEE 32-bit floating point representation knowledge to quickly compute approximations to the log2 of a value. Both functions return under-estimates of the actual value, although the second flavour is less of an under-estimate than the first (and might be sufficient for using in, say, a dBV/FS level meter).

Running the test program, here's the output:

0.1: -4 -3.400000
1: 0 0.000000
2: 1 1.000000
5: 2 2.250000
100: 6 6.562500


Code :
// Fast logarithm (2-based) approximation
// by Jon Watte

#include <assert.h>

int floorOfLn2( float f ) {
  assert( f > 0. );
  assert( sizeof(f) == sizeof(int) );
  assert( sizeof(f) == 4 );
  return (((*(int *)&f)&0x7f800000)>>23)-0x7f;
}

float approxLn2( float f ) {
  assert( f > 0. );
  assert( sizeof(f) == sizeof(int) );
  assert( sizeof(f) == 4 );
  int i = (*(int *)&f);
  return (((i&0x7f800000)>>23)-0x7f)+(i&0x007fffff)/(float)0x800000;
}

// Here's a test program:

#include <stdio.h>

// insert code from above here

int
main()
{
  printf( "0.1: %d  %f\n", floorOfLn2( 0.1 ), approxLn2( 0.1 ) );
  printf( "1:   %d  %f\n", floorOfLn2( 1. ), approxLn2( 1. ) );
  printf( "2:   %d  %f\n", floorOfLn2( 2. ), approxLn2( 2. ) );
  printf( "5:   %d  %f\n", floorOfLn2( 5. ), approxLn2( 5. ) );
  printf( "100: %d  %f\n", floorOfLn2( 100. ), approxLn2( 100. ) );
  return 0;
}


1 comment(s) | add a comment | nofrills version


Fast cube root, square root, and reciprocal for x86/SSE CPUs.

References : Posted by kaleja[AT]estarcion[DOT]com

Notes :
All of these methods use SSE instructions or bit twiddling tricks to get a rough approximation to cube root, square root, or reciprocal, which is then refined with one or more Newton-Raphson approximation steps.

Each is named to indicate its approximate level of accuracy and a comment describes its performance relative to the conventional versions.



Code :
// These functions approximate reciprocal, square root, and
// cube root to varying degrees of precision substantially
// faster than the straightforward methods 1/x, sqrtf(x),
// and powf( x, 1.0f/3.0f ). All require SSE-enabled CPU & OS.
// Unlike the powf() solution, the cube roots also correctly
// handle negative inputs.  

#define REALLY_INLINE __forceinline

// ~34 clocks on Pentium M vs. ~360 for powf
REALLY_INLINE float cuberoot_sse_8bits( float x )
{
    float z;
    static const float three = 3.0f;
    _asm
    {
        mov        eax, x                // x as bits
        movss    xmm2, x                // x2: x
        movss    xmm1, three            // x1: 3
        // Magic floating point cube root done with integer math.
        // The exponent is divided by three in such a way that
        // remainder bits get shoved into the top of the normalized
        // mantissa.
        mov        ecx, eax            // copy of x
        and        eax, 0x7FFFFFFF        // exponent & mantissa of x in biased-127
        sub     eax, 0x3F800000        // exponent & mantissa of x in 2's comp
        sar     eax, 10                //
        imul    eax, 341            // 341/1024 ~= .333
        add        eax, 0x3F800000        // back to biased-127
        and     eax, 0x7FFFFFFF        // remask
        and        ecx, 0x80000000        // original sign and mantissa
        or      eax, ecx            // masked new exponent, old sign and mantissa
        mov        z, eax                //
        
        // use SSE to refine the first approximation
        movss    xmm0, z                ;// x0: z
        movss    xmm3, xmm0            ;// x3: z
        mulss    xmm3, xmm0            ;// x3: z*z
        movss    xmm4, xmm3            ;// x4: z*z
        mulss   xmm3, xmm1            ;// x3: 3*z*z
        rcpss    xmm3, xmm3            ;// x3: ~ 1/(3*z*z)
        mulss   xmm4, xmm0            ;// x4: z*z*z
        subss    xmm4, xmm2            ;// x4: z*z*z-x
        mulss    xmm4, xmm3            ;// x4: (z*z*z-x) / (3*z*z)
        subss    xmm0, xmm4            ;// x0: z' accurate to within about 0.3%
        movss    z, xmm0
    }

    return z;
}

// ~60 clocks on Pentium M vs. ~360 for powf
REALLY_INLINE float cuberoot_sse_16bits( float x )
{
    float z;
    static const float three = 3.0f;
    _asm
    {
        mov        eax, x                // x as bits
        movss    xmm2, x                // x2: x
        movss    xmm1, three            // x1: 3
        // Magic floating point cube root done with integer math.
        // The exponent is divided by three in such a way that
        // remainder bits get shoved into the top of the normalized
        // mantissa.
        mov        ecx, eax            // copy of x
        and        eax, 0x7FFFFFFF        // exponent & mantissa of x in biased-127
        sub     eax, 0x3F800000        // exponent & mantissa of x in 2's comp
        sar     eax, 10                //
        imul    eax, 341            // 341/1024 ~= .333
        add        eax, 0x3F800000        // back to biased-127
        and     eax, 0x7FFFFFFF        // remask
        and        ecx, 0x80000000        // original sign and mantissa
        or      eax, ecx            // masked new exponent, old sign and mantissa
        mov        z, eax                //
        
        // use SSE to refine the first approximation
        movss    xmm0, z                ;// x0: z
        movss    xmm3, xmm0            ;// x3: z
        mulss    xmm3, xmm0            ;// x3: z*z
        movss    xmm4, xmm3            ;// x4: z*z
        mulss   xmm3, xmm1            ;// x3: 3*z*z
        rcpss    xmm3, xmm3            ;// x3: ~ 1/(3*z*z)
        mulss   xmm4, xmm0            ;// x4: z*z*z
        subss    xmm4, xmm2            ;// x4: z*z*z-x
        mulss    xmm4, xmm3            ;// x4: (z*z*z-x) / (3*z*z)
        subss    xmm0, xmm4            ;// x0: z' accurate to within about 0.3%

        movss    xmm3, xmm0            ;// x3: z
        mulss    xmm3, xmm0            ;// x3: z*z
        movss    xmm4, xmm3            ;// x4: z*z
        mulss   xmm3, xmm1            ;// x3: 3*z*z
        rcpss    xmm3, xmm3            ;// x3: ~ 1/(3*z*z)
        mulss   xmm4, xmm0            ;// x4: z*z*z
        subss    xmm4, xmm2            ;// x4: z*z*z-x
        mulss    xmm4, xmm3            ;// x4: (z*z*z-x) / (3*z*z)
        subss    xmm0, xmm4            ;// x0: z'' accurate to within about 0.001%

        movss    z, xmm0
    }
    
    return z;
}

// ~77 clocks on Pentium M vs. ~360 for powf
REALLY_INLINE float cuberoot_sse_22bits( float x )
{
    float z;
    static const float three = 3.0f;
    _asm
    {
        mov        eax, x                // x as bits
        movss    xmm2, x                // x2: x
        movss    xmm1, three            // x1: 3
        // Magic floating point cube root done with integer math.
        // The exponent is divided by three in such a way that
        // remainder bits get shoved into the top of the normalized
        // mantissa.
        mov        ecx, eax            // copy of x
        and        eax, 0x7FFFFFFF        // exponent & mantissa of x in biased-127
        sub     eax, 0x3F800000        // exponent & mantissa of x in 2's comp
        sar     eax, 10                //
        imul    eax, 341            // 341/1024 ~= .333
        add        eax, 0x3F800000        // back to biased-127
        and     eax, 0x7FFFFFFF        // remask
        and        ecx, 0x80000000        // original sign and mantissa
        or      eax, ecx            // masked new exponent, old sign and mantissa
        mov        z, eax                //
        
        // use SSE to refine the first approximation
        movss    xmm0, z                // x0: z
        movss    xmm3, xmm0            // x3: z
        mulss    xmm3, xmm0            // x3: z*z
        movss    xmm4, xmm3            // x4: z*z
        mulss   xmm3, xmm1            // x3: 3*z*z
        rcpss    xmm3, xmm3            // x3: ~ 1/(3*z*z)
        mulss   xmm4, xmm0            // x4: z*z*z
        subss    xmm4, xmm2            // x4: z*z*z-x
        mulss    xmm4, xmm3            // x4: (z*z*z-x) / (3*z*z)
        subss    xmm0, xmm4            // x0: z' accurate to within about 0.3%
        
        movss    xmm3, xmm0            // x3: z
        mulss    xmm3, xmm0            // x3: z*z
        movss    xmm4, xmm3            // x4: z*z
        mulss   xmm3, xmm1            // x3: 3*z*z
        rcpss    xmm3, xmm3            // x3: ~ 1/(3*z*z)
        mulss   xmm4, xmm0            // x4: z*z*z
        subss    xmm4, xmm2            // x4: z*z*z-x
        mulss    xmm4, xmm3            // x4: (z*z*z-x) / (3*z*z)
        subss    xmm0, xmm4            // x0: z'' accurate to within about 0.001%

        movss    xmm3, xmm0            // x3: z
        mulss    xmm3, xmm0            // x3: z*z
        movss    xmm4, xmm3            // x4: z*z
        mulss   xmm3, xmm1            // x3: 3*z*z
        rcpss    xmm3, xmm3            // x3: ~ 1/(3*z*z)
        mulss   xmm4, xmm0            // x4: z*z*z
        subss    xmm4, xmm2            // x4: z*z*z-x
        mulss    xmm4, xmm3            // x4: (z*z*z-x) / (3*z*z)
        subss    xmm0, xmm4            // x0: z''' accurate to within about 0.000012%
        
        movss    z, xmm0
    }
    
    return z;
}

// ~6 clocks on Pentium M vs. ~24 for single precision sqrtf
REALLY_INLINE float squareroot_sse_11bits( float x )
{
    float z;
    _asm
    {
        rsqrtss xmm0, x
        rcpss    xmm0, xmm0
        movss    z, xmm0            // z ~= sqrt(x) to 0.038%
    }
    return z;
}

// ~19 clocks on Pentium M vs. ~24 for single precision sqrtf
REALLY_INLINE float squareroot_sse_22bits( float x )
{
    static float half = 0.5f;
    float z;
    _asm
    {
        movss    xmm1, x            // x1: x
        rsqrtss xmm2, xmm1        // x2: ~1/sqrt(x) = 1/z
        rcpss    xmm0, xmm2        // x0: z == ~sqrt(x) to 0.05%
        
        movss    xmm4, xmm0        // x4: z
        movss    xmm3, half
        mulss    xmm4, xmm4        // x4: z*z
        mulss    xmm2, xmm3        // x2: 1 / 2z
        subss    xmm4, xmm1        // x4: z*z-x
        mulss    xmm4, xmm2        // x4: (z*z-x)/2z
        subss    xmm0, xmm4        // x0: z' to 0.000015%
        
        movss    z, xmm0          
    }
    return z;
}

// ~12 clocks on Pentium M vs. ~16 for single precision divide
REALLY_INLINE float reciprocal_sse_22bits( float x )
{
    float z;
    _asm
    {
        rcpss    xmm0, x            // x0: z ~= 1/x
        movss    xmm2, x            // x2: x
        movss    xmm1, xmm0        // x1: z ~= 1/x
        addss    xmm0, xmm0        // x0: 2z
        mulss    xmm1, xmm1        // x1: z^2
        mulss    xmm1, xmm2        // x1: xz^2
        subss    xmm0, xmm1        // x0: z' ~= 1/x to 0.000012%

        movss    z, xmm0          
    }
    return z;
}


5 comment(s) | add a comment | nofrills version


fast exp() approximations

Type : Taylor series approximation
References : Posted by scoofy[AT]inf[DOT]elte[DOT]hu

Notes :
I needed a fast exp() approximation in the -3.14..3.14 range, so I made some approximations based on the tanh() code posted in the archive by Fuzzpilz. Should be pretty straightforward, but someone may find this useful.

The increasing number in the name of each function means increasing precision. Maximum error in the -1..1 range:
fastexp3: 0.05 (1.8%)
fastexp4: 0.01 (0.36%)
fastexp5: 0.0016152 (0.59%)
fastexp6: 0.0002263 (0.0083%)
fastexp7: 0.0000279 (0.001%)
fastexp8: 0.0000031 (0.00011%)
fastexp9: 0.0000003 (0.000011%)

Maximum error in the -3.14..3.14 range:
fastexp3: 8.8742 (38.4%)
fastexp4: 4.8237 (20.8%)
fastexp5: 2.28 (9.8%)
fastexp6: 0.9488 (4.1%)
fastexp7: 0.3516 (1.5%)
fastexp8: 0.1172 (0.5%)
fastexp9: 0.0355 (0.15%)

These were done using the Taylor series, for example I got fastexp4 by using:
exp(x) = 1 + x + x^2/2 + x^3/6 + x^4/24 + ...
= (24 + 24x + x^2*12 + x^3*4 + x^4) / 24
(using Horner-scheme:)
= (24 + x * (24 + x * (12 + x * (4 + x)))) * 0.041666666f



Code :
inline float fastexp3(float x) {
    return (6+x*(6+x*(3+x)))*0.16666666f;
}

inline float fastexp4(float x) {
    return (24+x*(24+x*(12+x*(4+x))))*0.041666666f;
}

inline float fastexp5(float x) {
    return (120+x*(120+x*(60+x*(20+x*(5+x)))))*0.0083333333f;
}

inline float fastexp6(float x) {
    return (720+x*(720+x*(360+x*(120+x*(30+x*(6+x))))))*0.0013888888f;
}

inline float fastexp7(float x) {
    return (5040+x*(5040+x*(2520+x*(840+x*(210+x*(42+x*(7+x)))))))*0.00019841269f;
}

inline float fastexp8(float x) {
    return (40320+x*(40320+x*(20160+x*(6720+x*(1680+x*(336+x*(56+x*(8+x))))))))*2.4801587301e-5;
}

inline float fastexp9(float x) {
  return (362880+x*(362880+x*(181440+x*(60480+x*(15120+x*(3024+x*(504+x*(72+x*(9+x)))))))))*2.75573192e-6;
}
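A quick way to see the accuracy figures in practice is to compare one of the polynomials against expf (this harness is mine, not part of the post):

```c
#include <assert.h>
#include <math.h>

/* 4th-order Taylor polynomial of exp(x) in Horner form, as in the post. */
static float fastexp4(float x)
{
    return (24.0f + x * (24.0f + x * (12.0f + x * (4.0f + x)))) * 0.041666666f;
}
```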


13 comment(s) | add a comment | nofrills version


Fast exp2 approximation

References : Posted by Laurent de Soras

Notes :
Partial approximation of exp2 in fixed-point arithmetic. It is exactly:
[0 ; 1[ -> [0.5 ; 1[
f : x |-> 2^(x-1)
To get the full exp2 function, you have to separate the integer and fractional parts of the number. The integer part may be processed by bitshifting. Process the fractional part with the function, and multiply the two results.
Maximum error is only 0.3 %, which is pretty good for two muls! You also get continuity of the first derivative.

-- Laurent


Code :
// val is a 16-bit fixed-point value in 0x0 - 0xFFFF ([0 ; 1[)
// Returns a 32-bit fixed-point value in 0x80000000 - 0xFFFFFFFF ([0.5 ; 1[)
unsigned int fast_partial_exp2 (int val)
{
   unsigned int   result;

   __asm
   {
      mov eax, val
      shl eax, 15           ; eax = input [31/31 bits]
      or  eax, 080000000h   ; eax = input + 1  [32/31 bits]
      mul eax
      mov eax, edx          ; eax = (input + 1) ^ 2 [32/30 bits]
      mov edx, 2863311531   ; 2/3 [32/32 bits], rounded to +oo
      mul edx               ; eax = 2/3 (input + 1) ^ 2 [32/30 bits]
      add edx, 1431655766   ; + 4/3 [32/30 bits] + 1
      mov result, edx
   }

   return (result);
}


no comments on this item | add a comment | nofrills version


Fast Float Random Numbers

References : Posted by dominik.ries[AT]gmail[DOT]com

Notes :
A small and fast implementation of random float numbers in the range [-1,1], usable as a white noise oscillator.

Compared to the naive usage of the rand() function it gives a speedup factor of 9-10.

Compared to a direct implementation of the rand() function (the Visual Studio implementation) it still gives a speedup by a factor of 2-3.

Apart from being faster it also provides more precision for the resulting floats, since its base values use the full 32-bit precision.



Code :
// set the initial seed to whatever you like
static int RandSeed = 1;

// using rand() (16bit precision)
// takes about 110 seconds for 2 billion calls
float RandFloat1()
{  
    return ((float)rand()/RAND_MAX) * 2.0f - 1.0f;  
}

// direct implementation of rand() (16 bit precision)
// takes about 32 seconds for 2 billion calls
float RandFloat2()
{  
    return ((float)(((RandSeed = RandSeed * 214013L + 2531011L) >> 16) & 0x7fff)/RAND_MAX) * 2.0f - 1.0f;  
}  

// fast rand float, using full 32bit precision
// takes about 12 seconds for 2 billion calls
float Fast_RandFloat()
{
    RandSeed *= 16807;
    return (float)RandSeed * 4.6566129e-010f;
}
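A quick property check of the fast generator (restated with unsigned state to keep the wrap-around well defined in C; the original relies on signed overflow):

```c
#include <assert.h>

/* Same generator with unsigned wrap-around; 4.6566129e-10 ~= 2^-31,
   so the scaled output always lands in [-1, 1]. */
static unsigned lcg_state = 1u;

static float fast_rand_float(void)
{
    lcg_state *= 16807u;  /* wraps mod 2^32 */
    /* the cast is implementation-defined for values > INT_MAX;
       two's complement is assumed, as in the original */
    return (float)(int)lcg_state * 4.6566129e-010f;
}
```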


5 comment(s) | add a comment | nofrills version


Fast log2

References : Posted by Laurent de Soras
Code :
inline float fast_log2 (float val)
{
   assert (val > 0);

   int * const  exp_ptr = reinterpret_cast <int *> (&val);
   int          x = *exp_ptr;
   const int    log_2 = ((x >> 23) & 255) - 128;
   x &= ~(255 << 23);
   x += 127 << 23;
   *exp_ptr = x;

   return (val + log_2);
}


6 comment(s) | add a comment | nofrills version


fast power and root estimates for 32bit floats

Type : floating point functions
References : Posted by tobybear[AT]web[DOT]de

Notes :
Original code by Stefan Stenzel (also in this archive, see "pow(x,4) approximation") - extended for more flexibility.

fastpower(f,n) gives a rather *rough* estimate of a float number f to the power of an integer number n (y=f^n). It is fast, but the result can be quite a bit off, since we directly manipulate the floating point exponent. Use it only for rough estimates where precision is not that important.

fastroot(f,n) gives the n-th root of f. The same caveat about precision applies here.

Cheers

Toby (www.tobybear.de)


Code :
//C/C++ source code:
float fastpower(float f,int n)
{
long *lp,l;
lp=(long*)(&f);
l=*lp;l-=0x3F800000l;l<<=(n-1);l+=0x3F800000l;
*lp=l;
return f;
}

float fastroot(float f,int n)
{
long *lp,l;
lp=(long*)(&f);
l=*lp;l-=0x3F800000l;l>>=(n-1);l+=0x3F800000l;
*lp=l;
return f;
}

//Delphi/Pascal source code:
function fastpower(i:single;n:integer):single;
var l:longint;
begin
l:=longint((@i)^);
l:=l-$3F800000;l:=l shl (n-1);l:=l+$3F800000;
result:=single((@l)^);
end;

function fastroot(i:single;n:integer):single;
var l:longint;
begin
l:=longint((@i)^);
l:=l-$3F800000;l:=l shr (n-1);l:=l+$3F800000;
result:=single((@l)^);
end;


no comments on this item | add a comment | nofrills version


Fast rounding functions in pascal

Type : round/ceil/floor/trunc
References : Posted by bouba@hotmail.com

Notes :
Original documentation with cpp samples:
http://ldesoras.free.fr/prod.html#doc_rounding


Code :
Pascal translation of the functions (the accurate ones):

function ffloor(f:double):integer;
var
  i:integer;
  t : double;
begin
t := -0.5 ;
  asm
    fld   f
    fadd  st,st(0)
    fadd  t
    fistp i
    sar   i, 1
end;
result:= i
end;

function fceil(f:double):integer;
var
  i:integer;
  t : double;
begin
t := -0.5 ;
  asm
    fld   f
    fadd  st,st(0)
    fsubr t
    fistp i
    sar   i, 1
end;
result:= -i
end;

function ftrunc(f:double):integer;
var
  i:integer;
  t : double;
begin
t := -0.5 ;
  asm
    fld   f
    fadd  st,st(0)
    fabs
    fadd t
    fistp i
    sar   i, 1
end;
if f<0 then i := -i;
result:= i
end;

function fround(f:double):integer;
var
  i:integer;
  t : double;
begin
t := 0.5 ;
  asm
    fld   f
    fadd  st,st(0)
    fadd t
    fistp i
    sar   i, 1
end;
result:= i
end;


1 comment(s) | add a comment | nofrills version


Fast sign for 32 bit floats

References : Posted by Peter Schoffhauzer

Notes :
Fast functions which give the sign of a 32 bit floating point number by checking the sign bit. There are two versions, one which gives the value as a float, and the other gives an int.

The _nozero versions are faster, but they incorrectly return 1 or -1 for zero (depending on the sign bit set in the number). The int version should be faster than the Tobybear one in the archive, since this one doesn't have an addition, just bit operations.


Code :
/*------------------------------------------------------------------------------------
   fast sign, returns float
------------------------------------------------------------------------------------*/

// returns 1.0f for positive floats, -1.0f for negative floats
// for zero it does not work (may give 1.0f or -1.0f), but it's faster
inline float fast_sign_nozero(float f) {
    float r = 1.0f;
    (int&)r |= ((int&)f & 0x80000000); // mask sign bit in f, set it in r if necessary
    return r;
}

// returns 1.0f for positive floats, -1.0f for negative floats, 0.0f for zero
inline float fast_sign(float f) {
    if (((int&)f & 0x7FFFFFFF)==0) return 0.f; // test exponent & mantissa bits: is input zero?
    else {
        float r = 1.0f;
        (int&)r |= ((int&)f & 0x80000000); // mask sign bit in f, set it in r if necessary
        return r;
    }
}

/*------------------------------------------------------------------------------------
   fast sign, returns int
------------------------------------------------------------------------------------*/

// returns 1 for positive floats, -1 for negative floats
// for 0.0f input it does not work (may give 1 or -1), but it's faster
inline int fast_sign_int_nozero(float f) {
      return (signed((int&)f & 0x80000000) >> 31 ) | 1;
}

// returns 1 for positive floats, -1 for negative floats, 0 for 0.0f
inline int fast_sign_int(float f) {
    if (((int&)f & 0x7FFFFFFF)==0) return 0; // test exponent & mantissa bits: is input zero?
    return (signed((int&)f & 0x80000000) >> 31 ) | 1;
}


2 comment(s) | add a comment | nofrills version


Float to int

References : Posted by Ross Bencina

Notes :
Intel x86 only. Note: with the FPU in its default round-to-nearest mode, subtracting 0.5 makes this behave like a fast floor() rather than a true truncation, and it can be off by one for exact integer inputs. See the following entries for more careful variants.

Code :
int truncate(float flt)
{
  int i;
  static const double half = 0.5f;
  _asm
  {
     fld flt
     fsub half
     fistp i
  }
  return i;
}


3 comment(s) | add a comment | nofrills version


Float to int (more intel asm)

References : Posted by Laurent de Soras (via flipcode)

Notes :
[Found this on flipcode, seemed worth posting here too, hopefully Laurent will approve :) -- Ross]

Here is the code I often use. It is not a _ftol replacement, but it provides useful functions. The current processor rounding mode should be "to nearest"; this is the default setting for most compilers.

The [fadd st, st (0) / sar i,1] trick fixes the "round to nearest even number" behaviour. Thus, round_int (N+0.5) always returns N+1, and the floor_int function is appropriate for converting floating point numbers into fixed point numbers.

---------------------

Laurent de Soras
Audio DSP engineer & software designer
Ohm Force - Digital audio software
http://www.ohmforce.com


Code :
inline int round_int (double x)
{
   int      i;
   static const float round_to_nearest = 0.5f;
   __asm
   {
      fld      x
      fadd     st, st (0)
      fadd     round_to_nearest
      fistp    i
      sar      i, 1
   }

   return (i);
}

inline int floor_int (double x)
{
   int      i;
   static const float round_toward_m_i = -0.5f;
   __asm
    {
      fld      x
      fadd     st, st (0)
      fadd     round_toward_m_i
      fistp    i
      sar      i, 1
   }

   return (i);
}

inline int ceil_int (double x)
{
   int      i;
   static const float round_toward_p_i = -0.5f;
   __asm
   {
      fld      x
      fadd     st, st (0)
      fsubr    round_toward_p_i
      fistp    i
      sar      i, 1
   }

   return (-i);
}


5 comment(s) | add a comment | nofrills version


Float to integer conversion

References : Posted by Peter Schoffhauzer

Notes :
The fld x/fistp i instructions can be used for fast conversion of floating point numbers to integers. By default, the number is rounded to the nearest integer. However, sometimes that's not what we need.

Bits 11 and 10 of the FPU control word determine the way the FPU rounds numbers. The four rounding modes are:

00 = round to nearest (this is the default)
01 = round down (towards -infinity)
10 = round up (towards +infinity)
11 = truncate (towards zero)

The control word is loaded/stored using the fldcw/fstcw instructions. After setting the rounding mode as desired, the float2int() function will use that rounding mode until the control word is reset. The ceil() and floor() functions set the rounding mode on every call, which is very slow. Therefore, it is a lot faster to set the rounding mode (down or up) once before processing a block, and use float2int() instead.

References: SIMPLY FPU by Raymond Filiatreault
http://www.website.masmforum.com/tutorials/fptute/index.html

MSVC (and probably other compilers too) has functions defined in <float.h> for changing the FPU control word: _control87/_controlfp. The equivalent instructions are:

_controlfp(_RC_CHOP, _MCW_RC); // set rounding mode to truncate
_controlfp(_RC_UP, _MCW_RC); // set rounding mode to up (ceil)
_controlfp(_RC_DOWN, _MCW_RC); // set rounding mode to down (floor)
_controlfp(_RC_NEAR, _MCW_RC); // set rounding mode to near (default)

Note that the FPU rounding mode affects other calculations too, so the same code may give different results.

Alternatively, single precision floating point numbers can be truncated or rounded to integers by using SSE instructions cvtss2si, cvttss2si, cvtps2pi or cvttps2pi, where SSE instructions are available (which means probably on all modern CPUs). These are not discussed here in detail, but I provided the function truncateSSE which always truncates a single precision floating point number to integer, regardless of the current rounding mode.

(Also I think the SSE rounding mode can differ from the rounding mode set in the FPU control word, but I haven't tested it so far.)


Code :
#ifndef __FPU_ROUNDING_H_
#define __FPU_ROUNDING_H_

static short control_word;
static short control_word2;

// round a float to int using the actual rounding mode
inline int float2int(float x) {
    int i;
    __asm {
        fld x
        fistp i
    }
    return i;
}

// set rounding mode to nearest
inline void set_round2near() {
    __asm      {
        fstcw    control_word        // store fpu control word
        mov     dx, word ptr [control_word]
        and        dx,0xf3ff            // clear RC bits: round towards nearest (default)
        mov        control_word2, dx
        fldcw    control_word2        // load modified control word
    }
}

// set rounding mode to round down
inline void set_round_down() {
    __asm  {
        fstcw    control_word        // store fpu control word
        mov     dx, word ptr [control_word]
        or        dx,0x0400            // round towards -inf
        and        dx,0xf7ff            
        mov        control_word2, dx
        fldcw    control_word2        // load modified control word
    }
}

// set rounding mode to round up
inline void set_round_up() {
    __asm  {
        fstcw    control_word        // store fpu control word
        mov     dx, word ptr [control_word]
        or        dx,0x0800            // round towards +inf
        and        dx,0xfbff            
        mov        control_word2, dx
        fldcw    control_word2        // load modified control word
    }
}

// set rounding mode to truncate
inline void set_truncate() {
    __asm  {
        fstcw    control_word        // store fpu control word
        mov     dx, word ptr [control_word]
        or        dx,0x0c00            // rounding: truncate
        mov        control_word2, dx
        fldcw    control_word2        // load modified control word
    }
}

// restore original rounding mode
inline void restore_cw() {
    __asm fldcw    control_word
}

// truncate to integer using SSE
inline long truncateSSE(float x) {
    __asm cvttss2si eax,x
}

#endif


1 comment(s) | add a comment | nofrills version


Float-to-int, converting an array of floats

References : Posted by Stefan Stenzel

Notes :
intel only

Code :
void f2short(float *fptr,short *iptr,int n)
{
_asm {
    mov     ebx,n
    mov     esi,fptr
    mov     edi,iptr
    lea     ebx,[esi+ebx*4]   ; ptr after last
    mov     edx,0x80008000    ; damn endianess confuses...
    mov     ecx,0x4b004b00    ; damn endianess confuses...
    mov     eax,[ebx]         ; get last value
    push    eax
    mov     eax,0x4b014B01
    mov     [ebx],eax              ; mark end
    mov     ax,[esi+2]
    jmp     startf2slp

;   Pad with nops to make loop start at address divisible
;   by 16 + 2, e.g. 0x01408062, don't ask why, but this
;   gives best performance. Unfortunately "align 16" does
;   not seem to work with my VC.
;   below I noted the measured execution times for different
;   nop-paddings on my Pentium Pro, 100 conversions.
;   saturation:  off pos neg


   nop         ;355 546 563 <- seems to be best
;   nop         ;951 547 544
;   nop         ;444 646 643
;   nop         ;444 646 643
;   nop         ;944 951 950
;   nop         ;358 447 644
;   nop         ;358 447 643
;   nop         ;358 544 643
;   nop         ;543 447 643
;   nop         ;643 447 643
;   nop         ;1047 546 746
;   nop         ;545 954 1253
;   nop         ;545 547 661
;   nop         ;544 547 746
;   nop         ;444 947 1147
;   nop         ;444 548 545
in_range:
    mov     eax,[esi]
    xor     eax,edx
saturate:
    lea     esi,[esi+4]
    mov     [edi],ax
    mov     ax,[esi+2]
    add     edi,2
startf2slp:
    cmp     ax,cx
    je      in_range
    mov     eax,edx
    js      saturate     ; saturate neg -> 0x8000
    dec     eax          ; saturate pos -> 0x7FFF
    cmp     esi,ebx      ; end reached ?
    jb      saturate
    pop     eax
    mov     [ebx],eax    ; restore end flag
    }
}


1 comment(s) | add a comment | nofrills version


Gaussian dithering

Type : Dithering
References : Posted by Aleksey Vaneev (picoder[AT]mail[DOT]ru)

Notes :
This is a more sophisticated dither than simple RND. It gives the lowest noise floor across the whole spectrum, even without noise shaping. You can use as big an N as you can afford (it will not hurt), but 4 or 5 is generally enough.

Code :
Basically, the next value is calculated this way (for RND going from -0.5 to 0.5):

dither = (RND+RND+...+RND) / N.
           \           /
            \_________/
              N times

If your RND goes from 0 to 1, then this code is applicable:

dither = (RND+RND+...+RND - 0.5*N) / N.
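The recipe above can be sketched in C++ (a minimal sketch, assuming rand() as the RND source; the helper name gaussian_dither is made up for illustration):

```cpp
#include <cstdlib>

// Sum N uniform values in [-0.5, 0.5) and divide by N; by the
// central limit theorem the result approximates Gaussian dither.
inline float gaussian_dither(int N)
{
    float sum = 0.0f;
    for (int i = 0; i < N; ++i)
        sum += (float)std::rand() / (float)RAND_MAX - 0.5f;
    return sum / (float)N;   // stays within [-0.5, 0.5]
}
```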


no comments on this item | add a comment | nofrills version


Gaussian random numbers

Type : random number generation
References : Posted by tobybear[AT]web[DOT]de

Notes :
// Gaussian random numbers
// This algorithm (adapted from "Natur als fraktale Grafik" by
// Reinhard Scholl) implements a generation method for gaussian
// distributed random numbers with mean=0 and variance=1
// (standard gaussian distribution) mapped to the range of
// -1 to +1 with the maximum at 0.
// For only positive results you might abs() the return value.
// The q variable defines the precision, with q=15 the smallest
// distance between two numbers will be 1/(2^q div 3)=1/10922
// which usually gives good results.

// Note: the random() function used is the standard random
// function from Delphi/Pascal that produces *linear*
// distributed numbers from 0 to parameter-1, the equivalent
// C function is probably rand().


Code :
const q=15;
      c1=(1 shl q)-1;
      c2=(c1 div 3)+1;
      c3=1/c1;

function GRandom:single;
begin
result:=(2*(random(c2)+random(c2)+random(c2))-3*(c2-1))*c3;
end;
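A C++ translation of the Pascal code above might look like this (a sketch; rand() % n is assumed as the equivalent of Delphi's random(n), which is only approximately linear for small n):

```cpp
#include <cstdlib>

// Constants matching the Pascal version: q=15 precision
const int   q  = 15;
const int   c1 = (1 << q) - 1;      // 32767
const int   c2 = (c1 / 3) + 1;      // 10923
const float c3 = 1.0f / (float)c1;

// Sum of three uniform values, scaled to [-1, 1] with the peak at 0
inline float grandom(void)
{
    int sum = (std::rand() % c2) + (std::rand() % c2) + (std::rand() % c2);
    return (float)(2 * sum - 3 * (c2 - 1)) * c3;
}
```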


no comments on this item | add a comment | nofrills version


Hermite Interpolator (x86 ASM)

Type : Hermite interpolator in x86 assembly (for MS VC++)
References : Posted by robert[DOT]bielik[AT]rbcaudio[DOT]com

Notes :
An "assemblified" variant of Laurent de Soras' hermite interpolator. I tried to do the calculations as parallel as I could muster, but there is almost certainly room for improvement. Right now it runs about 5.3 times (!) faster, not bad to start with...

Parameter explanation:
frac_pos: fractional value [0.0f - 1.0f] to interpolator
pntr: pointer to float array where:
pntr[0] = previous sample (idx = -1)
pntr[1] = current sample (idx = 0)
pntr[2] = next sample (idx = +1)
pntr[3] = after next sample (idx = +2)

The interpolation takes place between pntr[1] and pntr[2].

Regards,
/Robert Bielik
RBC Audio



Code :
const float c_half = 0.5f;

__declspec(naked) float __hermite(float frac_pos, const float* pntr)
{
    __asm
    {
        push    ecx;        
        mov        ecx, dword ptr[esp + 12]; //////////////////////////////////////////////////////////////////////////////////////////////////
        add        ecx, 0x04;            //    ST(0)        ST(1)        ST(2)        ST(3)        ST(4)        ST(5)        ST(6)        ST(7)        
        fld        dword ptr [ecx+4];    //    x1
        fsub    dword ptr [ecx-4];    //    x1-xm1
        fld        dword ptr [ecx];    //    x0            x1-xm1
        fsub    dword ptr [ecx+4];    //    v            x1-xm1
        fld        dword ptr [ecx+8];    //    x2            v            x1-xm1
        fsub    dword ptr [ecx];    //    x2-x0        v            x1-xm1
        fxch    st(2);                //    x1-m1        v            x2-x0
        fmul    c_half;                //    c            v            x2-x0
        fxch    st(2);                //    x2-x0        v            c
        fmul    c_half;                //    0.5*(x2-x0)    v            c
        fxch    st(2);                //    c            v            0.5*(x2-x0)
        fst        st(3);                //    c            v            0.5*(x2-x0)    c
        fadd    st(0), st(1);        //    w            v            0.5*(x2-x0)    c
        fxch    st(2);                //    0.5*(x2-x0)    v            w            c    
        faddp    st(1), st(0);        //  v+.5(x2-x0)    w            c    
        fadd    st(0), st(1);        //    a            w            c
        fadd    st(1), st(0);        //    a            b_neg        c
        fmul    dword ptr [esp+8];    //    a*frac        b_neg        c
        fsubp    st(1), st(0);        //    a*f-b        c
        fmul    dword ptr [esp+8];    //    (a*f-b)*f    c
        faddp    st(1), st(0);        //    res-x0/f
        fmul    dword ptr [esp+8];    //    res-x0
        fadd    dword ptr [ecx];    //    res
        pop        ecx;
        ret;
    }
}


4 comment(s) | add a comment | nofrills version


Hermite interpollation

References : Posted by various

Notes :
These are all different ways to do the same thing: hermite interpolation. Try 'em all and benchmark.

Code :
// original
inline float hermite1(float x, float y0, float y1, float y2, float y3)
{
    // 4-point, 3rd-order Hermite (x-form)
    float c0 = y1;
    float c1 = 0.5f * (y2 - y0);
    float c2 = y0 - 2.5f * y1 + 2.f * y2 - 0.5f * y3;
    float c3 = 1.5f * (y1 - y2) + 0.5f * (y3 - y0);

    return ((c3 * x + c2) * x + c1) * x + c0;
}

// james mccartney
inline float hermite2(float x, float y0, float y1, float y2, float y3)
{
    // 4-point, 3rd-order Hermite (x-form)
    float c0 = y1;
    float c1 = 0.5f * (y2 - y0);
    float c3 = 1.5f * (y1 - y2) + 0.5f * (y3 - y0);
    float c2 = y0 - y1 + c1 - c3;

    return ((c3 * x + c2) * x + c1) * x + c0;
}

// james mccartney
inline float hermite3(float x, float y0, float y1, float y2, float y3)
{
        // 4-point, 3rd-order Hermite (x-form)
        float c0 = y1;
        float c1 = 0.5f * (y2 - y0);
        float y0my1 = y0 - y1;
        float c3 = (y1 - y2) + 0.5f * (y3 - y0my1 - y2);
        float c2 = y0my1 + c1 - c3;

        return ((c3 * x + c2) * x + c1) * x + c0;
}

// laurent de soras
inline float hermite4(float frac_pos, float xm1, float x0, float x1, float x2)
{
   const float    c     = (x1 - xm1) * 0.5f;
   const float    v     = x0 - x1;
   const float    w     = c + v;
   const float    a     = w + v + (x2 - x0) * 0.5f;
   const float    b_neg = w + a;

   return ((((a * frac_pos) - b_neg) * frac_pos + c) * frac_pos + x0);
}


9 comment(s) | add a comment | nofrills version


Linear interpolation

Type : Linear interpolators for oversampled audio
References : Posted by scoofy[AT]inf[DOT]elte[DOT]hu

Notes :
Simple, fast linear interpolators for upsampling a signal by a factor of 2,4,8,16 or 32. Not very usable on their own since they introduce aliasing (but still better than zero order hold). These are best used with already oversampled signals.

-- Peter Schoffhauzer


Code :
#ifndef __LIN_INTERPOLATOR_H_
#define __LIN_INTERPOLATOR_H_

/************************************************************************
*    Linear interpolator class                                            *
************************************************************************/

class interpolator_linear
{
public:
    interpolator_linear() {
        reset_hist();
    }
    
    // reset history
    void reset_hist() {
        d1 = 0.f;
    }

    // 2x interpolator
    // out: pointer to float[2]
    inline void process2x(float const in, float *out) {
        out[0] = d1 + 0.5f*(in-d1);    // interpolate
        out[1] = in;
        d1 = in; // store delay
    }

    // 4x interpolator
    // out: pointer to float[4]
    inline void process4x(float const in, float *out) {
        float y = in-d1;
        out[0] = d1 + 0.25f*y;    // interpolate
        out[1] = d1 + 0.5f*y;
        out[2] = d1 + 0.75f*y;
        out[3] = in;
        d1 = in; // store delay
    }

    // 8x interpolator
    // out: pointer to float[8]
    inline void process8x(float const in, float *out) {
        float y = in-d1;
        out[0] = d1 + 0.125f*y;    // interpolate
        out[1] = d1 + 0.25f*y;
        out[2] = d1 + 0.375f*y;
        out[3] = d1 + 0.5f*y;
        out[4] = d1 + 0.625f*y;
        out[5] = d1 + 0.75f*y;
        out[6] = d1 + 0.875f*y;
        out[7] = in;
        d1 = in; // store delay
    }

    // 16x interpolator
    // out: pointer to float[16]
    inline void process16x(float const in, float *out) {
        float y = in-d1;
        out[0] = d1 + (1.0f/16.0f)*y;    // interpolate
        out[1] = d1 + (2.0f/16.0f)*y;    
        out[2] = d1 + (3.0f/16.0f)*y;    
        out[3] = d1 + (4.0f/16.0f)*y;    
        out[4] = d1 + (5.0f/16.0f)*y;    
        out[5] = d1 + (6.0f/16.0f)*y;    
        out[6] = d1 + (7.0f/16.0f)*y;    
        out[7] = d1 + (8.0f/16.0f)*y;    
        out[8] = d1 + (9.0f/16.0f)*y;    
        out[9] = d1 + (10.0f/16.0f)*y;    
        out[10] = d1 + (11.0f/16.0f)*y;    
        out[11] = d1 + (12.0f/16.0f)*y;    
        out[12] = d1 + (13.0f/16.0f)*y;    
        out[13] = d1 + (14.0f/16.0f)*y;    
        out[14] = d1 + (15.0f/16.0f)*y;    
        out[15] = in;        
        d1 = in; // store delay
    }

    // 32x interpolator
    // out: pointer to float[32]
    inline void process32x(float const in, float *out) {
        float y = in-d1;
        out[0] = d1 + (1.0f/32.0f)*y;    // interpolate
        out[1] = d1 + (2.0f/32.0f)*y;    
        out[2] = d1 + (3.0f/32.0f)*y;    
        out[3] = d1 + (4.0f/32.0f)*y;    
        out[4] = d1 + (5.0f/32.0f)*y;    
        out[5] = d1 + (6.0f/32.0f)*y;    
        out[6] = d1 + (7.0f/32.0f)*y;    
        out[7] = d1 + (8.0f/32.0f)*y;    
        out[8] = d1 + (9.0f/32.0f)*y;    
        out[9] = d1 + (10.0f/32.0f)*y;    
        out[10] = d1 + (11.0f/32.0f)*y;    
        out[11] = d1 + (12.0f/32.0f)*y;    
        out[12] = d1 + (13.0f/32.0f)*y;    
        out[13] = d1 + (14.0f/32.0f)*y;    
        out[14] = d1 + (15.0f/32.0f)*y;    
        out[15] = d1 + (16.0f/32.0f)*y;    
        out[16] = d1 + (17.0f/32.0f)*y;    
        out[17] = d1 + (18.0f/32.0f)*y;    
        out[18] = d1 + (19.0f/32.0f)*y;    
        out[19] = d1 + (20.0f/32.0f)*y;    
        out[20] = d1 + (21.0f/32.0f)*y;    
        out[21] = d1 + (22.0f/32.0f)*y;    
        out[22] = d1 + (23.0f/32.0f)*y;    
        out[23] = d1 + (24.0f/32.0f)*y;    
        out[24] = d1 + (25.0f/32.0f)*y;    
        out[25] = d1 + (26.0f/32.0f)*y;    
        out[26] = d1 + (27.0f/32.0f)*y;    
        out[27] = d1 + (28.0f/32.0f)*y;    
        out[28] = d1 + (29.0f/32.0f)*y;    
        out[29] = d1 + (30.0f/32.0f)*y;    
        out[30] = d1 + (31.0f/32.0f)*y;    
        out[31] = in;        
        d1 = in; // store delay
    }

private:
    float d1; // previous input
};

#endif


2 comment(s) | add a comment | nofrills version


Lock free fifo

References : Posted by Timo
Linked file : LockFreeFifo.h

Notes :
Simple implementation of a lock free FIFO.
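The linked file is not reproduced here, but a single-producer/single-consumer lock free ring buffer along these lines might look like the following (a minimal sketch, not Timo's original code; it assumes exactly one reader thread and one writer thread, and a power-of-two size):

```cpp
#include <cstddef>

// Minimal SPSC lock free FIFO sketch. Each index is written only by
// its owning thread, so no locks are needed. (For production code the
// indices should be std::atomic with acquire/release ordering.)
template <typename T, size_t SIZE>   // SIZE must be a power of two
class LockFreeFifo
{
public:
    LockFreeFifo() : readIdx(0), writeIdx(0) {}

    bool push(const T &item) {            // called by the producer only
        size_t next = (writeIdx + 1) & (SIZE - 1);
        if (next == readIdx) return false; // full
        buffer[writeIdx] = item;
        writeIdx = next;
        return true;
    }

    bool pop(T &item) {                   // called by the consumer only
        if (readIdx == writeIdx) return false; // empty
        item = buffer[readIdx];
        readIdx = (readIdx + 1) & (SIZE - 1);
        return true;
    }

private:
    T buffer[SIZE];
    volatile size_t readIdx;   // owned by the consumer
    volatile size_t writeIdx;  // owned by the producer
};
```

One slot is deliberately left unused so that "full" and "empty" can be told apart without a separate counter.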



2 comment(s) | add a comment | nofrills version


Matlab Time Domain Impulse Response Inverter/Divider

References : Posted by arcane[AT]arcanemethods[DOT]com

Notes :
Matlab code for time domain inversion of an impulse response, or the division of one by another (transfer function). The main toepsolve function is given both as a .m file and as a .c file for Matlab's MEX compilation. The latter is much faster.



Code :
function inv=invimplms(den,n,d)
% syntax inv=invimplms(den,n,d)
%         den - denominator impulse
%         n   - length of result
%         d   - delay of result
%         inv - inverse impulse response of length n with delay d
%
% Levinson-Durbin algorithm from Proakis and Manolakis p.865
%
% Author: Bob Cain, May 1, 2001 arcane[AT]arcanemethods[DOT]com

    m=xcorr(den,n-1);
    m=m(n:end);
    b=[den(d+1:-1:1);zeros(n-d-1,1)];
    inv=toepsolve(m,b);



function quo=divimplms(num,den,n,d)
%Syntax quo=divimplms(num,den,n,d)
%       num - numerator impulse
%       den - denominator impulse
%       n   - length of result
%       d   - delay of result
%        quo - quotient impulse response of length n delayed by d
%
% Levinson-Durbin algorithm from Proakis and Manolakis p.865
%
% Author: Bob Cain, May 1, 2001 arcane@arcanemethods.com

    m=xcorr(den,n-1);
    m=m(n:end);
    b=xcorr([zeros(d,1);num],den,n-1);
    b=b(n:-1:1);
    quo=toepsolve(m,b);


function hinv=toepsolve(r,q)
% Solve Toeplitz system of equations.
%    Solves R*hinv = q, where R is the symmetric Toeplitz matrix
%    whose first column is r
%    Assumes all inputs are real
%    Inputs:  
%       r - first column of Toeplitz matrix, length n
%       q - rhs vector, length n
%    Outputs:
%       hinv - length n solution
%
%   Algorithm from Roberts & Mullis, p.233
%
%   Author: T. Krauss, Sept 10, 1997
%
%   Modified: R. Cain, Dec 16, 2004 to remove a pair of transposes
%             that caused errors.

    n=length(q);
    a=zeros(n+1,2);
    a(1,1)=1;

    hinv=zeros(n,1);
    hinv(1)=q(1)/r(1);

    alpha=r(1);
    c=1;
    d=2;

    for k=1:n-1,
       a(k+1,c)=0;
       a(1,d)=1;
       beta=0;
       j=1:k;
       beta=sum(r(k+2-j).*a(j,c))/alpha;
       a(j+1,d)=a(j+1,c)-beta*a(k+1-j,c);
       alpha=alpha*(1-beta^2);
       hinv(k+1,1)=(q(k+1)-sum(r(k+2-j).*hinv(j,1)))/alpha;
       hinv(j)=hinv(j)+a(k+2-j,d)*hinv(k+1);
       temp=c;
       c=d;
       d=temp;
    end


-----What follows is the .c version of toepsolve--------

#include <math.h>
#include "mex.h"

/*  function hinv = toepsolve(r,q);
*  TOEPSOLVE  Solve Toeplitz system of equations.
*    Solves R*hinv = q, where R is the symmetric Toeplitz matrix
*    whose first column is r
*    Assumes all inputs are real
*    Inputs:  
*       r - first column of Toeplitz matrix, length n
*       q - rhs vector, length n
*    Outputs:
*       hinv - length n solution
*
*   Algorithm from Roberts & Mullis, p.233
*
*   Author: T. Krauss, Sept 10, 1997
*
*   Modified: R. Cain, Dec 16, 2004 to replace an unnecessary
*             n by n matrix allocation for a with an n by 2 rotating
*             buffer and to more closely match the .m function.
*/
void mexFunction(
    int nlhs,
    mxArray *plhs[],
    int nrhs,
    const mxArray *prhs[]
)
{
   double (*a)[2],*hinv,alpha,beta;
   int c,d,temp,j,k;
  
   double eps = mxGetEps();
   int n = (mxGetN(prhs[0])>=mxGetM(prhs[0])) ? mxGetN(prhs[0]) : mxGetM(prhs[0]) ;
   double *r = mxGetPr(prhs[0]);
   double *q = mxGetPr(prhs[1]);
  
   a = (double (*)[2])mxCalloc((n+1)*2,sizeof(double));
   if (a == NULL) {
       mexErrMsgTxt("Sorry, failed to allocate buffer.");
   }
   a[0][0]=1.0;
  
   plhs[0] = mxCreateDoubleMatrix(n,1,0);
   hinv = mxGetPr(plhs[0]);
   hinv[0] = q[0]/r[0];
  
   alpha=r[0];
   c=0;
   d=1;
  
   for (k = 1; k < n; k++) {
       a[k][c] = 0;
       a[0][d] = 1.0;
       beta = 0.0;
       for (j = 1; j <= k; j++) {
           beta += r[k+1-j]*a[j-1][c];
       }
       beta /= alpha;
       for (j = 1; j <= k; j++) {
           a[j][d] = a[j][c] - beta*a[k-j][c];
       }
       alpha *= (1 - beta*beta);
       hinv[k] = q[k];
       for (j = 1; j <= k; j++) {
           hinv[k] -= r[k+1-j]*hinv[j-1];
       }
       hinv[k] /= alpha;
       for (j = 1; j <= k; j++) {
           hinv[j-1] += a[k+1-j][d]*hinv[k];
       }
       temp=c;
       c=d;
       d=temp;
   } /* loop over k */
  
   mxFree(a);

   return;
}





no comments on this item | add a comment | nofrills version


MATLAB-Tools for SNDAN

References : Posted by Markus Sapp
Linked file : other001.zip

Notes :
(see linked file)



no comments on this item | add a comment | nofrills version


MIDI note/frequency conversion

Type : -
References : Posted by tobybear[AT]web[DOT]de

Notes :
I often get asked about simple things like MIDI note/frequency conversion, so I thought I could as well post some source code for this.
The following is Pascal/Delphi syntax, but it shouldn't be a problem to convert it to almost any language in no time.

Uses for this code are mainly for initializing oscillators to the right frequency based upon a given MIDI note, but you might also check what MIDI note is closest to a given frequency for pitch detection etc.
In realtime applications it might be a good idea to get rid of the power and log2 calculations and generate a lookup table on initialization.

A full Pascal/Delphi unit with these functions (including lookup table generation) and a simple demo application can be downloaded here:
http://tobybear.phreque.com/dsp_conv.zip

If you have any comments/suggestions, please send them to: tobybear@web.de


Code :
// MIDI NOTE/FREQUENCY CONVERSIONS

const notes:array[0..11] of string= ('C ','C#','D ','D#','E ','F ','F#','G ','G#','A ','A#','B ');
const base_a4=440; // set A4=440Hz

// converts from MIDI note number to frequency
// example: NoteToFrequency(12)=32.703
function NoteToFrequency(n:integer):double;
begin
if (n>=0)and(n<=119) then
  result:=base_a4*power(2,(n-57)/12)
else result:=-1;
end;

// converts from MIDI note number to string
// example: NoteToName(12)='C 1'
function NoteToName(n:integer):string;
begin
if (n>=0)and(n<=119) then
  result:=notes[n mod 12]+inttostr(n div 12)
else result:='---';
end;

// converts from frequency to closest MIDI note
// example: FrequencyToNote(443)=57 (A 4)
function FrequencyToNote(f:double):integer;
begin
result:=round(12*log2(f/base_a4))+57;
end;

// converts from string to MIDI note
// example: NameToNote('A4')=57
function NameToNote(s:string):integer;
var c,i:integer;
begin
if length(s)=2 then s:=s[1]+' '+s[2];
if length(s)<>3 then begin result:=-1;exit end;
s:=uppercase(s);
c:=-1;
for i:=0 to 11 do
  if notes[i]=copy(s,1,2) then
  begin
   c:=i;
   break
  end;
try
  i:=strtoint(s[3]);
  result:=i*12+c;
except
  result:=-1;
end;
if c<0 then result:=-1;
end;
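The lookup-table idea mentioned in the notes can be sketched in C++ (an illustrative sketch, not the code from the linked Pascal unit; the table size and A4 = 440 Hz reference follow the functions above):

```cpp
#include <cmath>

// Precomputed MIDI note -> frequency table, filled once at startup,
// matching NoteToFrequency above (note 57 is A4 = 440 Hz).
static double note_freq[120];

void init_note_table(void)
{
    for (int n = 0; n < 120; ++n)
        note_freq[n] = 440.0 * std::pow(2.0, (n - 57) / 12.0);
}

// No pow() call at runtime; just a bounds check and a table read
inline double note_to_frequency(int n)
{
    return (n >= 0 && n < 120) ? note_freq[n] : -1.0;
}
```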


3 comment(s) | add a comment | nofrills version


Millimeter to DB (faders...)

References : Posted by James McCartney

Notes :
These two functions reproduce a traditional professional
mixer fader taper.

MMtoDB converts millimeters of fader travel from the
bottom of the fader for a 100 millimeter fader into
decibels. DBtoMM is the inverse.

The taper is as follows from the top:
The top of the fader is +10 dB
100 mm to 52 mm : -5 dB per 12 mm
52 mm to 16 mm : -10 dB per 12 mm
16 mm to 4 mm : -20 dB per 12 mm
4 mm to 0 mm : fade to zero. (in these functions I go to -200dB
which is effectively zero for up to 32 bit audio.)


Code :
float MMtoDB(float mm)
{
        float db;

        mm = 100. - mm;

        if (mm <= 0.) {
                db = 10.;
        } else if (mm < 48.) {
                db = 10. - 5./12. * mm;
        } else if (mm < 84.) {
                db = -10. - 10./12. * (mm - 48.);
        } else if (mm < 96.) {
                db = -40. - 20./12. * (mm - 84.);
        } else if (mm < 100.) {
                db = -60. - 35. * (mm - 96.);
        } else db = -200.;
        return db;
}

float DBtoMM(float db)
{
        float mm;
        if (db >= 10.) {
                mm = 0.;
        } else if (db > -10.) {
                mm = -12./5. * (db - 10.);
        } else if (db > -40.) {
                mm = 48. - 12./10. * (db + 10.);
        } else if (db > -60.) {
                mm = 84. - 12./20. * (db + 40.);
        } else if (db > -200.) {
                mm = 96. - 1./35. * (db + 60.);
        } else mm = 100.;

        mm = 100. - mm;

        return mm;
}


2 comment(s) | add a comment | nofrills version


Motorola 56300 Disassembler

Type : disassembler
References : Posted by chris_townsend[AT]digidesign[DOT]com
Linked file : Disassemble56k.zip

Notes :
This code contains functions to disassemble Motorola 56k opcodes. The code was originally created by Stephen Davis at Stanford. I made minor modifications to support many 56300 opcodes, although it would be nice to add them all at some point. Specifically, I added support for CLB, NORMF, immediate AND, immediate OR, multi-bit ASR/ASL, multi-bit LSL/LSR, MAX, MAXM, BRA, Bcc, BSR, BScc, DMAC, MACsu, MACuu, and conditional ALU instructions.



3 comment(s) | add a comment | nofrills version


Noise Shaping Class

Type : Dithering with 9th order noise shaping
References : Posted by cshei[AT]indiana.edu
Linked file : NS9dither16.h

Notes :
This is an implementation of a 9th-order noise shaping & dithering class that runs quite fast (it has one function that uses Intel x86 assembly, but you can replace it with a different rounding function if you are running on a non-Intel platform). _aligned_malloc and _aligned_free require the MSVC++ Processor Pack, available from www.microsoft.com. You can replace them with "new" and "delete", but allocating aligned memory seems to make it run faster. Also, if you aren't using Win32, you can replace ZeroMemory with a memset that clears the memory to 0.
Input should be floats from -32768 to 32767 (processS will clip at these points for you, but clipping is bad when you are trying to convert floats to shorts). Note to reviewer: it would probably be better to put the code in a file such as NSDither.h and link to it - it's rather long.

(see linked file)
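Where _aligned_malloc is not available, the usual trick is to over-allocate and round the pointer up. A minimal C sketch (the function names are hypothetical, not from the linked file; 16-byte alignment assumed as typical for SSE-friendly buffers):

```c
#include <stdlib.h>
#include <stdint.h>

/* Allocate 'size' bytes aligned to 'align' (a power of two) by over-allocating
   and stashing the original malloc'd pointer just before the aligned block. */
void *aligned_malloc(size_t size, size_t align)
{
    void *raw = malloc(size + align + sizeof(void *));
    if (!raw) return NULL;
    uintptr_t start   = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);
    ((void **)aligned)[-1] = raw;   /* remember the original pointer */
    return (void *)aligned;
}

void aligned_free(void *p)
{
    if (p) free(((void **)p)[-1]);
}
```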




2 comment(s) | add a comment | nofrills version


Nonblocking multiprocessor/multithread algorithms in C++

Type : queue, stack, garbage collection, memory allocation, templates for atomic algorithms and types
References : Posted by joshscholar[AT]yahoo[DOT]com
Linked file : ATOMIC.H

Notes :
see linked file...



2 comment(s) | add a comment | nofrills version


Piecewise quadratic approximate exponential function

Type : Approximation of base-2 exponential function
References : Posted by Johannes M.-R.

Notes :
The code assumes round-to-zero mode and IEEE 754 floats.
To achieve other bases, multiply the input by the base-2 logarithm (logarithmus dualis) of the base.


Code :
inline float fpow2(const float y)
{
    union
    {
        float f;
        int i;
    } c;

    int integer = (int)y;
    if(y < 0)
        integer = integer-1;

    float frac = y - (float)integer;

    c.i = (integer+127) << 23;
    c.f *= 0.33977f*frac*frac + (1.0f-0.33977f)*frac + 1.0f;

    return c.f;
}
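As the notes say, other bases just need a scale on the input. For example, a base-e exponential via log2(e); a usage sketch, not part of the original post (fpow2 repeated so the example is self-contained):

```c
#include <math.h>

/* fpow2 repeated from above */
static inline float fpow2(const float y)
{
    union { float f; int i; } c;

    int integer = (int)y;
    if (y < 0)
        integer = integer - 1;

    float frac = y - (float)integer;

    c.i = (integer + 127) << 23;               /* exact power of two */
    c.f *= 0.33977f * frac * frac + (1.0f - 0.33977f) * frac + 1.0f;
    return c.f;
}

/* base-e exponential: exp(x) = 2^(x * log2(e)) */
static inline float fexp(const float x)
{
    return fpow2(x * 1.44269504f);             /* log2(e) */
}
```

For integer inputs the result is exact (fpow2(3) gives exactly 8); in between, the quadratic keeps the error small.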


no comments on this item | add a comment | nofrills version


please add it as a comment to the Denormalization preventer entry (no comments are allowed now) thanks

Type : quantization
References : Posted by scoofy[AT]inf[DOT]elte[DOT]hu

Notes :
You can zero out denormals by adding and subtracting a small number.

void kill_denormal_by_quantization(float &val)
{
static const float anti_denormal = 1e-18;
val += anti_denormal;
val -= anti_denormal;
}

Reference: Laurent de Soras' great article on denormal numbers:
http://ldesoras.free.fr/doc/articles/denormal-en.pdf

I tend to add DC because it is faster than quantization. A slight DC offset (0.000000000000000001) won't hurt. That's -360 decibels...
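Both variants rendered in plain C (pointers instead of the C++ reference; the constant is the one quoted in the post, the function names are otherwise hypothetical):

```c
/* Quantization: the denormal is lost in the rounding of the add,
   so the subtract leaves exactly zero. */
void kill_denormal_by_quantization(float *val)
{
    static const float anti_denormal = 1e-18f;
    *val += anti_denormal;
    *val -= anti_denormal;
}

/* DC offset: simply shift the signal by ~ -360 dB. */
void add_dc(float *val)
{
    *val += 1e-18f;
}
```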




3 comment(s) | add a comment | nofrills version


pow(x,4) approximation

References : Posted by Stefan Stenzel

Notes :
Very hacked, but it gives a rough estimate of x**4 by modifying exponent and mantissa.

Code :
float p4fast(float in)
{
  long *lp,l;

  lp=(long *)(&in);
  l=*lp;

  l-=0x3F800000l; /* un-bias */
  l<<=2;          /* **4 */
  l+=0x3F800000l; /* bias */
  *lp=l;

  /* compiler will read this from memory since & operator had been used */
  return in;
}
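The original uses long pointer casts, which assumes a 32-bit long; a 64-bit-safe rendering with a quick sanity check (for inputs that are exact powers of two the exponent trick is exact; elsewhere the error can be large, e.g. p4fast(1.5) yields 4.0 against a true 5.0625):

```c
#include <stdint.h>
#include <string.h>

/* 64-bit-safe p4fast: use int32_t and memcpy instead of casting
   a long* at the float, since long may be 8 bytes today. */
float p4fast(float in)
{
    int32_t l;
    memcpy(&l, &in, sizeof l);

    l -= 0x3F800000L; /* un-bias */
    l <<= 2;          /* **4 */
    l += 0x3F800000L; /* bias */

    memcpy(&in, &l, sizeof in);
    return in;
}
```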


no comments on this item | add a comment | nofrills version


rational tanh approximation

Type : saturation
References : Posted by cschueler

Notes :
This is a rational function to approximate a tanh-like soft clipper. It is based on the Padé approximation of the tanh function with tweaked coefficients.

The function is valid for the range x=-3..3 and outputs the range y=-1..1. Beyond this range the output must be clamped to -1..1.

The first two derivatives of the function vanish at -3 and 3, so the transition to the hard-clipped region is C2-continuous.



Code :
float rational_tanh(float x)
{
    if( x < -3 )
        return -1;
    else if( x > 3 )
        return 1;
    else
        return x * ( 27 + x * x ) / ( 27 + 9 * x * x );
}
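A quick accuracy check against the library tanh (hypothetical usage, not from the original post; the function is repeated so the check is self-contained). The error stays within a couple of percent over the soft region:

```c
#include <math.h>

float rational_tanh(float x)
{
    if (x < -3.0f)
        return -1.0f;
    else if (x > 3.0f)
        return 1.0f;
    else
        return x * (27.0f + x * x) / (27.0f + 9.0f * x * x);
}
```

Note that rational_tanh(3) evaluates to 3*36/108 = 1 exactly, which is what makes the joint to the clamped region seamless.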


5 comment(s) | add a comment | nofrills version


Reading the compressed WA! parts in gigasampler files

References : Paul Kellett
Linked file : gigxpand.zip

Notes :
(see linked file)
Code to read the .WA! (compressed .WAV) parts of GigaSampler .GIG files.
For related info on reading .GIG files see http://www.linuxdj.com/evo




no comments on this item | add a comment | nofrills version


Real basic DSP with Matlab (+ GUI) ...

Type : Like effects racks, made with Matlab !
References : Posted by guillaume[DOT]carniato[AT]meletu[DOT]univ-valenciennes[DOT]fr
Linked file : http://www.xenggeng.fr.st/ici/guitou/Matlab Music.zip

Notes :
You need Matlab v6.0 or more to run this stuff...

Code :
take a look at http://www.xenggeng.fr.st/ici/guitou/Matlab Music.zip

I'm now working on a Matlab sequencer, which will certainly use 'Matlab Music'. I'm interested in integrating WaveWarp into this project; it's a toolbox that allows real-time DSP with Matlab.

If you'd like to improve this version (add effects, improve effect quality, anything else...), go ahead! Email me if you're interested in developing this beginner's work further...


2 comment(s) | add a comment | nofrills version


real value vs display value

Type : Macro
References : Posted by emil[AT]arpanet[DOT]no

Notes :

REALVAL maps a normalized VST parameter (0..1) over the given range to a display value.
VSTVAL does the opposite.
a = range start
b = range end


Code :
#define REALVAL(a, b, vstval) ((a) + (vstval)*((b)-(a)))
#define VSTVAL(a, b, realval) (((realval)-(a))/((b)-(a)))


2 comment(s) | add a comment | nofrills version


Really fast x86 floating point sin/cos

References : Posted by rerdavies[AT]msn[DOT]com
Linked file : sincos.zip

Notes :
Frightful code from the Intel Performance optimization front. Not for the squeamish.

The following code calculates sin and cos of a floating point value on x86 platforms to 20 bits precision with two multiplies and two adds. The basic principle is to use the sin(x+y) and cos(x+y) identities to generate the result from lookup tables. Each lookup table takes care of 10 bits of precision in the input. The same principle can be used to generate sin/cos to full (! Really. Full!) 24-bit float precision using two 8-bit tables and one 10-bit table (to provide guard bits), for a net speed gain of about 4x over fsin/fcos, and 8x if you want both sin and cos. Note that Microsoft compilers have trouble keeping doubles aligned properly on the stack (they must be 8-byte aligned in order not to incur a massive alignment penalty). As a result, this class should NOT be allocated on the stack. Add it as a member variable to any class that uses it.

e.g.
class CSomeClass {
    CQuickTrig m_QuickTrig;
    ...
    m_QuickTrig.QuickSinCos(dAngle, fSin, fCos);
    ...
};


Code :
(see attached file)

no comments on this item | add a comment | nofrills version


Reasonably accurate/fastish tanh approximation

References : Posted by Fuzzpilz

Notes :
Fairly obvious, but maybe not obvious enough, since I've seen calls to tanh() in code snippets here.

It's this, basically:

tanh(x) = sinh(x)/cosh(x)
= (exp(x) - exp(-x))/(exp(x) + exp(-x))
= (exp(2x) - 1)/(exp(2x) + 1)

Combine this with a somewhat less accurate approximation for exp than usual (I use a third-order Taylor approximation below), and you're set. Useful for waveshaping.

Notes on the exp approximation:

It only works properly for input values above zero, but since tanh is odd, that isn't a problem.

exp(x) = 1 + x + x^2/(2!) + x^3/(3!) + ...

Breaking the Taylor series off after the third term, I get

1 + x + x^2/2 + x^3/6.

I can save some multiplications by using

6 + x * (6 + x * (3 + x))

instead; a division by 6 becomes necessary, but is lumped into the additions in the tanh part:

(a/6 - 1)/(a/6 + 1) = (a - 6)/(a + 6).

Accuracy:

I haven't looked at this in very great detail, but it's always <= the real tanh (>= for x<0), and the greatest deviation occurs at about +/- 1.46, where the result is ca. .95 times the correct value.


This is still faster than tanh if you use a better approximation for the exponential, even if you simply call exp.
There are probably additional ways of improving parts of this, and naturally if you're going to use it you'll want to figure out whether your particular application offers additional ways of simplifying it, but it's a good start.


Code :
/* single precision absolute value, a lot faster than fabsf() (if you use MSVC++ 6 Standard - others' implementations might be less slow) */
float sabs(float a)
{
int b=(*((int *)(&a)))&0x7FFFFFFF;
return *((float *)(&b));
}

/* approximates tanh(x/2) rather than tanh(x) - depending on how you're using this, fixing that could well be wasting a multiplication (though that isn't much, and it could be done with an integer addition in sabs instead)  */
float tanh2(float x)
{
float a=sabs(x);
a=6+a*(6+a*(3+a));
return ((x<0)?-1:1)*(a-6)/(a+6); /* instead of using <, you can also check directly whether the sign bit is set ((*((int *)(&x)))&0x80000000), but it's not really worth it */
}
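Since tanh2 approximates tanh(x/2), pass twice the argument to approximate tanh(x). A self-contained check (using fabsf in place of the bit-twiddled sabs for portability; the tolerance reflects the ~5% worst-case deviation stated above):

```c
#include <math.h>

/* approximates tanh(x/2); call with 2*x to approximate tanh(x) */
float tanh2(float x)
{
    float a = fabsf(x);                     /* fabsf instead of sabs */
    a = 6.0f + a * (6.0f + a * (3.0f + a)); /* 3rd-order Taylor of 6*exp(a) */
    return ((x < 0.0f) ? -1.0f : 1.0f) * (a - 6.0f) / (a + 6.0f);
}
```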


16 comment(s) | add a comment | nofrills version


resampling

Type : linear interpolated aliased resampling of a wave file
References : Posted by mail[AT]mroc[DOT]de

Notes :
Some resampling stuff. The code is heavily used in MSynth, but don't laugh about it ;-)

Prefiltering would perhaps reduce aliasing.


Code :
signed short* pSample = ...;
unsigned int sampleLength = ...;

// stretch sample to the length of one bar...
float playposDelta = sampleLength / ( ( 240.0f / bpm ) * samplingRate );

// required for position calculation...
float playpos = 0.0f;
unsigned int iter = 0;

// required for interpolation...
unsigned int i1, i2;

float* pDest = ...;
float* pDestStop = pDest + len;
for( float* pt = pDest; pt < pDestStop; ++pt )
{
    // linear interpolation (note: i2 can read one sample past the
    // end of the buffer - clamp or pad in real use)...
    i1 = (unsigned int)playpos;
    i2 = i1 + 1;
    (*pt) = (pSample[i2] - pSample[i1]) * (playpos - i1) + pSample[i1];

    // recompute the position from scratch each sample,
    // preventing float summation error...
    playpos = (++iter) * playposDelta;
}
...


4 comment(s) | add a comment | nofrills version


Saturation

Type : Waveshaper
References : Posted by Bram

Notes :
When the input is below a certain threshold (t) these functions return the input; once it goes over that threshold, they return a soft-shaped saturation.
Neither claims to be fast ;-)


Code :
float saturate(float x, float t)
{
    if(fabs(x) < t)
        return x;
    else
    {
        if(x > 0.f)
            return t + (1.f-t)*tanh((x-t)/(1-t));
        else
            return -(t + (1.f-t)*tanh((-x-t)/(1-t)));
    }
}

or

float sigmoid(float x)
{
    if(fabs(x) < 1)
        return x*(1.5f - 0.5f*x*x);
    else
        return x > 0.f ? 1.f : -1.f;
}

float saturate(float x, float t)
{
    if(fabs(x) < t)
        return x;
    else
    {
        if(x > 0.f)
            return t + (1.f-t)*sigmoid((x-t)/((1-t)*1.5f));
        else
            return -(t + (1.f-t)*sigmoid((-x-t)/((1-t)*1.5f)));
    }
}


2 comment(s) | add a comment | nofrills version


Sin(x) Approximation (with SSE code)

References : Posted by williamk[AT]wusik[DOT]com

Notes :
Sin approximation: sin(x) = x + ( x * (-x * x / 6));

This is very handy and fast, but not precise. Below you will find a simple SSE code.

Remember that the movaps instruction requires 16-byte aligned variables.


Code :
SSE code for computing only ONE value (scalar).
Replace all "ss" with "ps" if you want to calculate 4 values, and use "movaps" instead of "movss".

movss    xmm1,    xmm0                        ; xmm0 = x
mulss    xmm1,    Filter_GenVal[k_n1]            ; * -1
mulss    xmm1,    xmm0                        ; -x * x
divss    xmm1,    Filter_GenVal[k_6]            ; / 6
mulss    xmm1,    xmm0
addss    xmm0,    xmm1
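The same cubic in plain C, handy for checking the SSE version (a sketch; as noted above it is not precise, and the error grows quickly outside roughly +/- pi/2):

```c
#include <math.h>

/* sin(x) ~= x - x^3/6, the truncated Taylor series the SSE code computes:
   negate x*x, divide by 6, multiply by x, add back to x. */
float fast_sin(float x)
{
    return x + (x * (-x * x / 6.0f));
}
```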


2 comment(s) | add a comment | nofrills version


Sin, Cos, Tan approximation

References : http://www.wild-magic.com
Linked file : approx.h

Notes :
Code for approximation of cos, sin, tan and inv sin, etc.
Surprisingly accurate and very usable.

[edit by bram]
this code is taken literally from
http://www.wild-magic.com/SourceCode.html
Go have a look at the MgcMath.h and MgcMath.cpp files in their library...
[/edit]




2 comment(s) | add a comment | nofrills version