|
|
|
|
 |
VST SDK GUI Switch without
References : Posted by quintosardo[AT]yahoo[DOT]it
Notes : In VST GUI an on-vaue is represented by 1.0 and off by 0.0.
Code : Say you have two signals you want to switch between when the user changes a switch.
You could do:
if(fSwitch == 0.f) //fSwitch is either 0.0 or 1.0
output = input1
else
output = input2
However, you can avoid the branch by doing:
output = input1 * (1.f - fSwitch) + input2 * fSwitch
Which would be like a quick mix. You could make the change clickless by adding a simple one-pole filter:
smooth = filter(fSwitch)
output = input1 * (1.f - smooth) + input2 * smooth
3 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
16-Point Fast Integer Sinc Interpolator.
References : Posted by mumart[at]gmail[dot]com
Notes : This is designed for fast upsampling with good quality using only a 32-bit accumulator. Sound quality is very good. Conceptually it resamples the input signal 32768x and performs nearest-neighbour to get the requested sample rate. As a result downsampling will result in aliasing.
The provided Sinc table is Blackman-Harris windowed with a slight lowpass. The table entries are 16-bit and are 16x linear-oversampled. It should be pretty easy to figure out how to make your own table for it.
Code provided is in Java. Converting to C/MMX etc. should be pretty trivial.
Remember the interpolator requires a number of samples before and after the sample to be interpolated, so you can't resample the whole of a passed input buffer in one go.
Have fun,
Martin
Code :
public class SincResampler {
private final int FP_SHIFT = 15;
private final int FP_ONE = 1 << FP_SHIFT;
private final int FP_MASK = FP_ONE - 1;
private final int POINT_SHIFT = 4; // 16 points
private final int OVER_SHIFT = 4; // 16x oversampling
private final short[] table = {
0, -7, 27, -71, 142, -227, 299, 32439, 299, -227, 142, -71, 27, -7, 0, 0,
0, 0, -5, 36, -142, 450, -1439, 32224, 2302, -974, 455, -190, 64, -15, 2, 0,
0, 6, -33, 128, -391, 1042, -2894, 31584, 4540, -1765, 786, -318, 105, -25, 3, 0,
0, 10, -55, 204, -597, 1533, -4056, 30535, 6977, -2573, 1121, -449, 148, -36, 5, 0,
-1, 13, -71, 261, -757, 1916, -4922, 29105, 9568, -3366, 1448, -578, 191, -47, 7, 0,
-1, 15, -81, 300, -870, 2185, -5498, 27328, 12263, -4109, 1749, -698, 232, -58, 9, 0,
-1, 15, -86, 322, -936, 2343, -5800, 25249, 15006, -4765, 2011, -802, 269, -68, 10, 0,
-1, 15, -87, 328, -957, 2394, -5849, 22920, 17738, -5298, 2215, -885, 299, -77, 12, 0,
0, 14, -83, 319, -938, 2347, -5671, 20396, 20396, -5671, 2347, -938, 319, -83, 14, 0,
0, 12, -77, 299, -885, 2215, -5298, 17738, 22920, -5849, 2394, -957, 328, -87, 15, -1,
0, 10, -68, 269, -802, 2011, -4765, 15006, 25249, -5800, 2343, -936, 322, -86, 15, -1,
0, 9, -58, 232, -698, 1749, -4109, 12263, 27328, -5498, 2185, -870, 300, -81, 15, -1,
0, 7, -47, 191, -578, 1448, -3366, 9568, 29105, -4922, 1916, -757, 261, -71, 13, -1,
0, 5, -36, 148, -449, 1121, -2573, 6977, 30535, -4056, 1533, -597, 204, -55, 10, 0,
0, 3, -25, 105, -318, 786, -1765, 4540, 31584, -2894, 1042, -391, 128, -33, 6, 0,
0, 2, -15, 64, -190, 455, -974, 2302, 32224, -1439, 450, -142, 36, -5, 0, 0,
0, 0, -7, 27, -71, 142, -227, 299, 32439, 299, -227, 142, -71, 27, -7, 0
};
/*
private final int POINT_SHIFT = 1; // 2 points
private final int OVER_SHIFT = 0; // 1x oversampling
private final short[] table = {
32767, 0,
0 , 32767
};
*/
private final int POINTS = 1 << POINT_SHIFT;
private final int INTERP_SHIFT = FP_SHIFT - OVER_SHIFT;
private final int INTERP_BITMASK = ( 1 << INTERP_SHIFT ) - 1;
/*
input - array of input samples
inputPos - sample position ( must be at least POINTS/2 + 1, ie. 7 )
inputFrac - fractional sample position ( 0 <= inputFrac < FP_ONE )
step - number of input samples per output sample * FP_ONE
lAmp - left output amplitude ( 1.0 = FP_ONE )
lBuf - left output buffer
rAmp - right output amplitude ( 1.0 = FP_ONE )
rBuf - right output buffer
pos - offset into output buffers
count - number of output samples to produce
*/
public void resample( short[] input, int inputPos, int inputFrac, int step,
int lAmp, int[] lBuf, int rAmp, int[] rBuf, int pos, int count ) {
for( int p = 0; p < count; p++ ) {
int tabidx1 = ( inputFrac >> INTERP_SHIFT ) << POINT_SHIFT;
int tabidx2 = tabidx1 + POINTS;
int bufidx = inputPos - POINTS/2 + 1;
int a1 = 0, a2 = 0;
for( int t = 0; t < POINTS; t++ ) {
a1 += table[ tabidx1++ ] * input[ bufidx ] >> 15;
a2 += table[ tabidx2++ ] * input[ bufidx ] >> 15;
bufidx++;
}
int out = a1 + ( ( a2 - a1 ) * ( inputFrac & INTERP_BITMASK ) >> INTERP_SHIFT );
lBuf[ pos ] += out * lAmp >> FP_SHIFT;
rBuf[ pos ] += out * rAmp >> FP_SHIFT;
pos++;
inputFrac += step;
inputPos += inputFrac >> FP_SHIFT;
inputFrac &= FP_MASK;
}
}
}
no comments on this item | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
16-to-8-bit first-order dither
Type : First order error feedforward dithering code References : Posted by Jon Watte
Notes : This is about as simple a dithering algorithm as you can implement, but it's likely to sound better than just truncating to N bits.
Note that you might not want to carry forward the full difference for infinity. It's probably likely that the worst performance hit comes from the saturation conditionals, which can be avoided with appropriate instructions on many DSPs and integer SIMD type instructions, or CMOV.
Last, if sound quality is paramount (such as when going from > 16 bits to 16 bits) you probably want to use a higher-order dither function found elsewhere on this site.
Code : // This code will down-convert and dither a 16-bit signed short
// mono signal into an 8-bit unsigned char signal, using a first
// order forward-feeding error term dither.
#define uchar unsigned char
void dither_one_channel_16_to_8( short * input, uchar * output, int count, int * memory )
{
int m = *memory;
while( count-- > 0 ) {
int i = *input++;
i += m;
int j = i + 32768 - 128;
uchar o;
if( j < 0 ) {
o = 0;
}
else if( j > 65535 ) {
o = 255;
}
else {
o = (uchar)((j>>8)&0xff);
}
m = ((j-32768+128)-i);
*output++ = o;
}
*memory = m;
}
no comments on this item | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
3rd order Spline interpollation
References : Posted by Dave from Muon Software, originally from Josh Scholar
Notes : (from Joshua Scholar about Spline interpollation in general...)
According to sampling theory, a perfect interpolation could be found by replacing each sample with a sinc function centered on that sample, ringing at your target nyquest frequency, and at each target point you just sum all of contributions from the sinc functions of every single point in source.
The sinc function has ringing that dies away very slowly, so each target sample will have to have contributions from a large neighborhood of source samples. Luckily, by definition the sinc function is bandwidth limited, so once we have a source that is prefilitered for our target nyquest frequency and reasonably oversampled relative to our nyquest frequency, ordinary interpolation techniques are quite fruitful even though they would be pretty useless if we hadn't oversampled.
We want an interpolation routine that at very least has the following characteristics:
1. Obviously it's continuous. But since finite differencing a signal (I don't really know about true differentiation) is equivalent to a low frequency attenuator that drops only about 6 dB per octave, continuity at the higher derivatives is important too.
2. It has to be stiff enough to find peaks when our oversampling missed them. This is where what I said about the combination the sinc function's limited bandwidth and oversampling making interpolation possible comes into play.
I've read some papers on splines, but most stuff on splines relates to graphics and uses a control point descriptions that is completely irrelevant to our sort of interpolation. In reading this stuff I quickly came to the conclusion that splines:
1. Are just piecewise functions made of polynomials designed to have some higher order continuity at the transition points.
2. Splines are highly arbitrary, because you can choose arbitrary derivatives (to any order) at each transition. Of course the more you specify the higher order the polynomials will be.
3. I already know enough about polynomials to construct any sort of spline. A polynomial through 'n' points with a derivative specified at 'm[1]' points and second derivatives specified at 'm[2]' points etc. will be a polynomial of the order n-1+m[1]+m[2]...
A way to construct third order splines (that admittedly doesn't help you construct higher order splines), is to linear interpolate between two parabolas. At each point (they are called knots) you have a parabola going through that point, the previous and the next point. Between each point you linearly interpolate between the polynomials for each point. This may help you imagine splines.
As a starting point I used a polynomial through 5 points for each knot and used MuPad (a free Mathematica like program) to derive a polynomial going through two points (knots) where at each point it has the same first two derivatives as a 4th order polynomial through the surrounding 5 points. My intuition was that basing it on polynomials through 3 points wouldn't be enough of a neighborhood to get good continuity. When I tested it, I found that not only did basing it on 5 point polynomials do much better than basing it on 3 point ones, but that 7 point ones did nearly as badly as 3 point ones. 5 points seems to be a sweet spot.
However, I could have set the derivatives to a nearly arbitrary values - basing the values on those of polynomials through the surrounding points was just a guess.
I've read that the math of sampling theory has different interpretation to the sinc function one where you could upsample by making a polynomial through every point at the same order as the number of points and this would give you the same answer as sinc function interpolation (but this only converges perfectly when there are an infinite number of points). Your head is probably spinning right now - the only point of mentioning that is to point out that perfect interpolation is exactly as stiff as a polynomial through the target points of the same order as the number of target points.
Code : //interpolates between L0 and H0 taking the previous (L1) and next (H1)
points into account
inline float ThirdInterp(const float x,const float L1,const float L0,const
float H0,const float H1)
{
return
L0 +
.5f*
x*(H0-L1 +
x*(H0 + L0*(-2) + L1 +
x*( (H0 - L0)*9 + (L1 - H1)*3 +
x*((L0 - H0)*15 + (H1 - L1)*5 +
x*((H0 - L0)*6 + (L1 - H1)*2 )))));
}
5 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
5-point spline interpollation
Type : interpollation References : Joshua Scholar, posted by David Waugh Code : //nMask = sizeofwavetable-1 where sizeofwavetable is a power of two.
double interpolate(double* wavetable, int nMask, double location)
{
/* 5-point spline*/
int nearest_sample = (int) location;
double x = location - (double) nearest_sample;
double p0=wavetable[(nearest_sample-2)&nMask];
double p1=wavetable[(nearest_sample-1)&nMask];
double p2=wavetable[nearest_sample];
double p3=wavetable[(nearest_sample+1)&nMask];
double p4=wavetable[(nearest_sample+2)&nMask];
double p5=wavetable[(nearest_sample+3)&nMask];
return p2 + 0.04166666666*x*((p3-p1)*16.0+(p0-p4)*2.0
+ x *((p3+p1)*16.0-p0-p2*30.0- p4
+ x *(p3*66.0-p2*70.0-p4*33.0+p1*39.0+ p5*7.0- p0*9.0
+ x *( p2*126.0-p3*124.0+p4*61.0-p1*64.0- p5*12.0+p0*13.0
+ x *((p3-p2)*50.0+(p1-p4)*25.0+(p5-p0)*5.0)))));
};
2 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Allocating aligned memory
Type : memory allocation References : Posted by Benno Senoner
Notes : we waste up to align_size + sizeof(int) bytes when we alloc a memory area.
We store the aligned_ptr - unaligned_ptr delta in an int located before the aligned area.
This is needed for the free() routine since we need to free all the memory not only the aligned area.
You have to use aligned_free() to free the memory allocated with aligned_malloc() !
Code : /* align_size has to be a power of two !! */
void *aligned_malloc(size_t size, size_t align_size) {
char *ptr,*ptr2,*aligned_ptr;
int align_mask = align_size - 1;
ptr=(char *)malloc(size + align_size + sizeof(int));
if(ptr==NULL) return(NULL);
ptr2 = ptr + sizeof(int);
aligned_ptr = ptr2 + (align_size - ((size_t)ptr2 & align_mask));
ptr2 = aligned_ptr - sizeof(int);
*((int *)ptr2)=(int)(aligned_ptr - ptr);
return(aligned_ptr);
}
void aligned_free(void *ptr) {
int *ptr2=(int *)ptr - 1;
ptr -= *ptr2;
free(ptr);
}
no comments on this item | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Antialiased Lines
Type : A slow, ugly, and unoptimized but short method to perform antialiased lines in a framebuffer References : Posted by arguru[AT]smartelectronix[DOT]com
Notes : Simple code to perform antialiased lines in a 32-bit RGBA (1 byte/component) framebuffer.
pframebuffer <- unsigned char* to framebuffer bytes (important: Y flipped line order! [like in the way Win32 CreateDIBSection works...])
client_height=framebuffer height in lines
client_width=framebuffer width in pixels (not in bytes)
This doesnt perform any clip checl so it fails if coordinates are set out of bounds.
sorry for the engrish
Code : //
// By Arguru
//
void PutTransPixel(int const x,int const y,UCHAR const r,UCHAR const g,UCHAR const b,UCHAR const a)
{
unsigned char* ppix=pframebuffer+(x+(client_height-(y+1))*client_width)*4;
ppix[0]=((a*b)+(255-a)*ppix[0])/256;
ppix[1]=((a*g)+(255-a)*ppix[1])/256;
ppix[2]=((a*r)+(255-a)*ppix[2])/256;
}
void LineAntialiased(int const x1,int const y1,int const x2,int const y2,UCHAR const r,UCHAR const g,UCHAR const b)
{
// some useful constants first
double const dw=x2-x1;
double const dh=y2-y1;
double const slx=dh/dw;
double const sly=dw/dh;
// determine wichever raster scanning behaviour to use
if(fabs(slx)<1.0)
{
// x scan
int tx1=x1;
int tx2=x2;
double raster=y1;
if(x1>x2)
{
tx1=x2;
tx2=x1;
raster=y2;
}
for(int x=tx1;x<=tx2;x++)
{
int const ri=int(raster);
double const in_y0=1.0-(raster-ri);
double const in_y1=1.0-(ri+1-raster);
PutTransPixel(x,ri+0,r,g,b,in_y0*255.0);
PutTransPixel(x,ri+1,r,g,b,in_y1*255.0);
raster+=slx;
}
}
else
{
// y scan
int ty1=y1;
int ty2=y2;
double raster=x1;
if(y1>y2)
{
ty1=y2;
ty2=y1;
raster=x2;
}
for(int y=ty1;y<=ty2;y++)
{
int const ri=int(raster);
double const in_x0=1.0-(raster-ri);
double const in_x1=1.0-(ri+1-raster);
PutTransPixel(ri+0,y,r,g,b,in_x0*255.0);
PutTransPixel(ri+1,y,r,g,b,in_x1*255.0);
raster+=sly;
}
}
}
7 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Automatic PDC system
Type : the type that actually works, completely References : Posted by Tebello Thejane Linked file : pdc.pdf
Notes : No, people, implementing PDC is actually not as difficult as you might think it is.
This paper presents a solution to the problem of latency inherent in audio effects processors, and the two appendices give examples of the method being applied on Cubase SX (with an example which its native half-baked PDC fails to solve properly) as well as a convoluted example in FL Studio (taking advantage of the flexible routing capabilities introduced in version 6 of the software). All that's necessary to understand it is a grasp of basic algebra and an intermediate understanding of how music production software works (no need to understand the Laplace transform, linear processes, sigma and integral notation... YAY!).
Please do send me any feedback (kudos, errata, flames, job offers, questions, comments) you might have - my email address is included in the paper - or simply use musicdsp.org's own commenting system.
Tebello Thejane.
Code : (I have sent the PDF to Bram as he suggested)
3 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Base-2 exp
References : Posted by Laurent de Soras
Notes : Linear approx. between 2 integer values of val. Uses 32-bit integers. Not very efficient but fastest than exp()
This code was designed for x86 (little endian), but could be adapted for big endian processors.
Laurent thinks you just have to change the (*(1 + (int *) &ret)) expressions and replace it by (*(int *) &ret). However, He didn't test it.
Code : inline double fast_exp2 (const double val)
{
int e;
double ret;
if (val >= 0)
{
e = int (val);
ret = val - (e - 1);
((*(1 + (int *) &ret)) &= ~(2047 << 20)) += (e + 1023) << 20;
}
else
{
e = int (val + 1023);
ret = val - (e - 1024);
((*(1 + (int *) &ret)) &= ~(2047 << 20)) += e << 20;
}
return (ret);
}
1 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Bit-Reversed Counting
References : Posted by mailbjl[AT]yahoo[DOT]com
Notes : Bit-reversed ordering comes up frequently in FFT implementations. Here is a non-branching algorithm (given in C) that increments the variable "s" bit-reversedly from 0 to N-1, where N is a power of 2.
Code : int r = 0; // counter
int s = 0; // bit-reversal of r/2
int N = 256; // N can be any power of 2
int N2 = N << 1; // N<<1 == N*2
do {
printf("%u ", s);
r += 2;
s ^= N - (N / (r&-r));
}
while (r < N2);
2 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Block/Loop Benchmarking
Type : Benchmarking Tool References : Posted by arguru[AT]smartelectronix[DOT]com
Notes : Requires CPU with RDTSC support
Code : // Block-Process Benchmarking Code using rdtsc
// useful for measure DSP block stuff
// (based on Intel papers)
// 64-bit precission
// VeryUglyCode(tm) by Arguru
// globals
UINT time,time_low,time_high;
// call this just before enter your loop or whatever
void bpb_start()
{
// read time stamp to EAX
__asm rdtsc;
__asm mov time_low,eax;
__asm mov time_high,edx;
}
// call the following function just after your loop
// returns average cycles wasted per sample
UINT bpb_finish(UINT const num_samples)
{
__asm rdtsc
__asm sub eax,time_low;
__asm sub edx,time_high;
__asm div num_samples;
__asm mov time,eax;
return time;
}
2 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Branchless Clipping
Type : Clipping at 0dB, with none of the usual 'if..then..' References : Posted by musicdsp[AT]dsparsons[DOT]co[DOT]uk
Notes : I was working on something that I wanted to ensure that the signal never went above 0dB, and a branchless solution occurred to me.
It works by playing with the structure of a single type, shifting the sign bit down to make a new mulitplicand.
calling MaxZerodB(mydBSample) will ensure that it will never stray over 0dB.
By playing with signs or adding/removing offsets, this offers a complete branchless limiting solution, no matter whether dB or not (after all, they're all just numbers...).
eg:
Limit to <=0 : sample:=MaxZerodB(sample);
Limit to <=3 : sample:=MaxZerodB(sample-3)+3;
Limit to <=-4 : sample:=MaxZerodB(sample+4)-4;
Limit to >=0 : sample:=-MaxZerodB(-sample);
Limit to >=2 : sample:=-MaxZerodB(-sample+2)+2;
Limit to >=-1.5: sample:=-MaxZerodB(-sample-1.5)-1.5;
Whether it actually saves any CPU cycles remains to be seen, but it was an interesting diversion for half an hour :)
[Translating from pascal to other languages shouldn't be too hard, and for doubles, you'll need to fiddle it abit :)]
Code : function MaxZerodB(dBin:single):single;
var tmp:longint;
begin
//given that leftmost bit of a longint indicates the negative,
// if we shift that down to bit0, and multiply dBin by that
// it will return dBin, or zero :)
tmp:=(longint((@dBin)^) and $80000000) shr 31;
result:=dBin*tmp;
end;
6 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Calculate notes (java)
Type : Java class for calculating notes with different in params References : Posted by larsby[AT]elak[DOT]org Linked file : Frequency.java
Notes : Converts between string notes and frequencies and back. I vaguely remember writing bits of it, and I got it off the net somwhere so dont ask me
- Larsby
no comments on this item | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Center separation in a stereo mixdown
References : Posted by Thiburce BELAVENTURE
Notes : One year ago, i found a little trick to isolate or remove the center in a stereo mixdown.
My method use the time-frequency representation (FFT). I use a min fuction between left and right channels (for each bin) to create the pseudo center. I apply a phase correction, and i substract this signal to the left and right signals.
Then, we can remix them after treatments (or without) to produce a stereo signal in output.
This algorithm (I called it "TBIsolator") is not perfect, but the result is very nice, better than the phase technic (L substract R...). I know that it is not mathematically correct, but as an estimation of the center, the exact match is very hard to obtain. So, it is not so bad (just listen the result and see).
My implementation use a 4096 FFT size, with overlap-add method (factor 2). With a lower FFT size, the sound will be more dirty, and with a 16384 FFT size, the center will have too much high frequency (I don't explore why this thing appears).
I just post the TBIsolator code (see FFTReal in this site for implement the FFT engine).
pIns and pOuts buffers use the representation of the FFTReal class (0 to N/2-1: real parts, N/2 to N-1: imaginary parts).
Have fun with the TBIsolator algorithm ! I hope you enjoy it and if you enhance it, contact me (it's my baby...).
P.S.: the following function is not optimized.
Code : /* ============================================================= */
/* nFFTSize must be a power of 2 */
/* ============================================================= */
/* Usage examples: */
/* - suppress the center: fAmpL = 1.f, fAmpC = 0.f, fAmpR = 1.f */
/* - keep only the center: fAmpL = 0.f, fAmpC = 1.f, fAmpR = 0.f */
/* ============================================================= */
void processTBIsolator(float *pIns[2], float *pOuts[2], long nFFTSize, float fAmpL, float fAmpC, float fAmpR)
{
float fModL, fModR;
float fRealL, fRealC, fRealR;
float fImagL, fImagC, fImagR;
double u;
for ( long i = 0, j = nFFTSize / 2; i < nFFTSize / 2; i++ )
{
fModL = pIns[0][i] * pIns[0][i] + pIns[0][j] * pIns[0][j];
fModR = pIns[1][i] * pIns[1][i] + pIns[1][j] * pIns[1][j];
// min on complex numbers
if ( fModL > fModR )
{
fRealC = fRealR;
fImagC = fImagR;
}
else
{
fRealC = fRealL;
fImagC = fImagL;
}
// phase correction...
u = fabs(atan2(pIns[0][j], pIns[0][i]) - atan2(pIns[1][j], pIns[1][i])) / 3.141592653589;
if ( u >= 1 ) u -= 1.;
u = pow(1 - u*u*u, 24);
fRealC *= (float) u;
fImagC *= (float) u;
// center extraction...
fRealL = pIns[0][i] - fRealC;
fImagL = pIns[0][j] - fImagC;
fRealR = pIns[1][i] - fRealC;
fImagR = pIns[1][j] - fImagC;
// You can do some treatments here...
pOuts[0][i] = fRealL * fAmpL + fRealC * fAmpC;
pOuts[0][j] = fImagL * fAmpL + fImagC * fAmpC;
pOuts[1][i] = fRealR * fAmpR + fRealC * fAmpC;
pOuts[1][j] = fImagR * fAmpR + fImagC * fAmpC;
}
}
1 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Center separation in a stereo mixdown
References : Posted by Thiburce BELAVENTURE
Notes : One year ago, i found a little trick to isolate or remove the center in a stereo mixdown.
My method use the time-frequency representation (FFT). I use a min fuction between left and right channels (for each bin) to create the pseudo center. I apply a phase correction, and i substract this signal to the left and right signals.
Then, we can remix them after treatments (or without) to produce a stereo signal in output.
This algorithm (I called it "TBIsolator") is not perfect, but the result is very nice, better than the phase technic (L substract R...). I know that it is not mathematically correct, but as an estimation of the center, the exact match is very hard to obtain. So, it is not so bad (just listen the result and see).
My implementation use a 4096 FFT size, with overlap-add method (factor 2). With a lower FFT size, the sound will be more dirty, and with a 16384 FFT size, the center will have too much high frequency (I don't explore why this thing appears).
I just post the TBIsolator code (see FFTReal in this site for implement the FFT engine).
pIns and pOuts buffers use the representation of the FFTReal class (0 to N/2-1: real parts, N/2 to N-1: imaginary parts).
Have fun with the TBIsolator algorithm ! I hope you enjoy it and if you enhance it, contact me (it's my baby...).
P.S.: the following function is not optimized.
Code : /* ============================================================= */
/* nFFTSize must be a power of 2 */
/* ============================================================= */
/* Usage examples: */
/* - suppress the center: fAmpL = 1.f, fAmpC = 0.f, fAmpR = 1.f */
/* - keep only the center: fAmpL = 0.f, fAmpC = 1.f, fAmpR = 0.f */
/* ============================================================= */
void processTBIsolator(float *pIns[2], float *pOuts[2], long nFFTSize, float fAmpL, float fAmpC, float fAmpR)
{
float fModL, fModR;
float fRealL, fRealC, fRealR;
float fImagL, fImagC, fImagR;
double u;
for ( long i = 0, j = nFFTSize / 2; i < nFFTSize / 2; i++ )
{
fModL = pIns[0][i] * pIns[0][i] + pIns[0][j] * pIns[0][j];
fModR = pIns[1][i] * pIns[1][i] + pIns[1][j] * pIns[1][j];
// min on complex numbers
if ( fModL > fModR )
{
fRealC = fRealR;
fImagC = fImagR;
}
else
{
fRealC = fRealL;
fImagC = fImagL;
}
// phase correction...
u = fabs(atan2(pIns[0][j], pIns[0][i]) - atan2(pIns[1][j], pIns[1][i])) / 3.141592653589;
if ( u >= 1 ) u -= 1.;
u = pow(1 - u*u*u, 24);
fRealC *= (float) u;
fImagC *= (float) u;
// center extraction...
fRealL = pIns[0][i] - fRealC;
fImagL = pIns[0][j] - fImagC;
fRealR = pIns[1][i] - fRealC;
fImagR = pIns[1][j] - fImagC;
// You can do some treatments here...
pOuts[0][i] = fRealL * fAmpL + fRealC * fAmpC;
pOuts[0][j] = fImagL * fAmpL + fImagC * fAmpC;
pOuts[1][i] = fRealR * fAmpR + fRealC * fAmpC;
pOuts[1][j] = fImagR * fAmpR + fImagC * fAmpC;
}
}
no comments on this item | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Cheap pseudo-sinusoidal lfo
References : Posted by fumminger[AT]umminger[DOT]com
Notes : Although the code is written in standard C++, this algorithm is really better suited for dsps where one can take advantage of multiply-accumulate instructions and where the required phase accumulator can be easily implemented by masking a counter.
It provides a pretty cheap roughly sinusoidal waveform that is good enough for an lfo.
Code : // x should be between -1.0 and 1.0
inline
double pseudo_sine(double x)
{
// Compute 2*(x^2-1.0)^2-1.0
x *= x;
x -= 1.0;
x *= x;
// The following lines modify the range to lie between -1.0 and 1.0.
// If a range of between 0.0 and 1.0 is acceptable or preferable
// (as in a modulated delay line) then you can save some cycles.
x *= 2.0;
x -= 1.0;
}
2 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Clipping without branching
Type : Min, max and clip References : Posted by Laurent de Soras
Notes : It may reduce accuracy for small numbers. I.e. if you clip to [-1; 1], fractional part of the result will be quantized to 23 bits (or more, depending on the bit depth of the temporary results). Thus, 1e-20 will be rounded to 0. The other (positive) side effect is the denormal number elimination.
Code : float max (float x, float a)
{
x -= a;
x += fabs (x);
x *= 0.5;
x += a;
return (x);
}
float min (float x, float b)
{
x = b - x;
x += fabs (x)
x *= 0.5;
x = b - x;
return (x);
}
float clip (float x, float a, float b)
{
x1 = fabs (x-a);
x2 = fabs (x-b);
x = x1 + (a+b);
x -= x2;
x *= 0.5;
return (x);
}
6 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Constant-time exponent of 2 detector
References : Posted by Brent Lehman (mailbjl[AT]yahoo.com)
Notes : In your common FFT program, you want to make sure that the frame you're working with has a size that is a power of 2. This tells you in just a few operations. Granted, you won't be using this algorithm inside a loop, so the savings aren't that great, but every little hack helps ;)
Code : // Quit if size isn't a power of 2
if ((-size ^ size) & size) return;
// If size is an unsigned int, the above might not compile.
// You'd want to use this instead:
if (((~size + 1) ^ size) & size) return;
1 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Conversions on a PowerPC
Type : motorola ASM conversions References : Posted by James McCartney Code : double ftod(float x) { return (double)x;
00000000: 4E800020 blr
// blr == return from subroutine, i.e. this function is a noop
float dtof(double x) { return (float)x;
00000000: FC200818 frsp fp1,fp1
00000004: 4E800020 blr
int ftoi(float x) { return (int)x;
00000000: FC00081E fctiwz fp0,fp1
00000004: D801FFF0 stfd fp0,-16(SP)
00000008: 8061FFF4 lwz r3,-12(SP)
0000000C: 4E800020 blr
int dtoi(double x) { return (int)x;
00000000: FC00081E fctiwz fp0,fp1
00000004: D801FFF0 stfd fp0,-16(SP)
00000008: 8061FFF4 lwz r3,-12(SP)
0000000C: 4E800020 blr
double itod(int x) { return (double)x;
00000000: C8220000 lfd fp1,@1558(RTOC)
00000004: 6C608000 xoris r0,r3,$8000
00000008: 9001FFF4 stw r0,-12(SP)
0000000C: 3C004330 lis r0,17200
00000010: 9001FFF0 stw r0,-16(SP)
00000014: C801FFF0 lfd fp0,-16(SP)
00000018: FC200828 fsub fp1,fp0,fp1
0000001C: 4E800020 blr
float itof(int x) { return (float)x;
00000000: C8220000 lfd fp1,@1558(RTOC)
00000004: 6C608000 xoris r0,r3,$8000
00000008: 9001FFF4 stw r0,-12(SP)
0000000C: 3C004330 lis r0,17200
00000010: 9001FFF0 stw r0,-16(SP)
00000014: C801FFF0 lfd fp0,-16(SP)
00000018: EC200828 fsubs fp1,fp0,fp1
0000001C: 4E800020 blr
no comments on this item | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Copy-protection schemes
References : Posted by Moyer, Andy
Notes : This post of Andy sums up everything there is to know about copy-protection schemes:
"Build a great product and release improvements regularly so that people will
be willing to spend the money on it, thus causing anything that is cracked
to be outdated quickly. Build a strong relationship with your customers,
because if they've already paid for one of your products, and were
satisfied, chances are, they will be more likely to buy another one of your
products. Make your copy protection good enough so that somebody can't just
do a search in Google and enter in a published serial number, but don't make
registered users jump through flaming hoops to be able to use the product.
Also use various approaches to copy protection within a release, and vary
those approaches over multiple releases so that a hacker that cracked your
app's version 1.0 can't just run a recorded macro in a text editor to crack
your version 2.0 software [this being simplified]."
1 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Cubic interpollation
Type : interpollation References : Posted by Olli Niemitalo Linked file : other001.gif
Notes : (see linkfile)
finpos is the fractional, inpos the integer part.
Code : xm1 = x [inpos - 1];
x0 = x [inpos + 0];
x1 = x [inpos + 1];
x2 = x [inpos + 2];
a = (3 * (x0-x1) - xm1 + x2) / 2;
b = 2*x1 + xm1 - (5*x0 + x2) / 2;
c = (x1 - xm1) / 2;
y [outpos] = (((a * finpos) + b) * finpos + c) * finpos + x0;
no comments on this item | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Cure for malicious samples
Type : Filters Denormals, NaNs, Infinities References : Posted by urs[AT]u-he[DOT]com
Notes : A lot of incidents can happen during processing samples. A nasty one is denormalization, which makes cpus consume insanely many cycles for easiest instructions.
But even worse, if you have NaNs or Infinities inside recursive structures, maybe due to division by zero, all subsequent samples that are multiplied with these values will get "infected" and become NaN or Infinity. Your sound makes BLIPPP and that was it, silence from the speakers.
Thus I've written a small function that sets all of these cases to 0.0f.
You'll notice that I treat a buffer of floats as unsigned integers. And I avaoid branches by using comparison results as 0 or 1.
When compiled with GCC, this function should not create any "hidden" branches, but you should verify the assembly code anyway. Sometimes some parenthesis do the trick...
;) Urs
Code : #ifndef UInt32
#define UInt32 unsigned int
#endif
void erase_All_NaNs_Infinities_And_Denormals( float* inSamples, int& inNumberOfSamples )
{
UInt32* inArrayOfFloats = (UInt32*) inSamples;
for ( int i = 0; i < inNumberOfSamples; i++ )
{
UInt32 sample = *inArrayOfFloats;
UInt32 exponent = sample & 0x7F800000;
// exponent < 0x7F800000 is 0 if NaN or Infinity, otherwise 1
// exponent > 0 is 0 if denormalized, otherwise 1
int aNaN = exponent < 0x7F800000;
int aDen = exponent > 0;
*inArrayOfFloats++ = sample * ( aNaN & aDen );
}
}
4 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Denormal DOUBLE variables, macro
References : Posted by Jon Watte
Notes : Use this macro if you want to find denormal numbers and you're using doubles...
Code : #if PLATFORM_IS_BIG_ENDIAN
#define INDEX 0
#else
#define INDEX 1
#endif
inline bool is_denormal( double const & d ) {
assert( sizeof( d ) == 2*sizeof( int ) );
int l = ((int *)&d)[INDEX];
return (l&0x7fe00000) != 0;
}
1 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Denormal numbers, the meta-text
References : Laurent de Soras Linked file : denormal.pdf
Notes : This very interesting paper, written by Laurent de Soras (www.ohmforce.com) has everything you ever wanted to know about denormal numbers! And it obviously descibes how you can get rid of them too!
(see linked file)
no comments on this item | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Denormalization preventer
References : Posted by gol
Notes : A fast tiny numbers sweeper using integer math. Only for 32bit floats. Den_Thres is your 32bit (normalized) float threshold, something small enough but big enough to prevent future denormalizations.
EAX=input buffer
EDX=length
(adapt to your compiler)
Code : MOV ECX,EDX
LEA EDI,[EAX+4*ECX]
NEG ECX
MOV EDX,Den_Thres
SHL EDX,1
XOR ESI,ESI
@Loop:
MOV EAX,[EDI+4*ECX]
LEA EBX,[EAX*2]
CMP EBX,EDX
CMOVB EAX,ESI
MOV [EDI+4*ECX],EAX
INC ECX
JNZ @Loop
no comments on this item | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Denormalization preventer
References : Posted by didid[AT]skynet[DOT]be
Notes : Because I still see people adding noise or offset to their signal to avoid slow denormalization, here's a piece of code to zero out (near) tiny numbers instead.
Why zeroing out is better? Because a fully silent signal is better than a little offset, or noise. A host or effect can detect silent signals and choose not to process them in a safe way.
Plus, adding an offset or noise reduces huge packets of denormalization, but still leaves some behind.
Also, truncating is what the DAZ (Denormals Are Zero) SSE flag does.
This code uses integer comparison, and a CMOV, so you need a Pentium Pro or higher.
There's no need for an SSE version, as if you have SSE code you're probably already using the DAZ flag instead (but I advise plugins not to mess with the SSE flags, as the host is likely to have DAZ switched on already). This is for FPU code. Should work much faster than crap FPU comparison.
Den_Thres is your threshold, it cannot be denormalized (would be pointless). The function is Delphi, if you want to adapt, just make sure EAX is the buffer and EDX is length (Delphi register calling convention - it's not the same in C++).
Code : const Den_Thres:Single=1/$1000000;
procedure PrevFPUDen_Buffer(Buffer:Pointer;Length:Integer);
asm
PUSH ESI
PUSH EDI
PUSH EBX
MOV ECX,EDX
LEA EDI,[EAX+4*ECX]
NEG ECX
MOV EDX,Den_Thres
SHL EDX,1
XOR ESI,ESI
@Loop:
MOV EAX,[EDI+4*ECX]
LEA EBX,[EAX*2]
CMP EBX,EDX
CMOVB EAX,ESI
MOV [EDI+4*ECX],EAX
INC ECX
JNZ @Loop
POP EBX
POP EDI
POP ESI
End;
4 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Dither code
Type : Dither with noise-shaping References : Posted by Paul Kellett
Notes : This is a simple implementation of highpass triangular-PDF dither (a good general-purpose dither) with optional 2nd-order noise shaping (which lowers the noise floor by 11dB below 0.1 Fs).
The code assumes input data is in the range +1 to -1 and doesn't check for overloads!
To save time when generating dither for multiple channels you can re-use lower bits of a previous random number instead of calling rand() again. e.g. r3=(r1 & 0x7F)<<8;
Code : int r1, r2; //rectangular-PDF random numbers
float s1, s2; //error feedback buffers
float s = 0.5f; //set to 0.0f for no noise shaping
float w = pow(2.0,bits-1); //word length (usually bits=16)
float wi= 1.0f/w;
float d = wi / RAND_MAX; //dither amplitude (2 lsb)
float o = wi * 0.5f; //remove dc offset
float in, tmp;
int out;
//for each sample...
r2=r1; //can make HP-TRI dither by
r1=rand(); //subtracting previous rand()
in += s * (s1 + s1 - s2); //error feedback
tmp = in + o + d * (float)(r1 - r2); //dc offset and dither
out = (int)(w * tmp); //truncate downwards
if(tmp<0.0f) out--; //this is faster than floor()
s2 = s1;
s1 = in - wi * (float)out; //error
no comments on this item | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Double to Int
Type : pointer cast (round to zero, or 'trunctate') References : Posted by many people, implementation by Andy M00cho
Notes : -Platform independant, literally. You have IEEE FP numbers, this will work, as long as your not expecting a signed integer back larger than 16bits :)
-Will only work correctly for FP numbers within the range of [-32768.0,32767.0]
-The FPU must be in Double-Precision mode
Code : typedef double lreal;
typedef float real;
typedef unsigned long uint32;
typedef long int32;
//2^36 * 1.5, (52-_shiftamt=36) uses limited precision to floor
//16.16 fixed point representation
const lreal _double2fixmagic = 68719476736.0*1.5;
const int32 _shiftamt = 16;
#if BigEndian_
#define iexp_ 0
#define iman_ 1
#else
#define iexp_ 1
#define iman_ 0
#endif //BigEndian_
// Real2Int
inline int32 Real2Int(lreal val)
{
val= val + _double2fixmagic;
return ((int32*)&val)[iman_] >> _shiftamt;
}
// Real2Int
inline int32 Real2Int(real val)
{
return Real2Int ((lreal)val);
}
For the x86 assembler freaks here's the assembler equivalent:
__double2fixmagic dd 000000000h,042380000h
fld AFloatingPoint Number
fadd QWORD PTR __double2fixmagic
fstp TEMP
movsx eax,TEMP+2
2 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Envelope Follower
References : Posted by ers Code : #define V_ENVELOPE_FOLLOWER_NUM_POINTS 2000
class vEnvelopeFollower :
{
public:
vEnvelopeFollower();
virtual ~vEnvelopeFollower();
inline void Calculate(float *b)
{
envelopeVal -= *buff;
if (*b < 0)
envelopeVal += *buff = -*b;
else
envelopeVal += *buff = *b;
if (buff++ == bufferEnd)
buff = buffer;
}
void SetBufferSize(float value);
void GetControlValue(){return envelopeVal / (float)bufferSize;}
private:
float buffer[V_ENVELOPE_FOLLOWER_NUM_POINTS];
float *bufferEnd, *buff, envelopeVal;
int bufferSize;
float val;
};
vEnvelopeFollower::vEnvelopeFollower()
{
bufferEnd = buffer + V_ENVELOPE_FOLLOWER_NUM_POINTS-1;
buff = buffer;
val = 0;
float *b = buffer;
do
{
*b++ = 0;
}while (b <= bufferEnd);
bufferSize = V_ENVELOPE_FOLLOWER_NUM_POINTS;
envelopeVal= 0;
}
vEnvelopeFollower::~vEnvelopeFollower()
{
}
void vEnvelopeFollower::SetBufferSize(float value)
{
bufferEnd = buffer + (bufferSize = 100 + (int)(value * ((float)V_ENVELOPE_FOLLOWER_NUM_POINTS-102)));
buff = buffer;
float val = envelopeVal / bufferSize;
do
{
*buff++ = val;
}while (buff <= bufferEnd);
buff = buffer;
}
1 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Exponential curve for
Type : Exponential curve References : Posted by neolit123 [at] gmail [dot] com
Notes : When you design a frequency control for your filters you may need an exponential curve to give the lower frequencies more resolution.
F=20-20000hz
x=0-100%
Case (control middle point):
x=50%
F=1135hz
Ploted diagram with 5 points:
http://img151.imageshack.us/img151/9938/expplotur3.jpg
Code : //tweak - 14.15005 to change middle point and F(max)
F = 19+floor(pow(4,x/14.15005))+x*20;
3 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Exponential parameter mapping
References : Posted by Russell Borogove
Notes : Use this if you want to do an exponential map of a parameter (mParam) to a range (mMin - mMax).
Output is in mData...
Code : float logmax = log10f( mMax );
float logmin = log10f( mMin );
float logdata = (mParam * (logmax-logmin)) + logmin;
mData = powf( 10.0f, logdata );
if (mData < mMin)
{
mData = mMin;
}
if (mData > mMax)
{
mData = mMax;
}
4 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
fast abs/neg/sign for 32bit floats
Type : floating point functions References : Posted by tobybear[AT]web[DOT]de
Notes : Haven't seen this elsewhere, probably because it is too obvious? Anyway, these functions are intended for 32-bit floating point numbers only and should work a bit faster than the regular ones.
fastabs() gives you the absolute value of a float
fastneg() gives you the negative number (faster than multiplying with -1)
fastsgn() gives back +1 for 0 or positive numbers, -1 for negative numbers
Comments are welcome (tobybear[AT]web[DOT]de)
Cheers
Toby (www.tobybear.de)
Code : // C/C++ code:
float fastabs(float f)
{int i=((*(int*)&f)&0x7fffffff);return (*(float*)&i);}
float fastneg(float f)
{int i=((*(int*)&f)^0x80000000);return (*(float*)&i);}
int fastsgn(float f)
{return 1+(((*(int*)&f)>>31)<<1);}
//Delphi/Pascal code:
function fastabs(f:single):single;
begin i:=longint((@f)^) and $7FFFFFFF;result:=single((@i)^) end;
function fastneg(f:single):single;
begin i:=longint((@f)^) xor $80000000;result:=single((@i)^) end;
function fastsgn(f:single):longint;
begin result:=1+((longint((@f)^) shr 31)shl 1) end;
6 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Fast binary log approximations
Type : C code References : Posted by musicdsp.org[AT]mindcontrol.org
Notes : This code uses IEEE 32-bit floating point representation knowledge to quickly compute approximations to the log2 of a value. Both functions return under-estimates of the actual value, although the second flavour is less of an under-estimate than the first (and might be sufficient for using in, say, a dBV/FS level meter).
Running the test program, here's the output:
0.1: -4 -3.400000
1: 0 0.000000
2: 1 1.000000
5: 2 2.250000
100: 6 6.562500
Code : // Fast logarithm (2-based) approximation
// by Jon Watte
#include <assert.h>
int floorOfLn2( float f ) {
assert( f > 0. );
assert( sizeof(f) == sizeof(int) );
assert( sizeof(f) == 4 );
return (((*(int *)&f)&0x7f800000)>>23)-0x7f;
}
float approxLn2( float f ) {
assert( f > 0. );
assert( sizeof(f) == sizeof(int) );
assert( sizeof(f) == 4 );
int i = (*(int *)&f);
return (((i&0x7f800000)>>23)-0x7f)+(i&0x007fffff)/(float)0x800000;
}
// Here's a test program:
#include <stdio.h>
// insert code from above here
int
main()
{
printf( "0.1: %d %f\n", floorOfLn2( 0.1 ), approxLn2( 0.1 ) );
printf( "1: %d %f\n", floorOfLn2( 1. ), approxLn2( 1. ) );
printf( "2: %d %f\n", floorOfLn2( 2. ), approxLn2( 2. ) );
printf( "5: %d %f\n", floorOfLn2( 5. ), approxLn2( 5. ) );
printf( "100: %d %f\n", floorOfLn2( 100. ), approxLn2( 100. ) );
return 0;
}
1 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Fast cube root, square root, and reciprocal for x86/SSE CPUs.
References : Posted by kaleja[AT]estarcion[DOT]com
Notes : All of these methods use SSE instructions or bit twiddling tricks to get a rough approximation to cube root, square root, or reciprocal, which is then refined with one or more Newton-Raphson approximation steps.
Each is named to indicate its approximate level of accuracy and a comment describes its performance relative to the conventional versions.
Code : // These functions approximate reciprocal, square root, and
// cube root to varying degrees of precision substantially
// faster than the straightforward methods 1/x, sqrtf(x),
// and powf( x, 1.0f/3.0f ). All require SSE-enabled CPU & OS.
// Unlike the powf() solution, the cube roots also correctly
// handle negative inputs.
#define REALLY_INLINE __forceinline
// ~34 clocks on Pentium M vs. ~360 for powf
REALLY_INLINE float cuberoot_sse_8bits( float x )
{
float z;
static const float three = 3.0f;
_asm
{
mov eax, x // x as bits
movss xmm2, x // x2: x
movss xmm1, three // x1: 3
// Magic floating point cube root done with integer math.
// The exponent is divided by three in such a way that
// remainder bits get shoved into the top of the normalized
// mantissa.
mov ecx, eax // copy of x
and eax, 0x7FFFFFFF // exponent & mantissa of x in biased-127
sub eax, 0x3F800000 // exponent & mantissa of x in 2's comp
sar eax, 10 //
imul eax, 341 // 341/1024 ~= .333
add eax, 0x3F800000 // back to biased-127
and eax, 0x7FFFFFFF // remask
and ecx, 0x80000000 // original sign and mantissa
or eax, ecx // masked new exponent, old sign and mantissa
mov z, eax //
// use SSE to refine the first approximation
movss xmm0, z ;// x0: z
movss xmm3, xmm0 ;// x3: z
mulss xmm3, xmm0 ;// x3: z*z
movss xmm4, xmm3 ;// x4: z*z
mulss xmm3, xmm1 ;// x3: 3*z*z
rcpss xmm3, xmm3 ;// x3: ~ 1/(3*z*z)
mulss xmm4, xmm0 ;// x4: z*z*z
subss xmm4, xmm2 ;// x4: z*z*z-x
mulss xmm4, xmm3 ;// x4: (z*z*z-x) / (3*z*z)
subss xmm0, xmm4 ;// x0: z' accurate to within about 0.3%
movss z, xmm0
}
return z;
}
// ~60 clocks on Pentium M vs. ~360 for powf
REALLY_INLINE float cuberoot_sse_16bits( float x )
{
float z;
static const float three = 3.0f;
_asm
{
mov eax, x // x as bits
movss xmm2, x // x2: x
movss xmm1, three // x1: 3
// Magic floating point cube root done with integer math.
// The exponent is divided by three in such a way that
// remainder bits get shoved into the top of the normalized
// mantissa.
mov ecx, eax // copy of x
and eax, 0x7FFFFFFF // exponent & mantissa of x in biased-127
sub eax, 0x3F800000 // exponent & mantissa of x in 2's comp
sar eax, 10 //
imul eax, 341 // 341/1024 ~= .333
add eax, 0x3F800000 // back to biased-127
and eax, 0x7FFFFFFF // remask
and ecx, 0x80000000 // original sign and mantissa
or eax, ecx // masked new exponent, old sign and mantissa
mov z, eax //
// use SSE to refine the first approximation
movss xmm0, z ;// x0: z
movss xmm3, xmm0 ;// x3: z
mulss xmm3, xmm0 ;// x3: z*z
movss xmm4, xmm3 ;// x4: z*z
mulss xmm3, xmm1 ;// x3: 3*z*z
rcpss xmm3, xmm3 ;// x3: ~ 1/(3*z*z)
mulss xmm4, xmm0 ;// x4: z*z*z
subss xmm4, xmm2 ;// x4: z*z*z-x
mulss xmm4, xmm3 ;// x4: (z*z*z-x) / (3*z*z)
subss xmm0, xmm4 ;// x0: z' accurate to within about 0.3%
movss xmm3, xmm0 ;// x3: z
mulss xmm3, xmm0 ;// x3: z*z
movss xmm4, xmm3 ;// x4: z*z
mulss xmm3, xmm1 ;// x3: 3*z*z
rcpss xmm3, xmm3 ;// x3: ~ 1/(3*z*z)
mulss xmm4, xmm0 ;// x4: z*z*z
subss xmm4, xmm2 ;// x4: z*z*z-x
mulss xmm4, xmm3 ;// x4: (z*z*z-x) / (3*z*z)
subss xmm0, xmm4 ;// x0: z'' accurate to within about 0.001%
movss z, xmm0
}
return z;
}
// ~77 clocks on Pentium M vs. ~360 for powf
REALLY_INLINE float cuberoot_sse_22bits( float x )
{
float z;
static const float three = 3.0f;
_asm
{
mov eax, x // x as bits
movss xmm2, x // x2: x
movss xmm1, three // x1: 3
// Magic floating point cube root done with integer math.
// The exponent is divided by three in such a way that
// remainder bits get shoved into the top of the normalized
// mantissa.
mov ecx, eax // copy of x
and eax, 0x7FFFFFFF // exponent & mantissa of x in biased-127
sub eax, 0x3F800000 // exponent & mantissa of x in 2's comp
sar eax, 10 //
imul eax, 341 // 341/1024 ~= .333
add eax, 0x3F800000 // back to biased-127
and eax, 0x7FFFFFFF // remask
and ecx, 0x80000000 // original sign and mantissa
or eax, ecx // masked new exponent, old sign and mantissa
mov z, eax //
// use SSE to refine the first approximation
movss xmm0, z // x0: z
movss xmm3, xmm0 // x3: z
mulss xmm3, xmm0 // x3: z*z
movss xmm4, xmm3 // x4: z*z
mulss xmm3, xmm1 // x3: 3*z*z
rcpss xmm3, xmm3 // x3: ~ 1/(3*z*z)
mulss xmm4, xmm0 // x4: z*z*z
subss xmm4, xmm2 // x4: z*z*z-x
mulss xmm4, xmm3 // x4: (z*z*z-x) / (3*z*z)
subss xmm0, xmm4 // x0: z' accurate to within about 0.3%
movss xmm3, xmm0 // x3: z
mulss xmm3, xmm0 // x3: z*z
movss xmm4, xmm3 // x4: z*z
mulss xmm3, xmm1 // x3: 3*z*z
rcpss xmm3, xmm3 // x3: ~ 1/(3*z*z)
mulss xmm4, xmm0 // x4: z*z*z
subss xmm4, xmm2 // x4: z*z*z-x
mulss xmm4, xmm3 // x4: (z*z*z-x) / (3*z*z)
subss xmm0, xmm4 // x0: z'' accurate to within about 0.001%
movss xmm3, xmm0 // x3: z
mulss xmm3, xmm0 // x3: z*z
movss xmm4, xmm3 // x4: z*z
mulss xmm3, xmm1 // x3: 3*z*z
rcpss xmm3, xmm3 // x3: ~ 1/(3*z*z)
mulss xmm4, xmm0 // x4: z*z*z
subss xmm4, xmm2 // x4: z*z*z-x
mulss xmm4, xmm3 // x4: (z*z*z-x) / (3*z*z)
subss xmm0, xmm4 // x0: z''' accurate to within about 0.000012%
movss z, xmm0
}
return z;
}
// ~6 clocks on Pentium M vs. ~24 for single precision sqrtf
REALLY_INLINE float squareroot_sse_11bits( float x )
{
float z;
_asm
{
rsqrtss xmm0, x
rcpss xmm0, xmm0
movss z, xmm0 // z ~= sqrt(x) to 0.038%
}
return z;
}
// ~19 clocks on Pentium M vs. ~24 for single precision sqrtf
REALLY_INLINE float squareroot_sse_22bits( float x )
{
static float half = 0.5f;
float z;
_asm
{
movss xmm1, x // x1: x
rsqrtss xmm2, xmm1 // x2: ~1/sqrt(x) = 1/z
rcpss xmm0, xmm2 // x0: z == ~sqrt(x) to 0.05%
movss xmm4, xmm0 // x4: z
movss xmm3, half
mulss xmm4, xmm4 // x4: z*z
mulss xmm2, xmm3 // x2: 1 / 2z
subss xmm4, xmm1 // x4: z*z-x
mulss xmm4, xmm2 // x4: (z*z-x)/2z
subss xmm0, xmm4 // x0: z' to 0.000015%
movss z, xmm0
}
return z;
}
// ~12 clocks on Pentium M vs. ~16 for single precision divide
REALLY_INLINE float reciprocal_sse_22bits( float x )
{
float z;
_asm
{
rcpss xmm0, x // x0: z ~= 1/x
movss xmm2, x // x2: x
movss xmm1, xmm0 // x1: z ~= 1/x
addss xmm0, xmm0 // x0: 2z
mulss xmm1, xmm1 // x1: z^2
mulss xmm1, xmm2 // x1: xz^2
subss xmm0, xmm1 // x0: z' ~= 1/x to 0.000012%
movss z, xmm0
}
return z;
}
5 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
fast exp() approximations
Type : Taylor series approximation References : Posted by scoofy[AT]inf[DOT]elte[DOT]hu
Notes : I needed a fast exp() approximation in the -3.14..3.14 range, so I made some approximations based on the tanh() code posted in the archive by Fuzzpilz. Should be pretty straightforward, but someone may find this useful.
The increasing numbers in the name of the function means increasing precision. Maximum error in the -1..1 range:
fastexp3: 0.05 (1.8%)
fastexp4: 0.01 (0.36%)
fastexp5: 0.0016152 (0.59%)
fastexp6: 0.0002263 (0.0083%)
fastexp7: 0.0000279 (0.001%)
fastexp8: 0.0000031 (0.00011%)
fastexp9: 0.0000003 (0.000011%)
Maximum error in the -3.14..3.14 range:
fastexp3: 8.8742 (38.4%)
fastexp4: 4.8237 (20.8%)
fastexp5: 2.28 (9.8%)
fastexp6: 0.9488 (4.1%)
fastexp7: 0.3516 (1.5%)
fastexp8: 0.1172 (0.5%)
fastexp9: 0.0355 (0.15%)
These were done using the Taylor series, for example I got fastexp4 by using:
exp(x) = 1 + x + x^2/2 + x^3/6 + x^4/24 + ...
= (24 + 24x + x^2*12 + x^3*4 + x^4) / 24
(using Horner-scheme:)
= (24 + x * (24 + x * (12 + x * (4 + x)))) * 0.041666666f
Code : inline float fastexp3(float x) {
return (6+x*(6+x*(3+x)))*0.16666666f;
}
inline float fastexp4(float x) {
return (24+x*(24+x*(12+x*(4+x))))*0.041666666f;
}
inline float fastexp5(float x) {
return (120+x*(120+x*(60+x*(20+x*(5+x)))))*0.0083333333f;
}
inline float fastexp6(float x) {
return 720+x*(720+x*(360+x*(120+x*(30+x*(6+x))))))*0.0013888888f;
}
inline float fastexp7(float x) {
return (5040+x*(5040+x*(2520+x*(840+x*(210+x*(42+x*(7+x)))))))*0.00019841269f;
}
inline float fastexp8(float x) {
return (40320+x*(40320+x*(20160+x*(6720+x*(1680+x*(336+x*(56+x*(8+x))))))))*2.4801587301e-5;
}
inline float fastexp9(float x) {
return (362880+x*(362880+x*(181440+x*(60480+x*(15120+x*(3024+x*(504+x*(72+x*(9+x)))))))))*2.75573192e-6;
}
1 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Fast exp2 approximation
References : Posted by Laurent de Soras
Notes : Partial approximation of exp2 in fixed-point arithmetic. It is exactly :
[0 ; 1[ -> [0.5 ; 1[
f : x |-> 2^(x-1)
To get the full exp2 function, you have to separate the integer and fractionnal part of the number. The integer part may be processed by bitshifting. Process the fractionnal part with the function, and multiply the two results.
Maximum error is only 0.3 % which is pretty good for two mul ! You get also the continuity of the first derivate.
-- Laurent
Code : // val is a 16-bit fixed-point value in 0x0 - 0xFFFF ([0 ; 1[)
// Returns a 32-bit fixed-point value in 0x80000000 - 0xFFFFFFFF ([0.5 ; 1[)
unsigned int fast_partial_exp2 (int val)
{
unsigned int result;
__asm
{
mov eax, val
shl eax, 15 ; eax = input [31/31 bits]
or eax, 080000000h ; eax = input + 1 [32/31 bits]
mul eax
mov eax, edx ; eax = (input + 1) ^ 2 [32/30 bits]
mov edx, 2863311531 ; 2/3 [32/32 bits], rounded to +oo
mul edx ; eax = 2/3 (input + 1) ^ 2 [32/30 bits]
add edx, 1431655766 ; + 4/3 [32/30 bits] + 1
mov result, edx
}
return (result);
}
no comments on this item | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Fast Float Random Numbers
References : Posted by dominik.ries[AT]gmail[DOT]com
Notes : a small and fast implementation for random float numbers in the range [-1,1], usable as white noise oscillator.
compared to the naive usage of the rand() function it gives a speedup factor of 9-10.
compared to a direct implementation of the rand() function (visual studio implementation) it still gives a speedup by a factor of 2-3.
apart from beeing faster it also provides more precision for the resulting floats since its base values use full 32bit precision.
Code : // set the initial seed to whatever you like
static int RandSeed = 1;
// using rand() (16bit precision)
// takes about 110 seconds for 2 billion calls
float RandFloat1()
{
return ((float)rand()/RAND_MAX) * 2.0f - 1.0f;
}
// direct implementation of rand() (16 bit precision)
// takes about 32 seconds for 2 billion calls
float RandFloat2()
{
return ((float)(((RandSeed = RandSeed * 214013L + 2531011L) >> 16) & 0x7fff)/RAND_MAX) * 2.0f - 1.0f;
}
// fast rand float, using full 32bit precision
// takes about 12 seconds for 2 billion calls
float Fast_RandFloat()
{
RandSeed *= 16807;
return (float)RandSeed * 4.6566129e-010f;
}
5 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Fast log2
References : Posted by Laurent de Soras Code : inline float fast_log2 (float val)
{
assert (val > 0);
int * const exp_ptr = reinterpret_cast <int *> (&val);
int x = *exp_ptr;
const int log_2 = ((x >> 23) & 255) - 128;
x &= ~(255 << 23);
x += 127 << 23;
*exp_ptr = x;
return (val + log_2);
}
6 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
fast power and root estimates for 32bit floats
Type : floating point functions References : Posted by tobybear[AT]web[DOT]de
Notes : Original code by Stefan Stenzel (also in this archive, see "pow(x,4) approximation") - extended for more flexibility.
fastpow(f,n) gives a rather *rough* estimate of a float number f to the power of an integer number n (y=f^n). It is fast but result can be quite a bit off, since we directly mess with the floating point exponent.-> use it only for getting rough estimates of the values and where precision is not that important.
fastroot(f,n) gives the n-th root of f. Same thing concerning precision applies here.
Cheers
Toby (www.tobybear.de)
Code : //C/C++ source code:
float fastpower(float f,int n)
{
long *lp,l;
lp=(long*)(&f);
l=*lp;l-=0x3F800000l;l<<=(n-1);l+=0x3F800000l;
*lp=l;
return f;
}
float fastroot(float f,int n)
{
long *lp,l;
lp=(long*)(&f);
l=*lp;l-=0x3F800000l;l>>=(n-1);l+=0x3F800000l;
*lp=l;
return f;
}
//Delphi/Pascal source code:
function fastpower(i:single;n:integer):single;
var l:longint;
begin
l:=longint((@i)^);
l:=l-$3F800000;l:=l shl (n-1);l:=l+$3F800000;
result:=single((@l)^);
end;
function fastroot(i:single;n:integer):single;
var l:longint;
begin
l:=longint((@i)^);
l:=l-$3F800000;l:=l shr (n-1);l:=l+$3F800000;
result:=single((@l)^);
end;
no comments on this item | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Fast rounding functions in pascal
Type : round/ceil/floor/trunc References : Posted by bouba@hotmail.com
Notes : Original documentation with cpp samples:
http://ldesoras.free.fr/prod.html#doc_rounding
Code : Pascal translation of the functions (accurates ones) :
function ffloor(f:double):integer;
var
i:integer;
t : double;
begin
t := -0.5 ;
asm
fld f
fadd st,st(0)
fadd t
fistp i
sar i, 1
end;
result:= i
end;
function fceil(f:double):integer;
var
i:integer;
t : double;
begin
t := -0.5 ;
asm
fld f
fadd st,st(0)
fsubr t
fistp i
sar i, 1
end;
result:= -i
end;
function ftrunc(f:double):integer;
var
i:integer;
t : double;
begin
t := -0.5 ;
asm
fld f
fadd st,st(0)
fabs
fadd t
fistp i
sar i, 1
end;
if f<0 then i := -i;
result:= i
end;
function fround(f:double):integer;
var
i:integer;
t : double;
begin
t := 0.5 ;
asm
fld f
fadd st,st(0)
fadd t
fistp i
sar i, 1
end;
result:= i
end;
1 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Fast sign for 32 bit floats
References : Posted by Peter Schoffhauzer
Notes : Fast functions which give the sign of a 32 bit floating point number by checking the sign bit. There are two versions, one which gives the value as a float, and the other gives an int.
The _nozero versions are faster, but they give incorrect 1 or -1 for zero (depending on the sign bit set in the number). The int version should be faster than the Tobybear one in the archive, since this one doesn't have an addition, just bit operations.
Code : /*------------------------------------------------------------------------------------
fast sign, returns float
------------------------------------------------------------------------------------*/
// returns 1.0f for positive floats, -1.0f for negative floats
// for zero it does not work (may gives 1.0f or -1.0f), but it's faster
inline float fast_sign_nozero(float f) {
float r = 1.0f;
(int&)r |= ((int&)f & 0x80000000); // mask sign bit in f, set it in r if necessary
return r;
}
// returns 1.0f for positive floats, -1.0f for negative floats, 0.0f for zero
inline float fast_sign(float f) {
if (((int&)f & 0x7FFFFFFF)==0) return 0.f; // test exponent & mantissa bits: is input zero?
else {
float r = 1.0f;
(int&)r |= ((int&)f & 0x80000000); // mask sign bit in f, set it in r if necessary
return r;
}
}
/*------------------------------------------------------------------------------------
fast sign, returns int
------------------------------------------------------------------------------------*/
// returns 1 for positive floats, -1 for negative floats
// for 0.0f input it does not work (may give 1 or -1), but it's faster
inline int fast_sign_int_nozero(float f) {
return (signed((int&)f & 0x80000000) >> 31 ) | 1;
}
// returns 1 for positive floats, -1 for negative floats, 0 for 0.0f
inline int fast_sign_int(float f) {
if (((int&)f & 0x7FFFFFFF)==0) return 0; // test exponent & mantissa bits: is input zero?
return (signed((int&)f & 0x80000000) >> 31 ) | 1;
}
2 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Float to int (more intel asm)
References : Posted by Laurent de Soras (via flipcode)
Notes : [Found this on flipcode, seemed worth posting here too, hopefully Laurent will approve :) -- Ross]
Here is the code I often use. It is not a _ftol replacement, but it provides useful functions. The current processor rounding mode should be "to nearest" ; it is the default setting for most of the compilers.
The [fadd st, st (0) / sar i,1] trick fixes the "round to nearest even number" behaviour. Thus, round_int (N+0.5) always returns N+1 and floor_int function is appropriate to convert floating point numbers into fixed point numbers.
---------------------
Laurent de Soras
Audio DSP engineer & software designer
Ohm Force - Digital audio software
http://www.ohmforce.com
Code : inline int round_int (double x)
{
int i;
static const float round_to_nearest = 0.5f;
__asm
{
fld x
fadd st, st (0)
fadd round_to_nearest
fistp i
sar i, 1
}
return (i);
}
inline int floor_int (double x)
{
int i;
static const float round_toward_m_i = -0.5f;
__asm
{
fld x
fadd st, st (0)
fadd round_toward_m_i
fistp i
sar i, 1
}
return (i);
}
inline int ceil_int (double x)
{
int i;
static const float round_toward_p_i = -0.5f;
__asm
{
fld x
fadd st, st (0)
fsubr round_toward_p_i
fistp i
sar i, 1
}
return (-i);
}
5 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Float to integer conversion
References : Posted by Peter Schoffhauzer
Notes : The fld x/fistp i instructions can be used for fast conversion of floating point numbers to integers. By default, the number is rounded to the nearest integer. However, sometimes that's not what we need.
Bits 12 and 11 if the FPU control word determine the way the FPU rounds numbers. The four rounding states are:
00 = round to nearest (this is the default)
01 = round down (towards -infinity)
10 = round up(towards +infinity)
11 = truncate up(towards zero)
The status word is loaded/stored using the fldcw/fstcw instructions. After setting the rounding mode as desired, the float2int() function will use that rounding mode until the control mode is reset. The ceil() and floor() functions set the rounding mode for every instruction, which is very slow. Therefore, it is a lot faster to set the rounding mode (down or up) before processing a block, and use float2int() instead.
References: SIMPLY FPU by Raymond Filiatreault
http://www.website.masmforum.com/tutorials/fptute/index.html
MSVC (and probably other compilers too) has functions defined in <float.h> for changing the FPU control word: _control87/_controlfp. The equivalent instructions are:
_controlfp(_RC_CHOP, _MCW_RC); // set rounding mode to truncate
_controlfp(_RC_UP _MCW_RC); // set rounding mode to up (ceil)
_controlfp(_RC_DOWN, _MCW_RC); // set rounding mode to down (floor)
_controlfp(_RC_NEAR, _MCW_RC); // set rounding mode to near (default)
Note that the FPU rounding mode affects other calculations too, so the same code may give different results.
Alternatively, single precision floating point numbers can be truncated or rounded to integers by using SSE instructions cvtss2si, cvttss2si, cvtps2pi or cvttps2pi, where SSE instructions are available (which means probably on all modern CPUs). These are not discussed here in detail, but I provided the function truncateSSE which always truncates a single precision floating point number to integer, regardless of the current rounding mode.
(Also I think the SSE rounding mode can differ from the rounding mode set in the FPU control word, but I haven't tested it so far.)
Code : #ifndef __FPU_ROUNDING_H_
#define __FPU_ROUNDING_H_
static short control_word;
static short control_word2;
// round a float to int using the actual rounding mode
inline int float2int(float x) {
int i;
__asm {
fld x
fistp i
}
return i;
}
// set rounding mode to nearest
inline void set_round2near() {
__asm {
fstcw control_word // store fpu control word
mov dx, word ptr [control_word]
and dx,0xf9ff // round towards nearest (default)
mov control_word2, dx
fldcw control_word2 // load modfied control word
}
}
// set rounding mode to round down
inline void set_round_down() {
__asm {
fstcw control_word // store fpu control word
mov dx, word ptr [control_word]
or dx,0x0400 // round towards -inf
and dx,0xf7ff
mov control_word2, dx
fldcw control_word2 // load modfied control word
}
}
// set rounding mode to round up
inline void set_round_up() {
__asm {
fstcw control_word // store fpu control word
mov dx, word ptr [control_word]
or dx,0x0800 // round towards +inf
and dx,0xfbff
mov control_word2, dx
fldcw control_word2 // load modfied control word
}
}
// set rounding mode to truncate
inline void set_truncate() {
__asm {
fstcw control_word // store fpu control word
mov dx, word ptr [control_word]
or dx,0x0c00 // rounding: truncate
mov control_word2, dx
fldcw control_word2 // load modfied control word
}
}
// restore original rounding mode
inline void restore_cw() {
__asm fldcw control_word
}
// truncate to integer using SSE
inline long truncateSSE(float x) {
__asm cvttss2si eax,x
}
#endif
1 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Float-to-int, coverting an array of floats
References : Posted by Stefan Stenzel
Notes : intel only
Code : void f2short(float *fptr,short *iptr,int n)
{
_asm {
mov ebx,n
mov esi,fptr
mov edi,iptr
lea ebx,[esi+ebx*4] ; ptr after last
mov edx,0x80008000 ; damn endianess confuses...
mov ecx,0x4b004b00 ; damn endianess confuses...
mov eax,[ebx] ; get last value
push eax
mov eax,0x4b014B01
mov [ebx],eax ; mark end
mov ax,[esi+2]
jmp startf2slp
; Pad with nops to make loop start at address divisible
; by 16 + 2, e.g. 0x01408062, don't ask why, but this
; gives best performance. Unfortumately "align 16" does
; not seem to work with my VC.
; below I noted the measured execution times for different
; nop-paddings on my Pentium Pro, 100 conversions.
; saturation: off pos neg
nop ;355 546 563 <- seems to be best
; nop ;951 547 544
; nop ;444 646 643
; nop ;444 646 643
; nop ;944 951 950
; nop ;358 447 644
; nop ;358 447 643
; nop ;358 544 643
; nop ;543 447 643
; nop ;643 447 643
; nop ;1047 546 746
; nop ;545 954 1253
; nop ;545 547 661
; nop ;544 547 746
; nop ;444 947 1147
; nop ;444 548 545
in_range:
mov eax,[esi]
xor eax,edx
saturate:
lea esi,[esi+4]
mov [edi],ax
mov ax,[esi+2]
add edi,2
startf2slp:
cmp ax,cx
je in_range
mov eax,edx
js saturate ; saturate neg -> 0x8000
dec eax ; saturate pos -> 0x7FFF
cmp esi,ebx ; end reached ?
jb saturate
pop eax
mov [ebx],eax ; restore end flag
}
}
1 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Gaussian dithering
Type : Dithering References : Posted by Aleksey Vaneev (picoder[AT]mail[DOT]ru)
Notes : It is a more sophisticated dithering than simple RND. It gives the most low noise floor for the whole spectrum even without noise-shaping. You can use as big N as you can afford (it will not hurt), but 4 or 5 is generally enough.
Code : Basically, next value is calculated this way (for RND going from -0.5 to 0.5):
dither = (RND+RND+...+RND) / N.
\ /
\_________/
N times
If your RND goes from 0 to 1, then this code is applicable:
dither = (RND+RND+...+RND - 0.5*N) / N.
no comments on this item | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Gaussian random numbers
Type : random number generation References : Posted by tobybear[AT]web[DOT]de
Notes : // Gaussian random numbers
// This algorithm (adapted from "Natur als fraktale Grafik" by
// Reinhard Scholl) implements a generation method for gaussian
// distributed random numbers with mean=0 and variance=1
// (standard gaussian distribution) mapped to the range of
// -1 to +1 with the maximum at 0.
// For only positive results you might abs() the return value.
// The q variable defines the precision, with q=15 the smallest
// distance between two numbers will be 1/(2^q div 3)=1/10922
// which usually gives good results.
// Note: the random() function used is the standard random
// function from Delphi/Pascal that produces *linear*
// distributed numbers from 0 to parameter-1, the equivalent
// C function is probably rand().
Code : const q=15;
c1=(1 shl q)-1;
c2=(c1 div 3)+1;
c3=1/c1;
function GRandom:single;
begin
result:=(2*(random(c2)+random(c2)+random(c2))-3*(c2-1))*c3;
end;
no comments on this item | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Hermite Interpolator (x86 ASM)
Type : Hermite interpolator in x86 assembly (for MS VC++) References : Posted by robert[DOT]bielik[AT]rbcaudio[DOT]com
Notes : An "assemblified" variant of Laurent de Soras hermite interpolator. I tried to do calculations as parallell as I could muster, but there is almost certainly room for improvements. Right now, it works about 5.3 times (!) faster, not bad to start with...
Parameter explanation:
frac_pos: fractional value [0.0f - 1.0f] to interpolator
pntr: pointer to float array where:
pntr[0] = previous sample (idx = -1)
pntr[1] = current sample (idx = 0)
pntr[2] = next sample (idx = +1)
pntr[3] = after next sample (idx = +2)
The interpolation takes place between pntr[1] and pntr[2].
Regards,
/Robert Bielik
RBC Audio
Code : const float c_half = 0.5f;
__declspec(naked) float __hermite(float frac_pos, const float* pntr)
{
__asm
{
push ecx;
mov ecx, dword ptr[esp + 12]; //////////////////////////////////////////////////////////////////////////////////////////////////
add ecx, 0x04; // ST(0) ST(1) ST(2) ST(3) ST(4) ST(5) ST(6) ST(7)
fld dword ptr [ecx+4]; // x1
fsub dword ptr [ecx-4]; // x1-xm1
fld dword ptr [ecx]; // x0 x1-xm1
fsub dword ptr [ecx+4]; // v x1-xm1
fld dword ptr [ecx+8]; // x2 v x1-xm1
fsub dword ptr [ecx]; // x2-x0 v x1-xm1
fxch st(2); // x1-m1 v x2-x0
fmul c_half; // c v x2-x0
fxch st(2); // x2-x0 v c
fmul c_half; // 0.5*(x2-x0) v c
fxch st(2); // c v 0.5*(x2-x0)
fst st(3); // c v 0.5*(x2-x0) c
fadd st(0), st(1); // w v 0.5*(x2-x0) c
fxch st(2); // 0.5*(x2-x0) v w c
faddp st(1), st(0); // v+.5(x2-x0) w c
fadd st(0), st(1); // a w c
fadd st(1), st(0); // a b_neg c
fmul dword ptr [esp+8]; // a*frac b_neg c
fsubp st(1), st(0); // a*f-b c
fmul dword ptr [esp+8]; // (a*f-b)*f c
faddp st(1), st(0); // res-x0/f
fmul dword ptr [esp+8]; // res-x0
fadd dword ptr [ecx]; // res
pop ecx;
ret;
}
}
4 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Hermite interpollation
References : Posted by various
Notes : These are all different ways to do the same thing : hermite interpollation. Try'm all and benchmark.
Code : // original
inline float hermite1(float x, float y0, float y1, float y2, float y3)
{
// 4-point, 3rd-order Hermite (x-form)
float c0 = y1;
float c1 = 0.5f * (y2 - y0);
float c2 = y0 - 2.5f * y1 + 2.f * y2 - 0.5f * y3;
float c3 = 1.5f * (y1 - y2) + 0.5f * (y3 - y0);
return ((c3 * x + c2) * x + c1) * x + c0;
}
// james mccartney
inline float hermite2(float x, float y0, float y1, float y2, float y3)
{
// 4-point, 3rd-order Hermite (x-form)
float c0 = y1;
float c1 = 0.5f * (y2 - y0);
float c3 = 1.5f * (y1 - y2) + 0.5f * (y3 - y0);
float c2 = y0 - y1 + c1 - c3;
return ((c3 * x + c2) * x + c1) * x + c0;
}
// james mccartney
inline float hermite3(float x, float y0, float y1, float y2, float y3)
{
// 4-point, 3rd-order Hermite (x-form)
float c0 = y1;
float c1 = 0.5f * (y2 - y0);
float y0my1 = y0 - y1;
float c3 = (y1 - y2) + 0.5f * (y3 - y0my1 - y2);
float c2 = y0my1 + c1 - c3;
return ((c3 * x + c2) * x + c1) * x + c0;
}
// laurent de soras
inline float hermite4(float frac_pos, float xm1, float x0, float x1, float x2)
{
const float c = (x1 - xm1) * 0.5f;
const float v = x0 - x1;
const float w = c + v;
const float a = w + v + (x2 - x0) * 0.5f;
const float b_neg = w + a;
return ((((a * frac_pos) - b_neg) * frac_pos + c) * frac_pos + x0);
}
9 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Linear interpolation
Type : Linear interpolators for oversampled audio References : Posted by scoofy[AT]inf[DOT]elte[DOT]hu
Notes : Simple, fast linear interpolators for upsampling a signal by a factor of 2,4,8,16 or 32. Not very usable on their own since they introduce aliasing (but still better than zero order hold). These are best used with already oversampled signals.
-- Peter Schoffhauzer
Code : #ifndef __LIN_INTERPOLATOR_H_
#define __LIN_INTERPOLATOR_H_
/************************************************************************
* Linear interpolator class *
************************************************************************/
class interpolator_linear
{
public:
interpolator_linear() {
reset_hist();
}
// reset history
void reset_hist() {
d1 = 0.f;
}
// 2x interpolator
// out: pointer to float[2]
inline void process2x(float const in, float *out) {
out[0] = d1 + 0.5f*(in-d1); // interpolate
out[1] = in;
d1 = in; // store delay
}
// 4x interpolator
// out: pointer to float[4]
inline void process4x(float const in, float *out) {
float y = in-d1;
out[0] = d1 + 0.25f*y; // interpolate
out[1] = d1 + 0.5f*y;
out[2] = d1 + 0.75f*y;
out[3] = in;
d1 = in; // store delay
}
// 8x interpolator
// out: pointer to float[8]
inline void process8x(float const in, float *out) {
float y = in-d1;
out[0] = d1 + 0.125f*y; // interpolate
out[1] = d1 + 0.25f*y;
out[2] = d1 + 0.375f*y;
out[3] = d1 + 0.5f*y;
out[4] = d1 + 0.625f*y;
out[5] = d1 + 0.75f*y;
out[6] = d1 + 0.875f*y;
out[7] = in;
d1 = in; // store delay
}
// 16x interpolator
// out: pointer to float[16]
inline void process16x(float const in, float *out) {
float y = in-d1;
out[0] = d1 + (1.0f/16.0f)*y; // interpolate
out[1] = d1 + (2.0f/16.0f)*y;
out[2] = d1 + (3.0f/16.0f)*y;
out[3] = d1 + (4.0f/16.0f)*y;
out[4] = d1 + (5.0f/16.0f)*y;
out[5] = d1 + (6.0f/16.0f)*y;
out[6] = d1 + (7.0f/16.0f)*y;
out[7] = d1 + (8.0f/16.0f)*y;
out[8] = d1 + (9.0f/16.0f)*y;
out[9] = d1 + (10.0f/16.0f)*y;
out[10] = d1 + (11.0f/16.0f)*y;
out[11] = d1 + (12.0f/16.0f)*y;
out[12] = d1 + (13.0f/16.0f)*y;
out[13] = d1 + (14.0f/16.0f)*y;
out[14] = d1 + (15.0f/16.0f)*y;
out[15] = in;
d1 = in; // store delay
}
// 32x interpolator
// out: pointer to float[32]
inline void process32x(float const in, float *out) {
float y = in-d1;
out[0] = d1 + (1.0f/32.0f)*y; // interpolate
out[1] = d1 + (2.0f/32.0f)*y;
out[2] = d1 + (3.0f/32.0f)*y;
out[3] = d1 + (4.0f/32.0f)*y;
out[4] = d1 + (5.0f/32.0f)*y;
out[5] = d1 + (6.0f/32.0f)*y;
out[6] = d1 + (7.0f/32.0f)*y;
out[7] = d1 + (8.0f/32.0f)*y;
out[8] = d1 + (9.0f/32.0f)*y;
out[9] = d1 + (10.0f/32.0f)*y;
out[10] = d1 + (11.0f/32.0f)*y;
out[11] = d1 + (12.0f/32.0f)*y;
out[12] = d1 + (13.0f/32.0f)*y;
out[13] = d1 + (14.0f/32.0f)*y;
out[14] = d1 + (15.0f/32.0f)*y;
out[15] = d1 + (16.0f/32.0f)*y;
out[16] = d1 + (17.0f/32.0f)*y;
out[17] = d1 + (18.0f/32.0f)*y;
out[18] = d1 + (19.0f/32.0f)*y;
out[19] = d1 + (20.0f/32.0f)*y;
out[20] = d1 + (21.0f/32.0f)*y;
out[21] = d1 + (22.0f/32.0f)*y;
out[22] = d1 + (23.0f/32.0f)*y;
out[23] = d1 + (24.0f/32.0f)*y;
out[24] = d1 + (25.0f/32.0f)*y;
out[25] = d1 + (26.0f/32.0f)*y;
out[26] = d1 + (27.0f/32.0f)*y;
out[27] = d1 + (28.0f/32.0f)*y;
out[28] = d1 + (29.0f/32.0f)*y;
out[29] = d1 + (30.0f/32.0f)*y;
out[30] = d1 + (31.0f/32.0f)*y;
out[31] = in;
d1 = in; // store delay
}
private:
float d1; // previous input
};
#endif
no comments on this item | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Matlab Time Domain Impulse Response Inverter/Divider
References : Posted by arcane[AT]arcanemethods[DOT]com
Notes : Matlab code for time domain inversion of an impulse response or the division of two of them (transfer function.) The main teoplitz function is given both as a .m file and as a .c file for Matlab'w MEX compilation. The latter is much faster.
Code : function inv=invimplms(den,n,d)
% syntax inv=invimplms(den,n,d)
% den - denominator impulse
% n - length of result
% d - delay of result
% inv - inverse impulse response of length n with delay d
%
% Levinson-Durbin algorithm from Proakis and Manolokis p.865
%
% Author: Bob Cain, May 1, 2001 arcane[AT]arcanemethods[DOT]com
m=xcorr(den,n-1);
m=m(n:end);
b=[den(d+1:-1:1);zeros(n-d-1,1)];
inv=toepsolve(m,b);
function quo=divimplms(num,den,n,d)
%Syntax quo=divimplms(num,den,n,d)
% num - numerator impulse
% den - denominator impulse
% n - length of result
% d - delay of result
% quo - quotient impulse response of length n delayed by d
%
% Levinson-Durbin algorithm from Proakis and Manolokis p.865
%
% Author: Bob Cain, May 1, 2001 arcane@arcanemethods.com
m=xcorr(den,n-1);
m=m(n:end);
b=xcorr([zeros(d,1);num],den,n-1);
b=b(n:-1:1);
quo=toepsolve(m,b);
function hinv=toepsolve(r,q)
% Solve Toeplitz system of equations.
% Solves R*hinv = q, where R is the symmetric Toeplitz matrix
% whos first column is r
% Assumes all inputs are real
% Inputs:
% r - first column of Toeplitz matrix, length n
% q - rhs vector, length n
% Outputs:
% hinv - length n solution
%
% Algorithm from Roberts & Mullis, p.233
%
% Author: T. Krauss, Sept 10, 1997
%
% Modified: R. Cain, Dec 16, 2004 to remove a pair of transposes
% that caused errors.
n=length(q);
a=zeros(n+1,2);
a(1,1)=1;
hinv=zeros(n,1);
hinv(1)=q(1)/r(1);
alpha=r(1);
c=1;
d=2;
for k=1:n-1,
a(k+1,c)=0;
a(1,d)=1;
beta=0;
j=1:k;
beta=sum(r(k+2-j).*a(j,c))/alpha;
a(j+1,d)=a(j+1,c)-beta*a(k+1-j,c);
alpha=alpha*(1-beta^2);
hinv(k+1,1)=(q(k+1)-sum(r(k+2-j).*hinv(j,1)))/alpha;
hinv(j)=hinv(j)+a(k+2-j,d)*hinv(k+1);
temp=c;
c=d;
d=temp;
end
-----What follows is the .c version of toepsolve--------
#include <math.h>
#include "mex.h"
/* function hinv = toepsolve(r,q);
* TOEPSOLVE Solve Toeplitz system of equations.
* Solves R*hinv = q, where R is the symmetric Toeplitz matrix
* whos first column is r
* Assumes all inputs are real
* Inputs:
* r - first column of Toeplitz matrix, length n
* q - rhs vector, length n
* Outputs:
* hinv - length n solution
*
* Algorithm from Roberts & Mullis, p.233
*
* Author: T. Krauss, Sept 10, 1997
*
* Modified: R. Cain, Dec 16, 2004 to replace unnecessasary
* n by n matrix allocation for a with an n by 2 rotating
* buffer and to more closely match the .m function.
*/
void mexFunction(
int nlhs,
mxArray *plhs[],
int nrhs,
const mxArray *prhs[]
)
{
double (*a)[2],*hinv,alpha,beta;
int c,d,temp,j,k;
double eps = mxGetEps();
int n = (mxGetN(prhs[0])>=mxGetM(prhs[0])) ? mxGetN(prhs[0]) : mxGetM(prhs[0]) ;
double *r = mxGetPr(prhs[0]);
double *q = mxGetPr(prhs[1]);
a = (double (*)[2])mxCalloc((n+1)*2,sizeof(double));
if (a == NULL) {
mexErrMsgTxt("Sorry, failed to allocate buffer.");
}
a[0][0]=1.0;
plhs[0] = mxCreateDoubleMatrix(n,1,0);
hinv = mxGetPr(plhs[0]);
hinv[0] = q[0]/r[0];
alpha=r[0];
c=0;
d=1;
for (k = 1; k < n; k++) {
a[k][c] = 0;
a[0][d] = 1.0;
beta = 0.0;
for (j = 1; j <= k; j++) {
beta += r[k+1-j]*a[j-1][c];
}
beta /= alpha;
for (j = 1; j <= k; j++) {
a[j][d] = a[j][c] - beta*a[k-j][c];
}
alpha *= (1 - beta*beta);
hinv[k] = q[k];
for (j = 1; j <= k; j++) {
hinv[k] -= r[k+1-j]*hinv[j-1];
}
hinv[k] /= alpha;
for (j = 1; j <= k; j++) {
hinv[j-1] += a[k+1-j][d]*hinv[k];
}
temp=c;
c=d;
d=temp;
} /* loop over k */
mxFree(a);
return;
}
no comments on this item | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
MIDI note/frequency conversion
Type : - References : Posted by tobybear[AT]web[DOT]de
Notes : I get often asked about simple things like MIDI note/frequency conversion, so I thought I could as well post some source code about this.
The following is Pascal/Delphi syntax, but it shouldn't be a problem to convert it to almost any language in no time.
Uses for this code are mainly for initializing oscillators to the right frequency based upon a given MIDI note, but you might also check what MIDI note is closest to a given frequency for pitch detection etc.
In realtime applications it might be a good idea to get rid of the power and log2 calculations and generate a lookup table on initialization.
A full Pascal/Delphi unit with these functions (including lookup table generation) and a simple demo application can be downloaded here:
http://tobybear.phreque.com/dsp_conv.zip
If you have any comments/suggestions, please send them to: tobybear@web.de
Code : // MIDI NOTE/FREQUENCY CONVERSIONS
const notes:array[0..11] of string= ('C ','C#','D ','D#','E ','F ','F#','G ','G#','A ','A#','B ');
const base_a4=440; // set A4=440Hz
// converts from MIDI note number to frequency
// example: NoteToFrequency(12)=32.703
function NoteToFrequency(n:integer):double;
begin
if (n>=0)and(n<=119) then
result:=base_a4*power(2,(n-57)/12)
else result:=-1;
end;
// converts from MIDI note number to string
// example: NoteToName(12)='C 1'
function NoteToName(n:integer):string;
begin
if (n>=0)and(n<=119) then
result:=notes[n mod 12]+inttostr(n div 12)
else result:='---';
end;
// converts from frequency to closest MIDI note
// example: FrequencyToNote(443)=57 (A 4)
function FrequencyToNote(f:double):integer;
begin
result:=round(12*log2(f/base_a4))+57;
end;
// converts from string to MIDI note
// example: NameToNote('A4')=57
function NameToNote(s:string):integer;
var c,i:integer;
begin
if length(s)=2 then s:=s[1]+' '+s[2];
if length(s)<>3 then begin result:=-1;exit end;
s:=uppercase(s);
c:=-1;
for i:=0 to 11 do
if notes[i]=copy(s,1,2) then
begin
c:=i;
break
end;
try
i:=strtoint(s[3]);
result:=i*12+c;
except
result:=-1;
end;
if c<0 then result:=-1;
end;
3 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Millimeter to DB (faders...)
References : Posted by James McCartney
Notes : These two functions reproduce a traditional professional
mixer fader taper.
MMtoDB converts millimeters of fader travel from the
bottom of the fader for a 100 millimeter fader into
decibels. DBtoMM is the inverse.
The taper is as follows from the top:
The top of the fader is +10 dB
100 mm to 52 mm : -5 dB per 12 mm
52 mm to 16 mm : -10 dB per 12 mm
16 mm to 4 mm : -20 dB per 12 mm
4 mm to 0 mm : fade to zero. (in these functions I go to -200dB
which is effectively zero for up to 32 bit audio.)
Code : float MMtoDB(float mm)
{
float db;
mm = 100. - mm;
if (mm <= 0.) {
db = 10.;
} else if (mm < 48.) {
db = 10. - 5./12. * mm;
} else if (mm < 84.) {
db = -10. - 10./12. * (mm - 48.);
} else if (mm < 96.) {
db = -40. - 20./12. * (mm - 84.);
} else if (mm < 100.) {
db = -60. - 35. * (mm - 96.);
} else db = -200.;
return db;
}
float DBtoMM(float db)
{
float mm;
if (db >= 10.) {
mm = 0.;
} else if (db > -10.) {
mm = -12./5. * (db - 10.);
} else if (db > -40.) {
mm = 48. - 12./10. * (db + 10.);
} else if (db > -60.) {
mm = 84. - 12./20. * (db + 40.);
} else if (db > -200.) {
mm = 96. - 1./35. * (db + 60.);
} else mm = 100.;
mm = 100. - mm;
return mm;
}
2 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Motorola 56300 Disassembler
Type : disassembler References : Posted by chris_townsend[AT]digidesign[DOT]com Linked file : Disassemble56k.zip
Notes : This code contains functions to disassemble Motorola 56k opcodes. The code was originally created by Stephen Davis at Stanford. I made minor modifications to support many 56300 opcodes, although it would nice to add them all at some point. Specifically, I added support for CLB, NORMF, immediate AND, immediate OR, multi-bit ASR/ASL, multi-bit LSL/LSR, MAX, MAXM, BRA, Bcc, BSR, BScc, DMAC, MACsu, MACuu, and conditional ALU instructions.
2 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Noise Shaping Class
Type : Dithering with 9th order noise shaping References : Posted by cshei[AT]indiana.edu Linked file : NS9dither16.h
Notes : This is an implemetation of a 9th order noise shaping & dithering class, that runs quite fast (it has one function that uses Intel x86 assembly, but you can replace it with a different rounding function if you are running on a non-Intel platform). _aligned_malloc and _aligned_free require the MSVC++ Processor Pack, available from www.microsoft.com. You can replace them with "new" and "delete," but allocating aligned memory seems to make it run faster. Also, you can replace ZeroMemory with a memset that sets the memory to 0 if you aren't using Win32.
Input should be floats from -32768 to 32767 (processS will clip at these points for you, but clipping is bad when you are trying to convert floats to shorts). Note to reviewer - it would probably be better if you put the code in a file such as NSDither.h and have a link to it - it's rather long.
(see linked file)
2 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Piecewise quadratic approximate exponential function
Type : Approximation of base-2 exponential function References : Posted by Johannes M.-R.
Notes : The code assumes round-to-zero mode, and ieee 754 float.
To achieve other bases, multiply the input by the logarithmus dualis of the base.
Code : inline float fpow2(const float y)
{
union
{
float f;
int i;
} c;
int integer = (int)y;
if(y < 0)
integer = integer-1;
float frac = y - (float)integer;
c.i = (integer+127) << 23;
c.f *= 0.33977f*frac*frac + (1.0f-0.33977f)*frac + 1.0f;
return c.f;
}
no comments on this item | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
pow(x,4) approximation
References : Posted by Stefan Stenzel
Notes : Very hacked, but it gives a rough estimate of x**4 by modifying exponent and mantissa.
Code : float p4fast(float in)
{
long *lp,l;
lp=(long *)(&in);
l=*lp;
l-=0x3F800000l; /* un-bias */
l<<=2; /* **4 */
l+=0x3F800000l; /* bias */
*lp=l;
/* compiler will read this from memory since & operator had been used */
return in;
}
no comments on this item | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
rational tanh approximation
Type : saturation References : Posted by cschueler
Notes : This is a rational function to approximate a tanh-like soft clipper. It is based on the pade-approximation of the tanh function with tweaked coefficients.
The function is in the range x=-3..3 and outputs the range y=-1..1. Beyond this range the output must be clamped to -1..1.
The first to derivatives of the function vanish at -3 and 3, so the transition to the hard clipped region is C2-continuous.
Code : float rational_tanh(x)
{
if( x < -3 )
return -1;
else if( x > 3 )
return 1;
else
return x * ( 27 + x * x ) / ( 27 + 9 * x * x );
}
5 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Real basic DSP with Matlab (+ GUI) ...
Type : Like effects racks, made with Matlab ! References : Posted by guillaume[DOT]carniato[AT]meletu[DOT]univ-valenciennes[DOT]fr Linked file : http://www.xenggeng.fr.st/ici/guitou/Matlab Music.zip
Notes : You need Matlab v6.0 or more to run this stuff...
Code : take a look at http://www.xenggeng.fr.st/ici/guitou/Matlab Music.zip
I'm now working on a Matlab - sequencer, which will certainly use 'Matlab Music'. I'm interested in integrating WaveWarp in this project; it's a toolbox that allow you to make real time DSP with Matlab.
If you're ok to improve this version (add effects, improve effects quality,anything else...) let's go ! Email me if you're interested in developing this beginner work...
2 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
real value vs display value
Type : Macro References : Posted by emil[AT]arpanet[DOT]no
Notes :
REALVAL converts the vst param at given ranges to a display value.
VSTVAL does the opposite.
a = start
b = end
Code : #define REALVAL(a, b, vstval) (a + (vstval)*(b-a))
#define VSTVAL(a, b, realval) ((realval-a)/(b-a))
2 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Really fast x86 floating point sin/cos
References : Posted by rerdavies[AT]msn[DOT]com Linked file : sincos.zip
Notes : Frightful code from the Intel Performance optimization front. Not for the squeamish.
The following code calculates sin and cos of a floating point value on x86 platforms to 20 bits precision with 2 multiplies and two adds. The basic principle is to use sin(x+y) and cos(x+y) identities to generate the result from lookup tables. Each lookup table takes care of 10 bits of precision in the input. The same principle can be used to generate sin/cos to full (! Really. Full!) 24-bit float precision using two 8-bit tables, and one 10 bit table (to provide guard bits), for a net speed gain of about 4x over fsin/fcos, and 8x if you want both sin and cos. Note that microsoft compilers have trouble keeping doubles aligned properly on the stack (they must be 8-byte aligned in order not to incur a massive alignment penalty). As a result, this class should NOT be allocated on the stack. Add it as a member variable to any class that uses it.
e.g.
class CSomeClass {
CQuickTrig m_QuickTrig;
...
mQuickTrig.QuickSinCos(dAngle,fSin,fCos);
...
}
Code : (see attached file)
no comments on this item | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Reasonably accurate/fastish tanh approximation
References : Posted by Fuzzpilz
Notes : Fairly obvious, but maybe not obvious enough, since I've seen calls to tanh() in code snippets here.
It's this, basically:
tanh(x) = sinh(x)/cosh(x)
= (exp(x) - exp(-x))/(exp(x) + exp(-x))
= (exp(2x) - 1)/(exp(2x) + 1)
Combine this with a somewhat less accurate approximation for exp than usual (I use a third-order Taylor approximation below), and you're set. Useful for waveshaping.
Notes on the exp approximation:
It only works properly for input values above x, but since tanh is odd, that isn't a problem.
exp(x) = 1 + x + x^2/(2!) + x^3/(3!) + ...
Breaking the Taylor series off after the third term, I get
1 + x + x^2/2 + x^3/6.
I can save some multiplications by using
6 + x * (6 + x * (3 + x))
instead; a division by 6 becomes necessary, but is lumped into the additions in the tanh part:
(a/6 - 1)/(a/6 + 1) = (a - 6)/(a + 6).
Accuracy:
I haven't looked at this in very great detail, but it's always <= the real tanh (>= for x<0), and the greatest deviation occurs at about +/- 1.46, where the result is ca. .95 times the correct value.
This is still faster than tanh if you use a better approximation for the exponential, even if you simply call exp.
There are probably additional ways of improving parts of this, and naturally if you're going to use it you'll want to figure out whether your particular application offers additional ways of simplifying it, but it's a good start.
Code : /* single precision absolute value, a lot faster than fabsf() (if you use MSVC++ 6 Standard - others' implementations might be less slow) */
float sabs(float a)
{
int b=(*((int *)(&a)))&0x7FFFFFFF;
return *((float *)(&b));
}
/* approximates tanh(x/2) rather than tanh(x) - depending on how you're using this, fixing that could well be wasting a multiplication (though that isn't much, and it could be done with an integer addition in sabs instead) */
float tanh2(float x)
{
float a=sabs(x);
a=6+a*(6+a*(3+a));
return ((x<0)?-1:1)*(a-6)/(a+6); /* instead of using <, you can also check directly whether the sign bit is set ((*((int *)(&x)))&0x80000000), but it's not really worth it */
}
16 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
resampling
Type : linear interpolated aliased resampling of a wave file References : Posted by mail[AT]mroc[DOT]de
Notes : som resampling stuff. the code is heavily used in MSynth, but do not lough about ;-)
perhaps, prefiltering would reduce aliasing.
Code : signed short* pSample = ...;
unsigned int sampleLength = ...;
// stretch sample to length of one bar...
float playPosDelta = sampleLength / ( ( 240.0f / bpm ) * samplingRate );
// requires for position calculation...
float playpos1 = 0.0f;
unsigned int iter = 0;
// required for interpolation...
unsigned int i1, i2;
float* pDest = ....;
float* pDestStop = pDest + len;
for( float *pt=pDest;pt<pDestStop;++pt )
{
// linear interpolation...
i1 = (unsigned int)playpos;
i2 = i1 + 1;
(*pt) = ((pSample[i2]-pSample[i1]) * (playpos - i1) + pSample[i1]);
// position calculation preventing float sumation error...
playpos1 = (++iter) * playposIncrement;
}
...
4 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Saturation
Type : Waveshaper References : Posted by Bram
Notes : when the input is below a certain threshold (t) these functions return the input, if it goes over that threshold, they return a soft shaped saturation.
Neigther claims to be fast ;-)
Code : float saturate(float x, float t)
{
if(fabs(x)<t)
return x
else
{
if(x > 0.f);
return t + (1.f-t)*tanh((x-t)/(1-t));
else
return -(t + (1.f-t)*tanh((-x-t)/(1-t)));
}
}
or
float sigmoid(x)
{
if(fabs(x)<1)
return x*(1.5f - 0.5f*x*x);
else
return x > 0.f ? 1.f : -1.f;
}
float saturate(float x, float t)
{
if(abs(x)<t)
return x
else
{
if(x > 0.f);
return t + (1.f-t)*sigmoid((x-t)/((1-t)*1.5f));
else
return -(t + (1.f-t)*sigmoid((-x-t)/((1-t)*1.5f)));
}
}
2 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Sin(x) Aproximation (with SSE code)
References : Posted by williamk[AT]wusik[DOT]com
Notes : Sin Aproximation: sin(x) = x + ( x * (-x * x / 6));
This is very handy and fast, but not precise. Below you will find a simple SSE code.
Remember that all movaps command requires 16 bit aligned variables.
Code : SSE code for computing only ONE value (scalar)
Replace all "ss" with "ps" if you want to calculate 4 values. And instead of "movps" use "movaps".
movss xmm1, xmm0 ; xmm0 = x
mulss xmm1, Filter_GenVal[k_n1] ; * -1
mulss xmm1, xmm0 ; -x * x
divss xmm1, Filter_GenVal[k_6] ; / 6
mulss xmm1, xmm0
addss xmm0, xmm1
2 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|
|
 |
Sin, Cos, Tan approximation
References : http://www.wild-magic.com Linked file : approx.h
Notes : Code for approximation of cos, sin, tan and inv sin, etc.
Surprisingly accurate and very usable.
[edit by bram]
this code is taken literaly from
http://www.wild-magic.com/SourceCode.html
Go have a look at the MgcMath.h and MgcMath.cpp files in their library...
[/edit]
2 comment(s) | add a comment | nofrills version |
|
 |
|
|
|
|