12-31-06 - 1

God I hate New Year's Eve. Fucking expectations.

I got the new Firefox (2.something) and WTF it still blows chunks. Pages take much longer to load & format than in IE and they pop all over during formating in evil ways. I had to get "SetBrowser" and "FireTune" to even get it working right. I wish I could fucking use it because I want the better security and ad blocking and such, but I'm not sure I can handle the craptacularity. Also, you can't drag & drop url's between IE and Firefox which makes it impossible to use a hybrid browsing approach. Blah blah I know about initialpaint delay.


12-30-06 - 2

The "Sunken City" in San Pedro seems pretty cool, I'd like to go. Nice blog here on the Ruins of So Cal

12-30-06 - 1

WTF, there's a "Google Music" search now !? Since when? Unfortunately it doesn't actually seem to be useful at all, since I can't tell it to only find full-length free mp3's.


12-29-06 - 1

The color R=60 G=60 B=40 is perceived as "olive green". Apparently I have trouble with that color, I often see it as "brown". Of course it's really a desaturated dark orange, which is what brown is, but any brown that's even close to being even on green is perceived as very green. Perhaps this is related to the whole "we see green better than any other color" thing, but maybe not.


12-25-06 - 1

Merry Christmas! I'm bored.

I'm not around any really little kids this year, which sucks.


12-22-06 - 3

We have set up Iraq to be a defacto Shi'ite country with strong ties to Iran. It's headed down that path and we don't have the will or the competence to stop it. As in all places, the "terrorists" are those who can't pretend to be part of the government. In Iraq, the Shi'ites are committing horrible acts of terror through execution and kidnapping squads which either dress up as police or are police; this government-sponsored terror is being more and more tolerated by the US because it provides "stability"; just in the days of Kissinger's realpolitik (and later when we supported Saddam), we will back the evil government if they can control the opposition and provide profit opportunities for America's businesses. On the other hand, the Sunnis are becoming disenfranchised and can see that as a minority they will be cast out of power, dislocated, and stripped of rights. Just like the Shi'ites they engage in reprisal killings and attacks against those who they see as threats, but since they don't have the cloak of government authority, they are the "terrorists".

12-22-06 - 2

The public transportation in SF is really good, it's super nice if you live in the city, you really don't need a car. Bart and Muni are great, but the buses sort of suck. There's no reason for them to suck and I propose two very easy fixes which would make them awesome. 1) Don't stop so often. Most of the bus lines through the city stop on every block. This makes the ride much slower and means they come to each stop much less often. This is not a small factor, it's like a 2X ride time factor, or maybe more. They could stop every 3 blocks. Presumably all this stopping is motivated by the disabled or elderly, but they had to walk to get to the stop anyway and a few blocks is not a significant increase. 2) Get rid of tickets/fares. A huge amount of the time is spent loading/unloading, and the majority of that is caused by tickets/fares. One problem is people have to get on the front instead of both doors because of fare checking. The other problem is just the clog at the entrance with people trying to run bills or show tickets. The amount of money taken in fares can't really be that significant. In my opinion the best solution would be to subsidise it completely from tax funds. This would be a big boon to the poor since the rich basically never ride the bus, which would be a massive aid to economic development. It lets people get to school or jobs more easily, and bus rides will not be abused since it's so unpleasant. Conservatives hate these grants for economic development, though they think massive tax cuts or business subsidies are justified to "stimulate the economy" (even though they don't).

12-22-06 - 1

Deborah Solomon is a fucking condescending ignorant bitch as usual. She contends that numerical ratings for books are foolish, that you need the personalization of criticism. This is an egotistical fallacy. One of the things which is proved by Netflix is that people are really not very unique at all. A simple global predictor like (0.6 * movie_average + 0.4 * customer_average) gets around 0.99 RMSE on the movie data. The standard personalized predictors in the literature (which find most similar other users and take their ratings) get around 0.98; the very best in the literature get around 0.95. The most sophisticated predictors get around 0.90 RMSE now. That's a microscopic improvement over the simple average-based predictor, which indicates that people are far more the same than they want to admit. The reason people do not believe this is that they like to pretend they are so unique, so interesting, that their taste is better than the average shmuck.

Meh, most of my logic is flawed here. I'm a condescending ignorant ass.


12-21-06 - 3

There's a hacky way of solving coupled non-linear systems which I've never really seen documented. The numerical people will puke, but it works incredibly well in almost all cases and is very simple to implement. I dunno, maybe this is well known but I've never seen it really described.

Say you have some function F(A,B) where A and B are vectors. You wish to solve F(A,B) = 0 , (or perhaps minimize an error E(A,B), so take the derivative to get F's). Do some algebra and substitutions until you have two seperate functions that depend only on the other variable :

A = G(B)
B = H(A)

Where G and H may be arbitrary nonlinear functions so you can't just plug them together and solve it. If B was the right answer for B then A =G(B) would give you the exact solution for A and vice-versa.

The hacky solution is basically just to set A from G, then B from H, and repeat. There are some simple hacks to make it better.

First imagine the solutions converging as a time series, so we have A_t and B_t. First of all it's important to have a pretty good initial guess for A_0 and B_0 (actually just for one or the other) because this method doesn't work if you're very far from the right answer.

The next hacky trick is to take less than the full step for the first few steps. A full step would be :

A_t_1 = G(B_t)
B_t_1 = H(A_t)
( A_t_1 should be read as A_(t+1) , it's A at the next time step)

Instead take a fractional step of size f (f in 0 - 1) :

A_t_1 = (1-f) * A_t + f * G(B_t)
B_t_1 = (1-f) * B_t + f * H(A_t)

This avoid oscillations and helps you settle in. You can start with f = 0.5 and quickly step it up 0.6,0.7. For some problems you might want to have "f" max out at 0.99 or something rather than 1.0

The next hack which helps a lot is to use A_t_1 instead of A_t when solving for B_t_1 :

A_t_1 = G(B_t)
B_t_1 = H(A_t_1)

This sort of lets you take a more exact step. If you want you can swap the order of A & B on every other step, or do some sort of fancier thing like :

A_int = 0.5 * A_t + 0.5 * G(B_t)
B_t_1 = H(A_int)
A_t_1 = G(B_t_1)

which is sort of like a centered-time integrator or something though I have no idea if that's actually better. Actually this whole thing sort of reminds me of implicit integrators.

Anyway, in practice this type of iteration converges incredibly fast. In problems where I've used Newton's method or Conjugate Gradient or something and it's taken like 500 iterations to converge, this iteration took 20 steps. Obviously it only works on certain types of problems and they have to have well behaved local minima, but when it works it works great.

To be a bit more concrete I'll give you a specific example. I've been examining the "weighted SVD", and one step of that is the approximation of a matrix with a rank-1 tensor (that's just an outer product of two vectors). The problem is :

Given matrix A_ij
and weights w_ij
Find vectors U_i and V_j such that the error
E = Sum_ij { w_ij * (A_ij - U_i*V_j) }
is minimized

If you take derivatives of E with respect to U and V you can solve for two equations :

U_i = Sum_j { A_ij * w_ij * V_j } / Sum_j { w_ij * (V_j^2) } = H(V)
V_j = Sum_i { A_ij * w_ij * U_i } / Sum_i { w_ij * (U_i^2) } = G(U)

Directly solving these is impossible (I think), but they're exactly in the form I need for my hacky iteration, and in fact it works super well.

I guess the reason no one likes this approach is that on general problems it can very badly fail to converge.

Addendum : this approach is also great for equations in one variable by substitution. For example say you have some evil equation to solve like :

3 + 7*x + sin(x) + sin(x)^2 = 0

Just set y = sin(x) and pretend you have equations in two variables. Now :

x = (- 3 - y - y*y)/7
y = sin(x)

Are the two equations and you just use the alternate stepping approach. The half-step iteration I described here converges to within FLT_EPSILON in 5 steps !!

12-21-06 - 2

I hate Christmas shopping so much. First off I think of how it's all a manufactured shit-storm for retailers to make money; then when I go into a nice shop I feel so uncomfortable with the people that are hovering, can I help yous and such, sometimes I feel like I have to buy something if I give them a lot of trouble, then I think fuck them I'll show by not buying anything; then I get into this thing where I've passed up a few decent gifts, so then I can't settle for anything worse than what I passed up, I have keep looking for something better as it becomes more and more obvious that's not going to happen; then I start doubting if the person would even want this crap I'm buying, what a waste of money they probably won't like it they'll just get it and think how thoughtless I am and throw it in the back of their closet, or maybe they'll keep it but every time they wear it / use it they'll think "what a piece of crap" and think of me; most things that people would actually want they've either gotten for themselves or know how to get better anyway since they actually like them and like to shop for them and know the good stores; you can never buy anyone clothing because clothing is all about the fit and 99% of christmas gift clothing goes straight to the thrift store.

12-21-06 - 1

WTF, why do reporters and newspapers even bother asking this administration questions anymore? You know they're just going to completely ignore what you're getting at and spout one of their stock phrases. In my newspaper the entire content of all the administration statements would be "The White House released more PR babble today." and then we'd move on to information about reality.

I hear a lot of women say that they admire or respect Condy Rice for her acheivements and for the fact that she's a woman with a top role in the government. That sentiment disgusts me. I find her of the same cloth as Colin Powell, who is also despicable. Someone who is clearly intelligent and rational is even more accountable for their actions, and by making themselves totally subservient to a madman they have given up all their integrity to further their careers. They know that they're lying, they know they're trying to cover up gross misdeeds and incompetence, they know their actions are leading to the death of innocents; sometimes you can even see it on their faces in an interview where they take a tough question and they almost wince before they shake it off and spout the awkward lies again. Someone like Rumsfeld I almost can excuse because he's clearly just loony tunes or drunk on his own kool-aid.


12-19-06 - 1

You have two terms A and B and wish to combine them. They might be errors, predictions, whatever. There are several interesting possibilities.

First of all, generally the output should be "linear" in A's and B's. That is, sqrt(A*B) is good but A*B is not, because it gives you 2 powers of A or B. We think like physics and assume these things have units, the output should have the same units.

Secondly, we must be aware of scales. If A & B are scaled in exactly the same way so their numeric values have the same significance, then (A+B) is a good combiner. If they are not scaled the same, then any forms which add A & B are not okay. In that case you only really have one option :

sqrt(A*B) * G(A/B)

Often A & B have the same significance which means the output should be symmetric in swap A <-> B. In that case the G function has limitted forms. I haven't thought about exactly what they can be, but it has to be things like (A/B) + (B/A). In fact if A & B aren't on a similar scale even that form is not okay.

If we assume A & B are on the same scale, then additive forms are okay and it opens up some other options. (A+B)/2 is obviously your first guess.

2AB / (A+B) is an interesting combiner. If an A&B are in [0,1] then this takes (0,x)->0 , (.5,.5) -> .5 and (1,1) -> 1, and sort of penalizes when they're not equal. It takes (x,x) -> x which is a nice property of any combiner when you're trying to make a combiner that can stand in for (A+B)/2.

sqrt(A^2 + B^2) is another, and then you can take simple multiples of these, which gives you forms like (A^2 + B^2)/(2AB).

Anyway the point is that there really aren't very many forms to choose from which satisfy the basic properties and you can easily try them all.


12-16-06 - 3

How to combine two (continuous) expert predictions :

Say you have two experts. They make predictions which have a Guassian error. The expert provides both the prediction (the center of the Gaussian) and his best estimate of the accuracy of the prediction (the sdev of the Gaussian, which is the sqrt of the variance), call this P & S, so you have {P1,S1} and {P2,S2}.

We're going to make a combine prediction by guessing a weight for each expert (here W1 and W2). The combined prediction is then :

Pred = (W1 * P1 + W2 * P2) / (W1 + W2)

(or) Pred = P1 + (W2 / (W1+W2) ) * (P2 - P1)
Pred = P1 + F * (P2 - P1)
F = 1 / (1 + W1/W2)

This latter form is more useful in practice because it allows you to apply arbitrary scaling to each term so you can run them through an lsqr or nnet or whatever. (assuming W1 and W2 never go to zero; if they do then just choose the other pred)

The ideal combination of experts in general is a context-dependent problem, but a general good way to weight them is via the entropy of the prediction. If you knew the instantaneous entropy of each prediction, H, the ideal weight would be :

W = e^( - Beta * H )

for some Beta which in general depends on the problem. In practice, Beta = 2 is usually very good.

The entropy of a Gaussian can be analytically integrated. The answer is :

H(Gauss) = (1/2) * ln(2pi) + ln(S)

Where S = the sdev (sigma), recall. Since we assumed our predictors were Gaussian we can use this. But we can simplify :

W = e^( - Beta * H ) = C * e^( - Beta * ln(S) ) = C * ( S^-Beta )

Where we put a bunch of constants in C, and they don't matter because it's an overall scaling of W which divides out.

F = 1 / (1 + W1/W2)
F = 1 / (1 + S1^-Beta/S2^-Beta
F = 1 / ( 1 + (S2/S1)^Beta )

If we use V = S^2 (V is the "variance" of the Gaussian, or the MSE), and choose the common case of Beta=2, then we have a very simple expression :

F = 1 / ( 1 + V2/V1 )

or F = V1 / (V1 + V2)

and this is the same as W1 = V2 and W2 = V1

Which is in fact a very good way to combine Gaussian predictions.

12-16-06 - 2

Volf's switching work is very hard to find on the web because he took his page down, and it hasn't been much publicized. The best links are at the bottom of Willems' CTW page , which in itself is a very nice page and gathers all the CTW papers which is great stuff (and also generally applicable to expert weighting).

12-16-06 - 1

Sean's released his game Beat It! from IGJ4; the concept is you play an arcade game and by doing so you make music, and hopefully as you play you just sort of get into a groove where you're making music and that's also beating the aliens. I don't think it completely works, but you definitely feel it once in a while and can tell the concept would work.


12-15-06 - 2

In the continuing needle in a haystack which is the mess of undocumented amazing things that Google offers, I just discovered code.google ; they have some super awesome things there, including SDK's to programatically interface with most of their stuff, as well as just useful code libraries such as perf tools which contains a very good malloc (supposedly). They also have their own "hash_map" which I imagine is pretty strong.

Related to my perpetual improved ranting method search, the new Google Blogger (now in beta) will have an SDK for code access, so I imagine it will be easy to write an app to throw raw text at it via Blogger GData

12-15-06 - 1

The next big step in the internet is the unification of the web, blogs, communities, forums, and recommender systems. Recommenders like Netflix do a good job of correlating me to people of similar taste; that should lead me to their blogs (on topics I want); similarly if I subscribe to someone's blog on a topic, that should add metadata which improves recommendations for me. All the communities and forums are cool but they're totally isolated in these little pockets; it should be one internet-wide community where you can see your "friends" and such across web sites, eg. if I browse CNet looking at product reviews I can see my friends' comments and add forum posts right there, etc.


12-13-06 - 1

Some random updates from the ML field :

1. The FANN Neural Net library is actually pretty decent. It's missing a lot of functions you would want, but it's reasonable easy to write them yourself.

2. SVDLIBC is a nice little implementation of the old "Lanczos" SVD algorithm which is fast and good. It comes from the old FORTRAN math goodness. One amazing little routine I found in there is "Machar" which will deduce the properties of your FPU by doing a few simple math operations!

3. If you search google for things like "SVD learning unknown missing" you will find about 100 papers; this is a major field of research, a lot of it driven by computer vision.

4. Simon's spilled the beans on his Netflix approach. His description of the "GHA" (Generalized Hebbian Algorithm) is way way clearer than the paper. The Hebbian is actually a type of Neural Network, and I find this approach much easier to understand if you just think of it in terms of doing a gradient descent to optimize the SVD vectors. I also found that Gorrell has released source code for her own GHA Lab


12-12-06 - 1

Tutorial on SVD and LSI for use in search; which is roughly the same way it's used for recommender systems.


12-10-06 - 1

Don Mitchell has some really nice papers on computer graphics, mainly on sampling and aliasing (or the lack thereof). It's a pleasure to read well written scientific papers, they're very clear and elegant. I highly recommend reading a few even if you aren't directly working on those subjects, they will make all your work better.


12-09-06 - 1

I've been playing with Google Adwords to try to improve the results for my sister's Pilates Studio . Wow, Adwords is a real piece of garbage and it's the thing that makes Google worth $100 gajillion dollars or whatever? It's like totally impossible to work with and get good results. I want to do completely obvious things, like for example I'd like the site to be prominent if someone searches "Pilates" and they live in NY, or if someone searches for "Pilates NY" and they live anywhere, but if they search "Pilates" and live anywhere, then I don't want to pay a lot for an add there. I have zero ability to do that in Adwords, you can either completely location-lock your listing or not. There's no way to tweak your settings and test results, they sort of have a search test thing but it doesn't seem at all useful; I want to try tweaking my cost per click and such to see what kind of different placing it gets me and the response is not real time so you can't see how things change. The whole keyword system seems rather messed up too. You'd think Google has a smart search engine, your add should just be placed on searches where your site would have ranked anyway, just maybe not on the first page, eg. you should get synonyms and context and all that. Instead you have to manually enter keywords and you have to manually specify synonyms etc. Lastly it's charging her a lot of money that's not coming from search clicks; the other charges are from adds in the "content network" and other places, and I can't find out anything about where those ads actually are to tell if they're useful or not. All in all, I'm extremely un-impressed.


12-06-06 - 1

The original developer has redone the classic game "Another World" for Windows XP in high res. You can get the demo here . It's one of my favorite games of all time; I have no idea if it still holds up at all. The gameplay even at the time was pretty annoying puzzle stuff, with that digital movement ala Prince of Persia or the old Oddworld stuff.


12-03-06 - 1

Evaluating the "Need A Win" hypothesis.

A lot of people in Sports Betting use the motivation of the two teams as a basis for choosing the side to bet. The theory is that the team that needs a win is more likely to play to full effort and beat the spread. I've always been skeptical of these theories and thought I'd try to statistically examine how important this theory actually is.

I examined all the NFL games from 1999 to 2006. For each game I made a rough evaluation of which of the teams "needed a win". Only games in weeks 8-14 were examined, which is the heart of the playoff run. Teams that "need a win" were any teams with a record >= 50% wins and with a record worse than 8-2. Generally these are the teams on the bubble for their division or a wild-card spot. Note that this isn't a totally precise measure of "need a win" but it should be good enough to see some statistical correlation if any exists.

Here are the results. There are four categories :

cat 0 = both teams don't need a win
cat 1 = road team needs a win
cat 2 = home team needs a win
cat 3 = both teams need a win

The results are (home wins)-(home losses)-(pushes)
ATS = Against The Spread

32 teams, 1992 games

cat 0 WLP : 827-578-0
cat 0 ATS : 678-683-44
cat 1 WLP : 84-125-0
cat 1 ATS : 103-100-6
cat 2 WLP : 138-65-0
cat 2 ATS : 94-102-7
cat 3 WLP : 105-69-1
cat 3 ATS : 91-76-8

In particular look at categories 1 and 2. The team that needs a win is 263-149 vs. teams that don't need a win. However, they are only 194-205 ATS !!

Now, is that W-L record even significant? The fact is most of the time you have a "need a win" vs. "dont need a win" matchup it's a pretty good team vs. a very bad team, so it's no surprise that they win a lot more.

For comparison I did the same study, but just grouped based on teams with a record >= 50% against teams with a record < 50% , eg. winning teams vs. losing teams.

32 teams, 1992 games
cat 0 WLP : 219-136-0
cat 0 ATS : 177-169-9
cat 1 WLP : 186-268-0
cat 1 ATS : 224-213-17
cat 2 WLP : 328-111-0
cat 2 ATS : 216-211-12
cat 3 WLP : 421-322-1
cat 3 ATS : 349-368-27

Winnings teams are 596-297 against losing teams, which is actually a better ratio than the "need a win" vs "dont need a win".

It's possible that a better measure of "need a win" would reveal a slight statistical significance, but the absense of any here indicates it is not strong at all.


12-02-06 - 1

So, this whole show is on Youtube if you click around the Related links; it's amazing, they did a ton of songs : RadioHead at Le Reservoir

Also Radiohead Giglink Dump is full of great live shows. Check out Thom Yorke's solo at the Bridge School.


12-01-06 - 2

I found these really nice J2K slides if you want to learn about JPEG2000.

I couldn't find this info easily anywhere on the web, so here you go. Lossless color transforms :

Color transform notes :

JPEG-LS (LOCO-I) uses :

{RGB} -> { (R-G), G, (B-G) }

	and the inverse is obvious
	BTW this is very nice in that you can go 8bit -> 8bit using modulo arithmetic,
	many of the others require 9 bit YUV coefficients.

J2K (aka RCT aka CREW) uses :

{RGB} ->
	Y = (R + 2G + B)/4
	U = R-G
	V = B-G
	G = Y - (U+V)/4;
	R = U + G
	B = V + G

	(the divides are floored)

YCoCg is :
(similar to "RCT" but decorrelates better)

lossy :

	Y = (R + 2G + B)/4
	Co= (R-B)/2
	Cg= (2G - R - B)/4

	G = Y + Cg
	R = Y - Cg + Co
	B = Y - Cg - Co

lossless :
	(Co and Cg are effectively scaled up by 2)

	Co = R-B
	t = B + (Co/2)
	Cg = G-t
	Y = t + (Cg/2)

	s = Y - (Cg/2)
	G = Cg + s
	B = s - (Co/2)
	R = B + Co

In general with the lossless YUV type transforms, the chromatic coefficients are roughly 2X bigger than they should be in a perceptual space in order to have the bits to get back to RGB exactly.

Note that YUV was created for perceptual uniformity, NOT for decorrelation, so it's no surprise that you can beat it for decorrelation. In fact, doing the 3x3 KLT and sending the coefficients is totally worth it if you're doing compression. (actually YUV was created for some weird use in televisions)

12-01-06 - 1

Something else I've never really seen written up clearly. We do ubyte to float color conversions all the time and don't really think much about it. There are two obvious ways to do it, and there are trade offs with each. You can either map the full range or preserve the end points, not both.


int colors are in [0,255]

Int pel "x" is considered to represent the span {x-0.5,x+0.5}/255

float colors are in [0,1]

int 0 -> float 0
int 255 -> float 1

Note that this way is nice in that it keeps the maximums the same,
 but it's bad in that it wastes color space.  That is, there's only 1/255
 precision instead of 1/256 which is totally possible.

Another way of seeing it is that the int range [0,255] is actually being
mapped to the float range { -0.5/255 , 255.5/255 } , that is, {0,1} is not
the full range.

One nice thing about this though is that it gives you some slop in your float->int.
As long as the float is in { -0.5/255 , 255.5/255 } you don't need to clamp.

// more pure quantizer way :
// int value [i] represents the bucket (i,i+1)/256

inline int		ColorFTOI(float f)
	return ftoi(f * 256.f);

inline float	ColorITOF(int i)
	return (i + 0.5f) * (1.f/256.f);

// way that maps 0<->0 and 255<->1.0

inline int		ColorFTOI(float f)
	return ftoi(f * 255.f + 0.5f);

inline float	ColorITOF(int i)
	return i * (1.f/255.f);

old rants