12/20/2011

12-20-11 - Grocery Store Lines

Grocery store lines are a microcosm for how fucking shitty almost every human is, in almost every way.

First of all, you have the fact that nobody shows any basic human courtesy to each other. I almost never see someone with a ton of items let the person with one item go ahead. But of course when I let the person with one item go ahead of me, they invariably do something fucked up like ask for a pack of cigarettes (which always takes forever in US grocery stores) or pay with food stamps or some shit. (aside : why is it always such a clusterfuck to pay with food stamps in some grocery stores? they must have done it a million times, but the checker always acts like someone handed them monopoly money, and the manager has to get called, and forms are filled out, wtf). Of course the people who are paying with coupons and civil war scrip never warn the person lining up behind them that maybe they should pick a different line.

But when the lines get long you really start to see people's souls.

There's the people who stand around and chat right in the middle of the lines. I watch people over and over asking "are you in line? oh, no? okay". Hmm, maybe you should get the fuck out of the line area to have your chit chat!

Then there's the people who can't seem to run a line in a reasonable direction and wind up blocking all the aisles or running it into another line. Invariably it takes a manager to come over and tell people to "please line up over here" since god knows they aren't going to sort themselves out.

Then you get the people who start stamping around and huffing and quickly looking from one side to another like this is the greatest injustice since slavery. You can just see wheels spinning in their heads about how "ridiculous this is" and so on.

There are the people who think that being really pushy in the back of the line is going to speed things up. We're eight people away from the register and they keep jamming their cart into my feet because the person three ahead of us moved. I get out of line to grab a magazine (leaving my cart) and they push into the gap where my body was. Whoah, slow down dick-face, we're still twenty feet from the register, you can chill a little now.

On the flip side are the people who are absurdly slow about getting their checkout done. (and of course to double-down on dickishness, it's often the same people who were impatient and pushy when they were way back in line (*)).

There are two general classes of people who fail to check out quickly :

1. The epically incompetent. These people pay with a check, either because they are ancient geezers (excusable) or because they think cards are somehow inferior or checks are more convenient (inexcusable). They might go digging around in their purse for half an hour trying to find exact change, or somehow still don't know they can scan their card before the checker is done.

2. The intentionally slow. These people think everyone needs to chill and slow down; what's the rush? They might chat with the checker a bit. They think everyone else is rotten for being in such a hurry. OMG you epic douchebag; it's fine if you want to live a slow, relaxed life, but it's not okay to impose that on everyone behind you in line. You probably drive slowly too and think that everyone behind you is in the wrong for wanting to go faster. You probably have a "keep your laws off my body" bumper sticker, and fail to see that your own behavior is the same kind of selfish forcing of your values on others.

(* = the double-dick seems to be the norm for airplane passengers; who are reliably annoying and pushy and do a lot of huffing when they are way back in line, but when they actually get up to the TSA guy they still have their shoes on, don't have their ID in hand, are drinking a gallon of liquid, and act like it's some big surprise. Same thing with the overhead bin stowage of course).

12/19/2011

12-19-11 - SRAM

SRAM is rolling out a big promotional campaign this year, trying to convince people that their components are actually superior.

They've signed up lots of the pro teams. Just as with car racing, do not be misled by what the pros use. I see lots of morons on forums saying "well the race team uses this, it must be great". No, the reason the race team uses it is because they are paid to use it.

SRAM double-tap shifters are fucking *awful*. Absolutely retarded. Imagine your right mouse button was taken away and instead you had to double-tap the left to accomplish that function. Yep, it's horrible.

Double-press GUI is always horrible and always should only be used as a last resort. We use it sometimes in games because there just aren't enough buttons on console controllers, but a smart game designer knows that only the secondary functions should go on double-tap buttons (the same goes for "hold" buttons) and the twitch functions should be on their own dedicated button.

Actually it's even worse than that. They don't do the right thing when you're at the edge of the gear shift limits. Say you're at the low end and can't go any lower (which you would accomplish by double-tapping); the lever will still let you single-tap (to up-shift). So you're riding up a steep hill and you want a lower gear, you go to double-tap, and oh fuck, halfway through it the lever won't let you complete the double-tap, but you've already single-tapped. There's no way to back out of it; when you let go it will up-shift you and you'll be fucked.

The STI system is just one million billion times better. But it's patented. That's why all these shift levers are so dang expensive, because they're patented.

It's also why there has to be a new lever system every year, a new size of bottom bracket, a new headset system - it's so that the manufacturer can patent it and/or make an exclusive line so that they can rip you off. The old system was perfectly fine functionally, the problem with it was that generic brands were starting to come out with cheap decent components for that system. We can't have that.

It's just like medicine of course, though with medicine it's much more diabolical.

Certainly with medicine it's obvious that there should be laws that prevent the pointless pushing of the new expensive product when it's not actually any better than cheap old solutions.

But I think it would actually be in the world's best interest to have a similar law for everything. It would be hard to phrase and hard to enforce, but the idea is something like - you must make components that are compatible with others on the market unless the incompatibility is for a necessary functional reason. It's actually much better for the free market and competition if products can plug and play and the consumer can choose based on price and functionality, not compatibility with some bullshit proprietary interface.

One that annoys me is car parts; most of the car parts for a Porsche or BMW or whatever are actually identical to the ones for a cheaper car (like a VW for example) - but they intentionally make the interface ever so slightly different so that you can't just go buy the cheaper part. The parts are all made by Bosch or whatever major parts supplier anyway, it's not like you get a better brand of part for the money. The interesting thing to me is that the car maker doesn't really benefit from this, it's the part maker who does, so there must be some kind of collusion where the car maker gets a kickback in exchange for using the proprietary part.

Maybe the most obvious example is car wheels. Wheels are wheels, there's no need for them to be car specific, but the auto manufacturers intentionally use different bolt spacings (5x130, 4x110, etc) so that you can't go buy cheap mass market wheels for your fancy car. You can cross-shop the exact same wheel with different bolt spacings, and the price difference can be 2X or more.

12/17/2011

12-17-11 - LZ Optimal Parse with A Star Part 4

Continuing ...
Part 1
Part 2
Part 3

So we have our A star parse from last time.

First of all, when we "early out" we still actually fill out that hash_node. That is, you pop a certain "arrival", then you evaluate the early out conditions and decide this arrival is not worth pursuing. You need to make a hash_node and mark it as a dead end, so that when you pop earlier arrivals that see this node, they won't try to visit it again.

One option would be to use a separate hash of just bools that mark dead ends. This could be a super-efficient smaller hash table of bit flags or bloom filters or something, which would save memory and perhaps speed.

I didn't do this because you can get some win from considering parses that have been "early outed". What you do is when you decide to "early out" an arrival, you will not walk to any future nodes that are not yet done, but you *will* consider paths that go to nodes that were already there. In pseudo-code :


pop an arrival

check arrival early outs and just set a flag

for all coding choices at current pos
{
  find next_node
  if next_node exists
    compute cost to end
  else
    if ! early out flag
       push next_node on arrivals stack
}

So the early out stops you from creating any new nodes in the graph walk that you wouldn't have visited anyway, but you can still find new connections through that graph. What this lets you do in practice is drive the early out thresholds tighter.
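In code, this "early out stops pushes but still reads finished nodes" rule might look like the following toy sketch. All the names, the flat cost model, and the callback shapes here are my own stand-ins, not the actual coder:

```python
# Toy sketch: an early-outed arrival stops growing the graph (no new pushes)
# but still combines with nodes that are already finished.
def process_arrival(arrival, choices, done, push, is_hopeless):
    """arrival = (node, cost_from_head); choices = [(code_cost, next_node)];
    done maps finished node -> cost_to_tail; push queues a new arrival."""
    node, cost_from_head = arrival
    early_out = is_hopeless(arrival)   # check early outs, just set a flag
    best_total = None
    for code_cost, next_node in choices:
        if next_node in done:
            # already-finished nodes can still improve our parse,
            # even when this arrival has been early-outed
            total = cost_from_head + code_cost + done[next_node]
            if best_total is None or total < best_total:
                best_total = total
        elif not early_out:
            push((next_node, cost_from_head + code_cost))
    return best_total
```

The hopeless arrival still returns a total cost through finished nodes; it just refuses to create new work.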

The other subtlety is that it helps a lot to actually have two (or more) stages of early out. Rather than just stopping consideration of all coding choices once you don't like your arrival, you have a couple of levels. If your arrival looks sort of bad but not terrible, then you still consider some of the coding choices. Instead of considering 8 or 16 coding choices, you reduce it to 2 or 4 which you believe are likely advantageous.

The exact details depend on the structure of your back end coder, but some examples of "likely advantageous" coding choices that you would consider in the intermediate early out case : if you have a "repeat recent offset" structure like LZX/LZMA, then those are obvious things to include in the "likely advantageous". Another one might be RLE or continue-previous type of match codes. Another would be if the literal codes below a certain number of bits with the current statistics. Also the longest match if it's longer than a certain amount.

Okay, so our A star is working now, but we have a problem. We're still just not getting enough early outs, and if you run this on a big file it will take forever (sometimes).

The solution is to use another aspect we expect from our LZ back end, which is "semi-locality". Locality means that a decision we make now will not have a huge effect way in the future. Yes, it has some effect, because it may change the state and that affects the future, but over time the state changes so many times and adapts to future coding that the decision 4000 bytes ago doesn't matter all that much.

Another key point is that the bad (slow) case occurs when there are lots of parses that cost approximately the same. Because of our early out structure, if there is a really good cheap parse we will generally converge towards it, and then the other choices will be more expensive and they will early out and we won't consider too many paths. We only get into bad degeneracy if there are lots of parses with similar cost. And the thing is, in that case we really don't care which one we pick. So when we find an area of the file that has a huge branching factor that's hard to make a decision about, we are imperfect but it doesn't cost us much overall.

The result is that we can cut up the file to make the parse space tractable. What I do is work in "quanta". You take the current chunk of the file as your quantum and parse it as if it was its own little file. The parse at the beginning of the quantum will be mostly unaffected by the quantum cut, but the parse at the end will be highly affected by the false EOF, so you just throw it out. That is, advance through the first 50% or 75% of the parse, and then start the next quantum there.

There is one special case for the quantum cutting which is long matches that extend past the end of the quantum. What you would see is when outputting the first 50% of the parse, the last code will be a match that goes to the end of the quantum. Instead I just output the full length of the match. This is not ideal but the loss is negligible.
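The quantum-cutting outer loop can be sketched like this. The fractions, names, and the `parse_quantum` callback (standing in for the A star parse of one chunk, returning `(pos, len)` codes) are all my assumptions:

```python
# Sketch of quantum cutting: parse each chunk as if it were its own file,
# keep only the front part of the parse (the tail is polluted by the false
# EOF), and start the next quantum where we stopped keeping codes.
def parse_by_quanta(file_len, parse_quantum, quantum_len=4096, keep_frac=0.5):
    codes = []
    pos = 0
    while pos < file_len:
        q_end = min(pos + quantum_len, file_len)
        q_codes = parse_quantum(pos, q_end)
        if q_end == file_len:
            codes.extend(q_codes)          # real EOF: keep the whole parse
            break
        keep_to = pos + int((q_end - pos) * keep_frac)
        for c_pos, c_len in q_codes:
            if c_pos >= keep_to:
                break                      # throw out the parse near the false EOF
            # a kept match may run past keep_to (or even past q_end);
            # we keep its full length, as described above
            codes.append((c_pos, c_len))
            pos = c_pos + c_len            # assumes codes tile forward from pos
    return codes
```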

For speed you can go even further and use adaptive quantum lengths. On highly degenerate parts of the file, there may be a huge node space to parse that doesn't get early-out'ed. When you detect one of these, you can just reduce the quantum length for that part of the file. eg. you start with a quantum length of 4096 ; if as you are parsing that quantum you find that the hash table occupancy is beyond some threshold (like 1 million nodes for example), you decide the branching factor is too great and reduce the quantum length to 2048 and resume parsing on just the beginning of that chunk. You might hit 1 million nodes again, then you reduce to 1024, etc.

That's it! Probably a followup post with some results numbers and maybe some more notes about subtle issues. I could also do several long posts about ideas I tried that didn't work which I think are sort of interesting.

12-17-11 - LZ Optimal Parse with A Star Part 3

Continuing ...
Part 1
Part 2

At the end of Part 2 we looked at how to do a forward LZSS optimal parse. Now we're going to add adaptive "state" to the mix.

Each node in the walk of parses represents a certain {Pos,State} pair. There are now too many possible nodes to store them all, so we can't just use an array indexed over all {Pos,State} pairs. Hopefully we will not visit them all, so we store only the visited nodes in a hash table.

We are parsing forward, so for any node we visit (a {Pos,State} pair will be called a "node") we know how we got there. There can be many ways of reaching the same node, but we only care about the cheapest one. So we only need to store one entering link into each node, and the total cost from the beginning of the path to get to that node.

If you think about the flow of how the forward LZSS parse completes, it's sort of like an ice tendril reaching out which then suddenly crystallizes. You start at the beginning and you are always pushing the longest length choice first - that is, you are taking big steps into the parse towards the end without filling in all the gaps. Once you get to the end with that first long path (which is actually the greedy parse - the parse made by taking the longest match available at each step), then it starts popping backwards and filling in all the gaps. It then does all the dense work, filling backwards towards the beginning.

So it's like the parse goes in two directions - reaching from the beginning to get to the end (with nodes that don't have enough information), and then densely bubbling back from the end (and making final decisions). (if I was less lazy I would make a video of this).

Anyhoo, we'll make that structure more explicit. The hash table, for each node, stores the cost to get to the end from that node, and the coding choice that gives that cost.

The forward parse uses entry links, which I will henceforth call "arrivals". This is a destination node (a {pos,state}), and the cost from the beginning. (you don't need to store how you got here from the beginning since that can be reproduced at the end by rewalking from the beginning).


Full cost of parse through this node =

arrival.cost_from_head + hash_node.cost_to_tail

Once a node has a cost in the hash table, it is done, because it had all the information it needed at that node. But more arrivals can come in later as we fill in the gaps, so the full cost from the beginning of the parse to the end of the parse is not known.

Okay, so let's start looking at the parse, based on our simple LZSS pseudo-code from last time :


hash table of node-to-end costs starts empty
stack of arrivals from head starts empty

Push {Pos 1,state initial} on stack of arrivals

While stack is not empty :

pop stack; gives you an arrival to node {P,state}

see if node {P,state} is already in the hash
if so
{
  total cost is arrival.cost_from_head + hash_node.cost_to_tail
  done with this arrival
  continue (back to stack popping);
}

For each coding choice {C} at the current pos
{
  find next_state = state transition from cur state after coding choice C
  next_pos = P + C.len
  next_node = {next_pos,next_state}

  if next_node is in the hash table :
  {
    compute cost to end from code cost of {C} plus next_node.cost_to_tail
  }
  else
  {
    push next_node to the arrivals stack (*1)
  }
}

if no pushes were done
{
  then processing of current node is done
  choose the best cost to end from the choices above
  create a node {P,state} in the hash with that cost
}

(*1 = if any pushes are done, then the current node is also repushed first (before other pushes). The pushes should be done in order from lowest pos to highest pos, just as with LZSS, so that the deep walk is done first).

So, we have a parse, but it's walking every node, which is way too many. Currently this is a full graph walk. What we need are some early outs to avoid walking the whole thing.

The key is to use our intuition about LZ parsing a bit. Because we step deep first, we quickly get one parse for the whole segment (the greedy parse). Then we start stepping back and considering variations on that parse.

The parse doesn't collapse the way it did with LZSS because of the presence of state. That is, say I parsed to the end and now I'm bubbling back and I get back to some pos P. I already walked the long length, so I'm going to consider a shorter one. With LZSS, when I walked to the shorter one, the nodes I needed would already be done. But now, the nodes aren't done - though importantly the positions have been visited. That is -


At pos P, state S
many future node positions are already done
 (I already walked the longest match length forward)

eg. maybe {P+3, S1} and {P+5, S2} and {P+7, S3} have been done

I take a shorter length now; eg. to {P+2,S4}

from there I consider {P+5, S5}

the node is not done, but a different state at P+5 was done.

If the state didn't matter, we would be able to reuse that node and collapse back to O(N) like LZSS.

Now of course state does matter, but crucially it doesn't matter *that much*. In particular, there is sort of a limit on how much it can help.

Consider for example if "state" is some semi-adaptive statistics. Those statistics are adaptive, so if you go far enough into the future, the state will adapt to the coding parse, and the initial state won't have helped that much. So maybe the initial state helps a lot for the next 8 coding steps. And maybe it helps at most 4 bits each time. Then having a better initial state can help at most 32 bits.

When you see that some other parse has been through this same position P, albeit with a different state at this position, if that parse has completed and has a total cost, then we know it is the optimal cost through that node, not just the greedy parse or whatever. That is, whenever a hash node has a cost_to_tail, it is the optimal parse cost to tail. If there is a good parse later on in the file, the optimal parse is going to find that parse, even if it starts from a non-ideal state.

This is the form of our early outs :


When you pop an arrival to node {P,S} , look at the best cost to arrive to pos P for any state, 

if arrival.cost_from_head - best_cost_from_head[P] > threshold
  -> early out

if arrival.cost_from_head + best_cost_to_tail[P] > best_cost_total + threshold
  -> early out

where we've introduced two arrays that track the best seen cost to head & tail at each pos, regardless of state. We also keep a best total cost, which is initially set to infinity until we get through a total parse, and then is updated any time we see a new whole-walk cost.

This is just A star. From each node we are trying to find a lower bound for the cost to get to the end. What we use is previous encodings from that position to the end, and we assume that starting from a different state can't help more than some amount.
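The two tests above are a few lines of code; here's a sketch (array names follow the text, and `threshold` tuning is whatever slack your coder can afford to lose):

```python
# A-star style early outs: prune an arrival if it reached this position
# much more expensively than anyone else, or if even an optimistic finish
# can't beat the best whole parse seen so far.
def should_early_out(pos, cost_from_head,
                     best_cost_from_head, best_cost_to_tail,
                     best_cost_total, threshold):
    if cost_from_head - best_cost_from_head[pos] > threshold:
        return True
    if cost_from_head + best_cost_to_tail[pos] > best_cost_total + threshold:
        return True
    return False
```

best_cost_to_tail[pos] minus the threshold is acting as the A star admissible lower bound: a different state at pos can't help by more than `threshold` bits.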

Next time, some subtleties.

12-17-11 - LZ Optimal Parse with A Star Part 2

Okay, optimal parsing with A star. (BTW "optimal" parsing here is really a misnomer that goes back to the LZSS backwards parse where it really was optimal; with a non-trivial coder you can't really do an optimal parse, we really mean "more optimal" (than greedy/lazy type parses)).

Part 1 was just a warmup, but may get you in the mood.

The reason for using A Star is to handle LZ parsing when you have adaptive state. The state changes as you step through the parse forward, so it's hard to deal with this in an LZSS style backwards parse. See some previous notes on backwards parsing and LZ here : 1 , 2 , 3

So, the "state" of the coder is something like maybe an adaptive statistical mode, maybe the LZMA "markov chain" state machine variable, maybe an LZX style recent offset cache (also used in LZMA). I will assume that the state can be packed into a not too huge size, maybe 32 bytes or so, but that the count of states is too large to just try them all (eg. more than 256 states). (*1)

(*1 - in the case that you can collapse the entire state of the coder into a reasonably small number of states (256 or so) then different approaches can be used; perhaps more on this some day; but basically any adaptive statistical state or recent offset makes the state space too large for this).

Trying all parses is impossible even for the tiniest of files. At each position you have something like 1-16 options. (actually sometimes more than 16, but you can limit the choices without much penalty (*2)). You always have the choice of a literal, when you have a match there are typically several offsets, and several lengths per offset to consider. If the state of the coder is changed by the parse choice, then you have to consider different offsets even if they code to the same number of bits in the current decision, because they affect the state in the future.

(*2 - the details of this depend on the back end of the coder; for example if your offset coder is very simple, something like just Golomb type (NOSB) coding, then you know that only the shortest offset for a given length needs to be considered; another simplification, used in LZMA, is that only the longest length for a given offset is considered; in some coders it helps to consider shorter length choices as well; in general for a match of Length L you need to consider all lengths in [2,L] but in practice you can reduce that large set by picking a few "inflection points" (perhaps more on this some day)).

Okay, a few more generalities. Let's revisit the LZSS backwards optimal parser. It came from a forward style parser, which we can implement with "dynamic programming" ; like this :


At pos P , consider the set of possible coding choices {C}

For each choice (ci), find the cost of the choice, plus the cost after that choice :
{

  Cost to end [ci] = Current cost of choice C [ci] + Best cost to end [ P + C[ci].len ]

}

choose ci as best Cost to end
Best code to end[ P ] = Cost to end [ best ci ]

You may note that if you do this walking forward, then the "Best cost to end" at the next position may not be computed yet. If so, then you suspend the current computation and step ahead to do that, then eventually come back and finish the current decision.

Of course with LZSS the simpler way to do it is just to parse backwards from the end, because that ensures the future costs are already done when you need them. But let's stick with the forward parse because we need to introduce adaptive state.
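For reference, the backward version is only a few lines. This is a minimal sketch with made-up flat costs (9 bits per literal, 17 per match) and a dict of match lengths per position standing in for a real match finder:

```python
# Classic backward LZSS optimal parse: walk from the end, so every
# "best cost to end" you need is already computed when you need it.
def lzss_optimal_backward(n, matches, literal_cost=9, match_cost=17):
    cost_to_end = [0] * (n + 1)        # cost_to_end[n] = 0 at EOF
    choice = [None] * n                # chosen code length (1 = literal)
    for pos in range(n - 1, -1, -1):
        best = literal_cost + cost_to_end[pos + 1]
        best_len = 1
        for ml in matches.get(pos, ()):
            if pos + ml <= n:
                c = match_cost + cost_to_end[pos + ml]
                if c < best:
                    best, best_len = c, ml
        cost_to_end[pos] = best
        choice[pos] = best_len
    return cost_to_end[0], choice
```

Walking `choice` forward from pos 0 then reproduces the optimal parse.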

The forward parse LZSS (with no state) is still O(N) just like the backward parse (the time cost assumes the string matching is free or previously done, and that you consider a fixed number of match choices, not proportional to the number of matches or length of matches, which would ruin the O(N) property) - it just requires more book keeping.

In full detail a forward LZSS looks like this :


Set "best cost to end" for all positions to "uncomputed"

Push Pos 1 on stack of needed positions.

While stack is not empty :

pop stack; gives you a pos P

If any of the positions that I need ( P + C.len ) are not done :
{
  push self (P) back on stack
  push all positions ( P + C.len ) on stack
    in order from lowest to highest pos
}
else
{
  make a choice as above and fill "best cost to end" at pos P
}

If you could not make a choice the first time you visit pos P, then because of the order that we push things on the stack, when you come back and pop P the second time it's guaranteed that everything needed is done. Therefore each position is visited at most twice. Therefore it's still O(N).

We push from lowest to highest len, so that the pops are highest pos first. This makes us do later positions first; that way earlier positions are more likely to have everything they need already done.
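The stack discipline above can be made concrete with a toy flat-cost coder (9 bits per literal, 17 per any match; the dict-of-match-lengths match finder is my stand-in, not real coder output):

```python
# Forward stack-driven LZSS: a position is finished only once all the
# positions it steps to are finished; otherwise it repushes itself and
# its dependencies (lowest pos first, so the highest pos pops first).
def lzss_optimal_forward(n, matches, literal_cost=9, match_cost=17):
    cost_to_end = [None] * (n + 1)     # None = "uncomputed"
    cost_to_end[n] = 0
    stack = [0]
    while stack:
        pos = stack.pop()
        if cost_to_end[pos] is not None:
            continue                   # already finished via another visit
        lens = [1] + [ml for ml in matches.get(pos, ()) if pos + ml <= n]
        needed = [pos + l for l in lens if cost_to_end[pos + l] is None]
        if needed:
            stack.append(pos)          # revisit once dependencies are done
            for nxt in sorted(needed): # lowest pos first -> popped last
                stack.append(nxt)
        else:
            cost_to_end[pos] = min(
                (literal_cost if l == 1 else match_cost) + cost_to_end[pos + l]
                for l in lens)
    return cost_to_end[0]
```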

Of course with LZSS this is silly, you should just go backwards, but we'll use it to inspire the next step.

To be continued...

12/12/2011

12-12-11 - Things I want and cannot find

1. A true "sauna" near Seattle. A hot room right next to a lake, so I can steam it up and then go jump in the cold lake. We're in the perfect place for it, we have lots of nice cold swimmable lakes, and there are tons of Swedes around here, and yet I can't find one.

There are plenty of "saunas" at spas and health clubs, but without the lake swim it's fucking bullshit. I imagine some rich guy has one at his lakefront home, but I can't get the hookup.

2. A secluded cabin rental. I like the idea of going out in the woods and writing code alone. I can wear some flannel and chop wood for the fire like a real Northwesterner. But all the cabin rentals I can find are in sort of "cabin communities" or near a highway, or some shit. I want a place where you can look out of the big picture window and just see scenery for miles.

3. A good river swim. I found a bunch in CA but I can't find any up here. An ideal river swim has a nice deep "hole" due to rocks or waterfall or something. It should be a 4-5 mile hike from the closest parking to cut down on traffic (or be up a rough road or something, not just right off a highway). Ideally it should not be straight out of snow melt so that it's not ball-breaking freezing cold even in the middle of summer.

4. A nice place to ride. A country lane, no cars, good pavement. God damn I miss this.

12-12-11 - Sense

One of the most important skills in an employee is the sense to know when to ask for help and when not to. To know when they should just make a decision on their own vs. ask to make sure their choice is okay. To know when they need to call a meeting about something vs. when not to disturb others about it.

It's incredibly rare actually to find someone who has the sense to get this just right. I think it's very undervalued.

When you're a manager, the most awesome thing you can have is an employee you can trust. That means no unpleasant surprises. If you give them a task, it will be done on time, or you will be notified with enough notice to take action. You won't find out that they're slipping when it's too late to do anything about it. You won't have them claim to be done and then upon inspection find out that they've done it all wrong. You can just assign the task off and then you don't have to worry about it any more. You don't have to follow up and keep pinging them for status updates.

Someone with a great deal of sense will just know to give you status updates at the appropriate intervals. Not too often that they waste your time, but not too infrequently - they should always come in just before you start wondering "WTF happened to this task?".

One of the most crucial things is knowing what decisions they need to get approval for. It sucks to have an employee who asks about every little thing. "should I put this button here or here? should I make another file for this code or put it in this file?" Just make a fucking decision yourself, I don't care! But it also sucks to have someone go off and do all kinds of crazy shit without asking, like "oh yeah I ripped out the old animation system and am doing a new one" ; uh, you did what? and you didn't ask me first? WTF. Both are very common.

Of course the definition of the "right amount of approval" depends on the manager, and a key part of having good "sense" is actually social adaptation - it's about adapting to your situation and learning what is wanted of you. Many of the type-A left-brain coders never get this; part of your job as an employee is always interacting with other human beings, even if it's only with your boss, and there is no rational absolute answer about the right way to communicate, you have to feel it out and adapt.

Of course part of the role of a good manager is to teach these things, and to help people who may have good skills but not much "sense".

It's actually more annoying in personal life than in business life. For example you're having a dinner party and somebody volunteers to bring the wine, and then they show up with none, or they show up with a box of ripple. WTF dude, I could have just gotten it myself, if you're going to drop the ball, you need to notify someone with sufficient warning.

The annoying thing about the non-business world is you can't check up on them; like "hey can you give me a status update on that wine purchasing?" because you would be considered a huge dick.


A lot of this goes along with what I call "basic professionalism". Like if I assign you a crucial task that I need done today, don't go home without checking in with me and telling me it's done or not. If you think I assigned you too much and you can't get it done in time, don't go pout, come and tell me about it.

Another aspect of "basic professionalism" is knowing when to shut up. Like if you think the company is going in the wrong direction - raise the issue to your managers, that's good, if you have a good boss they want that feedback. But after they call a meeting and everyone disagrees with you and the decision is made to go on the path you don't like - it's time to shut up about it. We don't want to hear complaints every day.

A related aspect is knowing who it's appropriate to say things to. When we have someone from the publisher touring the studio, that is not the time to point out that you don't like the design of the lead character.

"Basic professionalism" is sort of a level below having good "sense" but it's also actually surprisingly hard to find.


One of the worst situations is to have someone who is not great about "sense" or "basic professionalism" but is touchy about it. Most people are not perfect on these points, and that's okay, but if you're not then you need a certain amount of supervision. That's just the way work gets done, but some people act like it's a personal affront to be monitored.

Like they occasionally drop the ball on tasks, you decide, okay I just have to ask for daily status reports. Then they get all pissy about it, "don't you trust me" or it's "too much bureaucracy" blah blah.

Or if they don't come to you and ask questions at the appropriate time, then you have to pre-screen all their approaches. Like sometimes you assign them a task and they'll just go off and start doing it wrong without saying anything. Now what you have to do is when you assign a task you have to say "can you tell me how you're going to approach this?" to make sure they don't say something nutso.

12/09/2011

12-09-11 - Kittens

We want to get a kitten (since we have a stable house now), and I would like to just get a kitten from a home but WTF they don't exist any more.

When I was a kid, every couple of weeks some family in the neighborhood would have kittens and put out a sign. You could go to their house and see the kittens play and pick one. You could see if they were coming from a good home where they got socialized. You could see how old they were to know they weren't separated from their mom at too young an age.

There just isn't any of this anymore. It seems to have all been corporatized into kitten adoption centers.

Yeah yeah yeah you should adopt a deformed adult cat that drools and has mange. No thanks. Part of the reason why I can't just find a normal home to adopt from is all the pet-adoption-nazis make it so that you can't use craigslist (or whatever) to find pets.

Also all the adoption agencies have strict spay/neuter at time of adoption rules. When I was a kid when we would get a cat sometimes we would not spay right away so that we could get a batch of kittens and keep one and give the rest away. It was delightful to have a line of generations and a bunch of kittens to play with. That tradition seems to be all gone. The result is that the babies only come from strays or weirdos outside the system. It's sort of like if all law-abiding citizens were castrated, then the only children in the world would be from criminals.

Boo.


Also, in other cat news, it turns out our professional cat sitter grossly overfed our cat while we were away in Hawaii. This despite verbal and written instructions on the correct amount to feed her. So she's sickly obese on our return.

How fucking hard is it to follow basic instructions? Jesus christ, I'm trying not to be a rich old crank, but I can't help thinking things like "it's so hard to find good help" and that the poor are poor because they're fucking retarded. (*). Half a cup means fucking half a cup, not "oh, I'll feed her until she stops eating like she's starving". You are the fucking help, you don't get to make your own decisions when I gave you specific orders.

What makes it even worse is that she (the cat sitter) gave us the usual condescending "I know so much about cats" bullshit when we interviewed her. Hey lady, I've watched an episode of the Dog Whisperer too, I'm not impressed by your amateur pet psychiatry.

* = you would think that it should be easy to find someone who could like get your groceries for you, or build you a fence, or pay your bills, or whatever, but it's actually really hard. It's amazing how badly the average person will fuck up the most basic assignments. To get someone that is smart enough that you can trust them to do those things, you have to hire someone in the top 1% of intellects, someone who could make $100k a year. It's actually sort of easy to hire someone really smart who costs a lot of money, and it's easy to hire someone who is just manual labor that you have to constantly supervise, but to hire someone in between that you can trust enough not to supervise but doesn't cost a fortune is hard because people are epic fuck ups.

12/08/2011

12-08-11 - Some Semaphores

In case you don't agree with Boost that Semaphore is too "error prone", or if you don't agree with C++0x that semaphore is unnecessary because it can be implemented from condition_var (do I need to point out why that is ridiculous reasoning for a library writer?) - here are some semaphores for you.

I've posted a fastsemaphore before, but here's a more complete version that can wrap a base semaphore.


template< typename t_base_sem >
class fastsemaphore_t
{
private:
    t_base_sem m_base_sem;
    atomic<int> m_count;

public:
    fastsemaphore_t(int count = 0)
    :   m_count(count)
    {
        RL_ASSERT(count > -1);
    }

    ~fastsemaphore_t()
    {
    }

    void post()
    {
        if (m_count($).fetch_add(1,mo_acq_rel) < 0)
        {
            m_base_sem.post();
        }
    }

    void post(int count)
    {
        int prev = m_count($).fetch_add(count,mo_acq_rel);
        if ( prev < 0)
        {
            int num_waiters = -prev;
            int num_to_wake = MIN(num_waiters,count);
            // use N-wake if available in base sem :
            // m_base_sem.post(num_to_wake);
            for(int i=0;i<num_to_wake;i++)
            {
                m_base_sem.post();
            }
        }
    }
    
    bool try_wait()
    {
        // see if we can dec count before preparing the wait
        int c = m_count($).load(mo_acquire);
        while ( c > 0 )
        {
            if ( m_count($).compare_exchange_weak(c,c-1,mo_acq_rel) )
                return true;
            // c was reloaded
            // backoff here optional
        }
        return false;
    }
        
    void wait_no_spin()
    {
        if (m_count($).fetch_add(-1,mo_acq_rel) < 1)
        {
            m_base_sem.wait();
        }
    }
    
    void wait()
    {
        int spin_count = 1; // ! set this for your system
        while(spin_count--)
        {
            if ( try_wait() ) 
                return;
        }
        
        wait_no_spin();
    }
    
    
    int debug_get_count() { return m_count($).load(); }
};

When m_count is negative, its magnitude is the number of waiters (plus or minus threads that are about to wait, or about to be woken).

Personally I think the base semaphore that fastsem wraps should just be your OS semaphore and don't worry about it. It only gets invoked for thread wake/sleep so who cares.

But you can easily make Semaphore from CondVar and then put fastsemaphore on top of that. (note the semaphore from condvar wake N is not awesome because CV typically doesn't provide wake N, only wake 1 or wake all).
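For concreteness, such a base semaphore from mutex + condvar might look like this (a sketch using C++11 std::condition_variable; the class and member names are mine, not from any library):

```cpp
#include <mutex>
#include <condition_variable>

// Sketch of a base semaphore built on mutex + condvar (C++11).
// Class and member names are illustrative. This is the "base_sem"
// you would wrap with fastsemaphore_t, so it only runs on the
// contended (thread sleep/wake) path.
class semaphore_from_condvar
{
    std::mutex m_mutex;
    std::condition_variable m_cv;
    int m_count;

public:
    explicit semaphore_from_condvar(int count = 0) : m_count(count) {}

    void post()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            ++m_count;
        }
        m_cv.notify_one();
    }

    // wake N has to be emulated; condvar only gives wake-1 or wake-all
    void post(int count)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_count += count;
        }
        for (int i = 0; i < count; ++i)
            m_cv.notify_one();
    }

    void wait()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_count > 0; });
        --m_count;
    }

    int debug_get_count()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_count;
    }
};
```

You would then use this as the t_base_sem in fastsemaphore_t, so the mutex and condvar are only touched when a thread actually needs to sleep or be woken.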

Wrapping fastsem around NT's Keyed Events is particularly trivial because of the semantics of the Keyed Event Release. NtReleaseKeyedEvent waits for someone to wake if there is no one. I've noted in the past that a Win32 Event is a lot like a semaphore with a max count of 1; the problem with building a Semaphore from a normal Event is that if you Set it when it's already Set, you effectively run into the max count and lose your Set. That is impossible with KeyedEvent: you get exactly one wake from Wait for each Release.

So, if we wrap up keyed_event for convenience :


struct keyed_event
{
    HANDLE  m_keyedEvent;

    enum { WAITKEY_SHIFT = 1 };

    keyed_event()
    {
        NtCreateKeyedEvent(&m_keyedEvent,EVENT_ALL_ACCESS,NULL,0);
    }
    ~keyed_event()
    {
        CloseHandle(m_keyedEvent);
    }

    void wait(intptr_t key)
    {
        RL_ASSERT( (key&1) == 0 );
        NtWaitForKeyedEvent(m_keyedEvent,(PVOID)(key),FALSE,NULL);
    }

    void post(intptr_t key)
    {
        RL_ASSERT( (key&1) == 0 );
        NtReleaseKeyedEvent(m_keyedEvent,(PVOID)(key),FALSE,NULL);
    }
};

Then the base sem from KE is trivial :


struct base_semaphore_from_keyed_event
{
    keyed_event ke;

    base_semaphore_from_keyed_event() { }
    ~base_semaphore_from_keyed_event() { }
    
    void post() { ke.post(this); }
    void wait() { ke.wait(this); }
};

(note this is a silly way to use KE just for testing purposes; in practice it would be shared, not one per sem - that's sort of the whole point of KE).

(note that you don't ever use this base_sem directly, you use it with a fastsemaphore wrapper).

I also revisited the semaphore_from_waitset that I talked about a few posts ago. The best I can come up with is something like this :


class semaphore_from_waitset
{
    waitset_simple m_waitset;
    std::atomic<int> m_count;

public:
    semaphore_from_waitset(int count = 0)
    :   m_waitset(), m_count(count)
    {
        RL_ASSERT(count >= 0);
    }

    ~semaphore_from_waitset()
    {
    }

public:
    void post()
    {
        m_count($).fetch_add(1,mo_acq_rel);
        m_waitset.notify_one();
    }

    bool try_wait()
    {
        // see if we can dec count before preparing the wait
        int c = m_count($).load(mo_acquire);
        while ( c > 0 )
        {
            if ( m_count($).compare_exchange_weak(c,c-1,mo_acq_rel) )
                return true;
            // c was reloaded
        }
        return false;
    }

    void wait(wait_thread_context * cntx)
    {
        for(;;)
        {
            // could spin a few times on this :
            if ( try_wait() )
                return;
    
            // no count available, get ready to wait
            waiter w(cntx);
            m_waitset.prepare_wait(&w);
            
            // double check :
            if ( try_wait() )
            {
                // (*1)
                m_waitset.retire_wait(&w);
                // pass on the notify :
                int signalled = w.flag($).load(mo_acquire);
                if ( signalled )
                    m_waitset.notify_one();
                return;
            }
            
            w.wait();
            m_waitset.retire_wait(&w);
            // loop and try again
        }
    }
    
    void wait()
    {
        wait_thread_context cntx;
        wait(&cntx);
    }
};

The funny bit is at (*1). Recall that earlier we talked about a race that can happen if two threads post and two other threads pop. If one of the poppers gets through to *1, it dec'ed the sem but is still in the waitset; one pusher might then signal this thread, which is a wasted signal, and the other waiter will not get a signal, and you have a "deadlock" (not a true deadlock, but an unexpected permanent sleep, which I will henceforth call a deadlock).

You can fix that by detecting whether you received a signal while you were in the waitset. That's what's done here now. While it is not completely ideal from a performance perspective, it's a rare race case, and even when it happens the penalty is small. I still don't recommend using semaphore_from_waitset unless you have a comprehensive waitset-based system.

(note that in practice you would never make a wait_thread_context on the stack as in the example code ; if you have a waitset-based system it would be in the TLS)

Another note :

I have mentioned before the idea of "direct handoff" semaphores. That is, making it such that thread wakeup implies you get to dec count. For example "base_semaphore_from_keyed_event" above is a direct-handoff semaphore. This is as opposed to "optimistic" semaphores, in which the wakeup just means "you *might* get to dec count" and then you have to try_wait again when you wake up.

Direct handoff is neat because it guarantees a minimum number of thread wakeups - you never wake up a thread which then fails to dec count. But they are in fact not awesome. The problem is that you essentially have some of your semaphore count tied up in limbo while the thread wakeup is happening (which is not a trivial amount of time).

The scenario is like this :


1. thread 1 does a sem.wait

2. thread 2 does a sem.post 
  the sem is "direct handoff" : the count is given to thread 1
  thread 1 starts to wake up

3. thread 3 (or thread 2) now decides it can do some consuming
  and tries a sem.wait
  there is no sem count so it goes to sleep

4. thread 1 wakes up and processes its received count

You have actually increased latency to process the message posted by the sem, by the amount of time between steps 3 and 4.

Basically by not pre-deciding who will get the sem count, you leave the opportunity for someone else to get it sooner, and sooner is better.

Finally let's have a gander at the Linux sem : sem_post and sem_wait

If we strip away some of the gunk, it's just :


sem_post()
{

    atomic_add( & sem->value , 1);

    atomic_full_barrier (); // (*1)

    int w = sem->nwaiters; // (*2)

    if ( w > 0 )
    {
        futex_wake( & sem->value, 1 );  // wake 1
    }

}

sem_wait()
{
    if ( try_wait() ) return;

    atomic_add( & sem->nwaiters , 1);

    for(;;)
    {
        if ( try_wait() ) break;

        futex_wait( & sem->value, 0 ); // wait if sem value == 0
    }

    atomic_add( & sem->nwaiters , -1);
}

Some quick notes: I believe the barrier at (*1) is unnecessary; they should be doing an acq_rel inc on sem->value instead. However, as noted in the previous post about "producer-consumer" failures, if your producer is not strongly synchronized it's possible that this barrier helps hide/prevent bugs. Also, at (*2) the code loads nwaiters with a plain C load, which is very sloppy; you should always load lock-free shared variables with an explicit load() call that specifies memory ordering. I believe the ordering constraint there is that the load of nwaiters needs to stay after the store to value; the easiest way is to make the inc on value an RMW acq_rel.

The similarity with waitset should be obvious, but I'll make it super-clear :


sem_post()
{

    atomic_add( & sem->value , 1);
    atomic_full_barrier ();

    // waitset.notify_one :
    {
        int w = sem->nwaiters;
        if ( w > 0 )
        {
            futex_wake( & sem->value, 1 );  // wake 1
        }
    }
}

sem_wait()
{
    if ( try_wait() ) return;

    // waitset.prepare_wait :
    atomic_add( & sem->nwaiters , 1);

    for(;;)
    {
        // standard double-check :
        if ( try_wait() ) break;

        // waitset.wait()
        // (*3)
        futex_wait( & sem->value, 0 ); // wait if sem value == 0
    }

    // waitset.retire_wait :
    atomic_add( & sem->nwaiters , -1);
}

It's exactly the same, but with one key difference at *3 - the wait does not happen if count is nonzero, which means we cannot consume a wakeup from futex_wake that we don't need. This removes the need for the re-pass that we had in the waitset semaphore.

This futex semaphore is fine, but you could reduce the number of atomic ops by storing count & waiters in one word.
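A sketch of that packing (names are mine; the futex_wait/futex_wake calls are left as comments since the point here is just the single-word state maintenance - post can bump the count and check for waiters in one atomic op):

```cpp
#include <atomic>
#include <cstdint>

// Sketch of count & waiters packed into one 32-bit word :
// count in the low 16 bits, waiter count in the high 16 bits.
// Names are illustrative, and the futex_wait/futex_wake calls
// are elided as comments - the point is that one atomic RMW
// updates the count and reports waiters at the same time.
class packed_semaphore
{
    std::atomic<std::uint32_t> m_state; // (waiters << 16) | count

    static const std::uint32_t COUNT_MASK = 0xFFFF;
    static const std::uint32_t WAITER_ONE = 1u << 16;

public:
    explicit packed_semaphore(std::uint32_t count = 0) : m_state(count) {}

    void post()
    {
        // one RMW : inc count, and learn if any waiters exist
        std::uint32_t prev = m_state.fetch_add(1, std::memory_order_acq_rel);
        if ( prev >> 16 )
        {
            // futex_wake( &m_state, 1 );
        }
    }

    bool try_wait()
    {
        std::uint32_t s = m_state.load(std::memory_order_acquire);
        while ( s & COUNT_MASK )
        {
            if ( m_state.compare_exchange_weak(s, s - 1,
                    std::memory_order_acq_rel) )
                return true;
            // s was reloaded by the failed CAS
        }
        return false;
    }

    void wait()
    {
        if ( try_wait() ) return;
        // register as a waiter in the same word :
        m_state.fetch_add(WAITER_ONE, std::memory_order_acq_rel);
        for (;;)
        {
            std::uint32_t s = m_state.load(std::memory_order_acquire);
            if ( s & COUNT_MASK )
            {
                // take a count and retire as waiter in one CAS :
                if ( m_state.compare_exchange_weak(s, s - 1 - WAITER_ONE,
                        std::memory_order_acq_rel) )
                    return;
            }
            else
            {
                // futex_wait( &m_state, s ); // sleep while word == s
            }
        }
    }
};
```

Compared to the glibc version above, try_wait and prepare-to-wait no longer touch two separate atomics, at the cost of capping count and waiters at 64k each.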

12/05/2011

12-05-11 - Surprising Producer-Consumer Failures

I run into these a lot, so let's have a quick glance at why they happen.

You're trying to do something like :


Thread1 :

Produce 1
sem.post

Thread2 :

Produce 2
sem.post

Thread 3 :

sem.wait
Consume 1

Thread 4 :

sem.wait
Consume 2

and we assert that the Consume succeeds in both cases. Produce/Consume use a queue or some other kind of lock-free communication structure.

Why can this fail ?

1. A too-weak semaphore . Assuming our Produce and Consume are lock-free and not necessarily synchronized on a single variable with something strong like an acq_rel RMW op, we are relying on the semaphore to synchronize publication.

That is, in this model we assume that the semaphore has something like an "m_count" internal variable, and that both post and wait do an acq_rel RMW on that single variable. You could certainly make a correct counting semaphore which does not behave this way - it would be correct in the sense of controlling thread flow, but it would not also act as a memory ordering sync point.

You usually have something like :


Produce :
store X = A
sem.post // sync point B

Consume:
sem.wait // sync point B
load X  // <- expect to see A

you expect the consume to get what was made in the produce, but that is only guaranteed if the sem post/wait acts as a memory sync point.

There are two reasons I say sem should act like it has an internal "m_count" which is acq_rel, not just release at post and acquire at wait as you might think. One is that you want sem.wait to act like a #StoreLoad, so that the loads which occur after it in the Consume will see preceding stores in the Produce. An RMW acq_rel is one way to get a #StoreLoad. The other is that by using an RMW acq_rel on a single variable (or behaving as if you do), you create a total order on modifications to that variable. For example, if T3 sees T1.post and T2.post and then does its T3.wait, T4 cannot see T1.post, T3.wait, T4.wait or any other funny order.

Obviously if you're using an OS semaphore you aren't worrying about this, but there are lots of cases where you use this pattern with something "semaphore-like" , such as maybe "eventcount".
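As a toy sketch of the pattern (names are mine, and the two sides are exercised non-blockingly): the key point is that post and wait both do an acq_rel RMW on the same m_count:

```cpp
#include <atomic>

// Toy sketch of the pattern (names are mine) : post and wait both
// do an acq_rel RMW on a single m_count, so the post/wait pair acts
// as a sync point and the consumer's load of the payload is
// guaranteed to see the producer's store.
static std::atomic<int> s_count(0);
static int s_X = 0; // plain (non-atomic) payload, published via the sem

void produce()
{
    s_X = 42;                                        // store X = A
    s_count.fetch_add(1, std::memory_order_acq_rel); // sem.post, sync point B
}

// non-blocking stand-in for sem.wait followed by load X
bool try_consume(int * out)
{
    int c = s_count.load(std::memory_order_acquire);
    while ( c > 0 )
    {
        if ( s_count.compare_exchange_weak(c, c - 1,
                std::memory_order_acq_rel) )         // sem.wait, sync point B
        {
            *out = s_X; // expect to see A
            return true;
        }
    }
    return false;
}
```

If the fetch_add in produce were a plain release store and the consumer side only an acquire load, the count flow would still be correct, but the #StoreLoad guarantee would not hold.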

2. You're on POSIX and forget that sem.wait has spurious wakeups on POSIX. Oops.

3. Your queue can temporarily appear smaller than it really is.

Say, as a toy example, adding a node is done something like this :


new_node->next = NULL;

old_head = queue->head($).exchange( new_node );
// (*)
new_node->next = old_head;

There is a moment at (*) where you have truncated the queue down to 1 element. Until you fix the next pointer, the queue has been made to appear smaller than it should be. So pop might not get the items it expects to get.

This looks like a bad way to do a queue, but actually lots of lock free queues have this property in more or less obvious ways. Either the Push or the Pop can temporarily make the queue appear to be smaller than it really is. (for example a common pattern is to have a dummy node, and if Pop takes off the dummy node, it pushes it back on and tries again, but this causes the queue to appear one item smaller than it really is for a while).

If you loop, you should find the item that you expected in the queue. However, this is a nasty form of looping because it's not just due to contention on a variable; if in the example above the thread is swapped out while it sits at point (*), then nobody can make progress on this queue until that thread gets time.

The result I find is that ensuring that waking from sem.wait always implies there is an item ready to pop is not worth the trouble. You can do it in isolated cases but you have to be very careful. A much easier solution is to loop on the pop.
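A sketch of that "loop on the pop" shape (hedged: a trivial mutex-based queue and made-up names, just to show the consumer structure; in the lock-free case the retry loop is what absorbs the queue transiently looking smaller than it is):

```cpp
#include <deque>
#include <mutex>

// Sketch of "loop on the pop" (names are mine; the queue here is a
// trivial mutex-based one just to show the consumer shape). After
// sem.wait() returns you know an item was pushed, but the queue may
// transiently look smaller than it is, so the pop retries rather
// than asserting it must succeed.
template <typename T>
struct locked_queue
{
    std::mutex m;
    std::deque<T> q;

    void push(T v)
    {
        std::lock_guard<std::mutex> l(m);
        q.push_back(v);
    }

    bool try_pop(T * out)
    {
        std::lock_guard<std::mutex> l(m);
        if ( q.empty() ) return false;
        *out = q.front();
        q.pop_front();
        return true;
    }
};

// consumer side, called after the semaphore wait has returned :
template <typename T>
T pop_after_wakeup(locked_queue<T> & q)
{
    T item;
    while ( ! q.try_pop(&item) )
    {
        // in the lock-free case : spin or yield here until the
        // in-flight push finishes fixing up the queue
    }
    return item;
}
```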

12/03/2011

12-03-11 - Worker Thread system with reverse dependencies

In the previous episode we looked at a system for doing work with dependencies.

That system is okay; I believe it works, but it has two disadvantages : 1. It requires some non-standard synchronization primitives such as OR waits, and 2. There is a way that it can fail to do work as soon as possible; that is, there is the possibility for moments when work could be done but the worker that could do it is asleep. It's one of our design goals to not let that happen so let's see why it happens :

The problem basically is the NR (not ready) queue. When we have no RTR (ready to run) work, we popped one item from the NR queue and waited on its dependencies. But there could be other items later in the NR queue which become ready sooner. If the items in the NR queue become ready to run in order, this doesn't occur, but if they can become ready in different orders, we could miss out on chances to do work.

Anyhoo, both of these problems go away and everything becomes much simpler if we reformulate our system in terms of "forward dependencies" instead of "backward dependencies".

Normal "dependencies" are backwards; that is, A depends on B and C, which were created earlier in time. The opposite direction link I will call "permits" (is there a standard term for this?). That is, B and C permit A. A needs 2 permissions before it can run.

I propose that it is conceptually easier to set up work in terms of "dependencies", so the client still formulates work items with dependencies, but when they are submitted to be run, they are converted into "permissions". That is, A --> {B,C} is changed into B --> {A} and C --> {A}.

The main difference is that there is no longer any "not ready" queue at all. NR items are not held in any global list, they are only pointed to by their dependencies. Some dependency back in the tree should be ready to run, and it will then be the root that points through various NR items via permission links.

With no further ado, let's look at the implementation.

The worker thread becomes much simpler :


worker :

wait( RTR_sem );

pop RTR_queue and do work

that's it! Massively simpler. All the work is now in the permissions maintenance, so let's look at that :

How do we maintain permissions? Each item which is NR (not ready) has a (negative) count of the # of permissions needed before it can run. Whenever an item finishes, it walks its permission list and incs the permit count on the target item. When the count reaches zero, all permissions are done and the item can now run.

A work item now has to have a list of permissions. In my old system I had just a fixed size array for dependencies; I found that [3] was always enough; it's simply the nature of work that you rarely need lots of dependencies (and in the very rare cases that you do need more than 3, you can create a dummy item which only marks itself complete when many others are done). But this is not true for permissions, there can be many on one item.

For example, a common case is you do a big IO, and then spawn lots of work on that buffer. You might have 32 work items which depend on the IO. This only needs [1] when expressed as dependencies, but [32] when expressed as permissions. So a fixed size array is out and we will use a linked list.
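The implied work item layout might look something like this (a sketch; the struct and field names are my invention, not from the original code):

```cpp
#include <atomic>

// Sketch of the work item layout implied by the text; struct and
// field names are my invention. Each item holds its (negative)
// permits count and an intrusive singly linked list of the items
// it permits, so the list costs one pointer per permitted item.
struct work_item
{
    std::atomic<int> permits;   // starts at -num_deps; ready to run at 0
    bool is_done;
    work_item * next_permit;    // intrusive link inside someone's list
    work_item * permits_head;   // head of the list of items this permits
    // ... plus the actual work function / data

    work_item()
        : permits(0), is_done(false),
          next_permit(nullptr), permits_head(nullptr) {}

    // push an item onto this item's permits list
    // (caller holds this item's lock, per the text)
    void push_permit(work_item * wi)
    {
        wi->next_permit = permits_head;
        permits_head = wi;
    }
};
```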

The maintenance looks like this :


submit item for work :

void submit( work_item * wi , work_item * deps[] , int num_deps )
{

    wi->permits = - num_deps;

    if ( num_deps == 0 )
    {
        RTR_queue.push( wi );
        RTR_sem.post();
        return;
    }

    for(int i=0;i<num_deps;i++)
    {
        deps[i]->lock();

        if ( ! deps[i]->is_done )
        {
            deps[i]->permits_list.push( wi );
        }
        else
        {
            int prev = wi->permits.fetch_add(1); // needs to be atomic
            if ( prev == -1 ) // permitted (do this also if num_deps == 0)
            {
                RTR_queue.push( wi );
                RTR_sem.post();
            }
        }

        deps[i]->unlock();
    }

}


when an item is completed :

void complete( work_item * wi )
{
    wi->lock();

    set wi->is_done

    swap wi->permits_list to local permits_list

    wi->unlock();

    for each p in permits_list
    {
        int prev = p->permits.fetch_add(1);

        if ( prev == -1 )
        {
            // p is now permitted

            RTR_queue.push( p );
            RTR_sem.post();
        }
    }
}

the result is that when you submit not-ready items, they go into the permits list somewhere, then as their dependencies get done their permits count inc up towards zero, when it hits zero they go into the RTR queue and get picked up by a worker.

The behavior is entirely the same as the previous system except that workers who are asleep because they have no RTR work can wake up when any NR item becomes RTR, not just when the single one they popped becomes RTR.

One annoyance with this scheme is you need to lock the item to maintain the permits_list ; that's not really a big deal (I use an indexed lock system similar to Nt Keyed Events, I don't actually put a lock object on each item), but I think it's possible to maintain that list correctly and simply lock free, so maybe we'll revisit that.

ADDENDUM : hmm , not easy to do lock free. Actually maintaining the list is not hard, and even doing it and avoiding races against the permitted count is not hard, the problem is that the list is in the work item and items can be deleted at any time, so you either need to hold a lock on the item to prevent deletion, or you need something like RCU or SMR.

12-03-11 - RAD - Hawaii Branch

It's a pretty nice place to work. The ergonomics of the picnic table are not half bad actually. Very glad I brought my keyboard; wish the laptop screen was bigger.

12/02/2011

12-02-11 - Natural Expression

It's so nice when you find the "natural" way to express a coding problem. All of a sudden everything becomes so much simpler and the answers just start popping out at you. Like oh, and I can do this here, and this automatically happens just the way I wanted. Tons of old code just disappears that was trying to solve the problem in the "un-natural" way.

It doesn't change the code; in the end it all becomes assembly language and it can do the same thing, but changing the way you write it can change the way you think about it. Also, when you find a simple, elegant way to express things, it sort of makes it feel "right", whereas if you are getting the same thing done through a series of kludges and mess, it feels horrible, even though they accomplish the same thing.

It reminds me of physics. I think some of the greatest discoveries of the past century in physics were not actually discoveries of any phenomenon, but just ways to write the physics down. In particular I cite Dirac's Bra-Ket notation and Feynman's path integrals.

Neither one added any new physics. If you look at it in a "positivist" view point, they did nothing - the actual observable predictions were the same. The physics all existed in the equations which were already known. But they opened up a new understanding, and just made it so much more natural and easier to work with the equations, and that can actually have huge consequences.

Dirac's bra ket for example made it clear that quantum mechanics was about Hilbert spaces and Operators. Transformation between different basis spaces became a powerful tool, and very useful and elegant things like raising and lowering operators popped out. Quantum mechanics at the time was sort of controversial (morons like Einstein were still questioning it), and finding a clear elegant solid way to write it down made it seem more reasonable. (physicists have a semi-irrational distrust of any physical laws that are very complicated or vague or difficult to compute with; they also have a superstition that if a physical law can be written in a very concise way, it must be true; eg. when you write Maxwell's equations as d*F = J).

Feynman's path integrals came along just at a time when Quantum Field Theory was in crisis; there were all these infinities which made the theory impossible to calculate with. There were some successful computations, and it just seemed like the right way to extend QM to fields, so people were forging ahead, but these infinities made it an incomplete (and possibly wrong) theory. The path integral didn't solve this, but it made it much easier to see what was actually being computed in the QFT equations - rather than just a big opaque integral that becomes infinity and you don't know why, the path integral lets you separate out the terms and pretend that they correspond to physical particles flying around in many different ways. It made it more obvious that QFT was correct, what renormalization was doing, and that renormalization was a physically okay way to fix the infinities.

(while I say this is an irrational superstition, it has been the fact that the laws of physics which are true wind up being expressible in a concise, elegant way (though that way is sometimes not found until long after the law's discovery); most programmers have the same superstition: when we see very complex solutions to problems we tend to turn up our noses with distaste; we imagine that if we just found the right way to think about the problem, a simple solution would be clear)

(I know this history is somewhat revisionist, but a good story is more important than accuracy, in all things but science)

Anyhoo, it's nice when you get it.

11/30/2011

11-30-11 - Basic sketch of Worker Thread system with dependencies

You have a bunch of worker threads and work items. Work items can be dependent on other work items, or on external timed events (such as IO).

I've had some trouble with this for a while; I think I finally have a scheme that really works.

There are two queues :


RTR = ready to run : no dependencies, or dependencies are done

NR = not ready ; dependencies still pending

Each queue has an associated semaphore to count the number of items in it.

The basic work popping that each worker does is something like :


// get all available work without considering sleeping -
while( try_wait( RTR_sem ) )
{
    pop RTR_queue and do work
}

// (optionally spin a few times here and check RTR_sem)

// I may have to sleep -

wait( RTR_sem OR NR_sem ); // (*1)

if ( wakeup was from RTR_sem )
{
    pop RTR_queue and do work
}
else
{
    NRI (not ready item) = pop NR_queue
    deps = get dependencies that NRI needs to wait on

    wait( deps OR RTR_sem ); // (*3)

    if ( wakeup was from RTR_sem )
    {
        push NRI back on NR_queue and post NR_sem  // (*4)
        pop RTR_queue and do work
    }
    else
    {
        wakeup was because deps are now done
        NRI should be able to run now, so do it
        (*2)
    }  
}

*1 : the key primitive here is the ability to do a WFMO OR wait, and to know which one of the items signalled you. On Windows this is very easy, it's just WaitForMultipleObjects, which returns the guy who woke you. On other platforms it's trickier and probably involves rolling some of your own mechanisms.

Note that I'm assuming the semaphore Wait() will dec the semaphore at the time you get to run, and the OR wait on multiple semaphores will only dec one of them.

*2 : in practice you may get spurious wakeups or it may be hard to wait on all the dependencies, so you would loop and recheck the deps and possibly wait on them again.

How this differs from my previous system :

My previous system was more of a traditional "work stealing" scheme where each worker had its own queue and would try to just push & pop works from its own queue. This was lower overhead in the fast path (it avoids having a single shared semaphore that they have to contend on, for example), but it had a lot of problems.

Getting workers to go to sleep & wake up correctly in a work stealing scheme is a real mess. It's very hard to tell when you have no work to do, or when you have enough work that you need to wake a new worker, because you don't atomically maintain a work count (eg. a semaphore). You could fix this by making an atomic pair { work items, workers awake } and CAS that pair to maintain it, but that's just a messy way of being a semaphore.

The other problem was what happens when you have dependent work. You want a worker to go to sleep on the dependency, so that it yields CPU time, but wakes up when it can run. I had that, but then you have the problem that if somebody else pushes work that can immediately run, you want to interrupt the wait on the dependency and let the worker do the ready work. The semaphore OR wait fixes this nicely.

If you're writing a fractal renderer or some such nonsense then maybe you want to make lots of little work items and have minimal overhead. But that's a very special purpose rare case. Most of the time it's much more important that you do the right work when possible. My guiding principles are :

If there is no work that can be done now, workers should go to sleep (yield CPU)
If there is work that can be done now, workers should wake up
You should not wake up a worker and have it go back to sleep immediately
You should not have work available to do but the workers sleeping

Even in the "fractal renderer" sort of case, where you have tons of non-dependent work items, the only penalty here is one extra semaphore dec per item, and that's just a CAS (or a fetch_add) assuming you use something like "fastsemaphore" to fast-path the case of being not near zero count.

There is one remaining issue, which is when there is no ready-to-run work, and the workers are asleep on the first semaphore (they have no work items). Then you push a work item with dependencies. What will happen in the code sketch above is that the worker will wake up, pop the not ready item, then go back to sleep on the dependency. This violates article 3 of the resolution ("You should not wake up a worker and have it go back to sleep immediately").

Basically from *1 to *3 in the code is a very short path that wakes from one wait and goes into another wait; that's always bad.

But this can be fixed. What you need is "wait morphing". When you push a not-ready work item and you go into the semaphore code that is incrementing the NR_sem , and you see that you will be waking a thread - before you wake it up, you take it out of the NR_sem wait list, and put it into the NRI's dependency wait list. (you leave it waiting on RTR_sem).

That is, you just leave the thread asleep; you don't signal it to wake it up, it stays waiting on the same handle, but you move the handle from NR_sem to the dependency. You can implement this a few ways. I believe it could be done with Linux's newer versions of futex, which provide wait morphing. You would have to build your semaphore and your dependency waiting on futex, which is easy to do, then wait-morph to transfer the wait. Alternatively, if you build them on "waitset" you simply need to move an item from one waitset to the other. This can be done easily if your waitset uses a mutex to protect its internals: you simply lock both mutexes and move the waitable handle with both held.

The net result with wait morphing is very nice. Say for example all your workers are asleep. You create a work item that is dependent on an IO and push it. None of the workers get woken up, but one of them is changed from waiting on work available to waiting on the dependency. When the IO completes it wakes that worker and he runs. If somebody pushed a ton of work in the meantime, all the workers would be woken and they would do that work, and the dependent work would be pushed back on the NR queue and set aside while they did RTR work.

ADDENDUM : at the spot marked (*4) :


push NRI back on NR_queue and post NR_sem // (*4)
pop RTR_queue and do work

In real code you need to do something a bit more complex here. What you do is something like :

if ( NRI is ready ) // double check
{
  RTR_sem.post() // we woke from RTR_sem , put it back
  do NRI work
}
else
{
  push NRI onto NR_lifo and post NR_sem
  pop RTR_queue and do work
}

we've introduced a new queue , the NR_lifo which is a LIFO (eg. stack). Now whenever you get an NR_sem post, you do :

// NR_sem just returned from wait so I know an NR item is available :

NRI = NR_lifo.pop()
if ( NRI == NULL )
  NRI = NR_queue.pop()

the item must be in one or the other and we prefer to take from the LIFO first. Basically the LIFO is a holding area for items that were popped off the FIFO and were not yet ready, so we want to keep trying to run those before we go back to the FIFO. You can use a single semaphore to indicate that there is an item in either queue.
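A sketch of that holding-area discipline (mutex-protected deques for clarity; the real thing would be lock-free queues, with one semaphore counting items in both) :

```cpp
#include <cassert>
#include <deque>
#include <mutex>

// NR holding area : not-ready items popped off the FIFO go onto a
// LIFO, and the consumer always drains the LIFO before going back
// to the FIFO.
template <typename T>
struct nr_queues {
    std::mutex m_mutex;
    std::deque<T> m_fifo;  // NR_queue : freshly pushed NR items
    std::deque<T> m_lifo;  // NR_lifo : items popped but not yet ready

    void push_new(T item) {            // producer side
        std::lock_guard<std::mutex> l(m_mutex);
        m_fifo.push_back(item);
    }
    void push_back_not_ready(T item) { // consumer re-push at (*4)
        std::lock_guard<std::mutex> l(m_mutex);
        m_lifo.push_back(item);
    }
    T pop() { // called after NR_sem wait, so an item must exist
        std::lock_guard<std::mutex> l(m_mutex);
        if (!m_lifo.empty()) {
            T t = m_lifo.back(); m_lifo.pop_back(); return t;
        }
        assert(!m_fifo.empty());
        T t = m_fifo.front(); m_fifo.pop_front(); return t;
    }
};
```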

11-30-11 - Some more Waitset notes

The classic waitset pattern :

check condition

waiter w;
waitset.prepare_wait(&w);

double check condition

w.wait();

waitset.retire_wait(&w);

lends itself very easily to setting a waiter flag. All you do is change the double check into an atomic RMW that sets that flag. For example say your condition is count > 0 , you do :

if ( (count&0x7FFFFFFF) == 0 )
{
    waiter w;
    waitset.prepare_wait(&w);

    // double check condition :
    int c = count.fetch_or( 0x80000000 ); // set waiter flag and double check
    if ( (c&0x7FFFFFFF) == 0 )
        w.wait();

    waitset.retire_wait(&w);
}

then in notify, you can avoid signalling when the waiter flag is not set :

// publish :
int c = count.atomic_inc_and_mask(1,0x7FFFFFFF);
// notify about my publication if there were waiters :
if ( c & 0x80000000 )
  waitset.notify();

(note : don't be misled by using count here; this is still not a good way to build a semaphore; I'm just using an int count as a simple way of modeling a publish/consume.)
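To make the flag idea concrete, here's a minimal compilable sketch (my names, not from this post; an unsigned count so the top bit is well-defined, and a CAS loop standing in for atomic_inc_and_mask) :

```cpp
#include <atomic>
#include <cassert>

// waiter flag packed into the top bit of the count
constexpr unsigned kWaiterFlag = 0x80000000u;
constexpr unsigned kCountMask  = 0x7FFFFFFFu;

// publisher : inc the count and clear the waiter flag in one RMW.
// Returns true if the flag was set, ie. waitset.notify() is needed.
bool publish(std::atomic<unsigned>& count) {
    unsigned c = count.load(std::memory_order_relaxed);
    while (!count.compare_exchange_weak(c, (c + 1) & kCountMask,
                                        std::memory_order_acq_rel))
        ; // c was reloaded, retry
    return (c & kWaiterFlag) != 0;
}

// consumer double check : set the waiter flag and re-check the count.
// Returns true if the count is still zero, ie. we should really wait.
bool double_check_and_set_waiter(std::atomic<unsigned>& count) {
    unsigned c = count.fetch_or(kWaiterFlag, std::memory_order_acq_rel);
    return (c & kCountMask) == 0;
}
```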


I was being obtuse before when I wrote about the problems with waitset OR. It is important to be aware of those issues when working with waitsets, because they are inherent to how waitsets work and you will encounter them in some form or other, but of course you can do an OR if you extend the basic waitset a little.

What you do is give waiter an atomic bool to know if it's been signalled, something like :


struct waiter
{
  atomic<bool> signalled;
  os_handle  waitable_handle;
};

(a "waiter" is a helper which is how you add your "self" to the waitset; depending on the waitset implementation, waitable_handle might be your thread ID for example).

Then in the waitset notify you just do :


if ( w->signalled.exchange(true) == false )
{
   Signal( w->waitable_handle );
}
else
    step to next waiter in waitset and try him again.

That is, you try to only send the signal to handles that need it.

If we use this in the simple OR example from a few days ago, then both waiting threads will wake up - two notify_ones will wake two waiters.

While you're at it, your waiter struct may as well also contain the origin of the signal, like :


if ( w->signalled.exchange(true) == false )
{
    // non-atomic assignment :
    w->signal_origin = this; // this is a waitset
    Signal( w->waitable_handle );
}

That way when you wake from an OR wait you know why.

(note that I'm assuming your os_handle only ever does one state transition - it goes from unsignalled to signalled. This is the correct way to use waitset; each waiter() gets a new waitable handle for its lifetime, and it only lives for the length of one wait. In practice you actually recycle the waiters to avoid creating new ones all the time, but you recycle them safely in a way that you know they cannot be still in use by any thread (alternatively you could just have a waiter per thread in its TLS and reset them between uses))

(BTW of course you don't actually use atomic bool in real code because bool is too badly defined)

11/28/2011

11-28-11 - Some lock-free rambling

It helps me a lot to write this stuff down, so here we go.

I continually find that #StoreLoad scenarios are confusing and catch me out. Acquire (#LoadLoad) and Release (#StoreStore) are very intuitive, but #StoreLoad is not. I think I've covered almost this exact situation before, but this stuff is difficult so it's worth revisiting many times. (I find low level threading to be cognitively a lot like quantum mechanics, in that if you do it a lot you become totally comfortable with it, but if you stop doing it even for a month it is super confusing and bizarre when you come back to it, and you have to re-work through all the basics to convince yourself they are true).

(Aside : fucking Google verbatim won't even search for "#StoreLoad" right. Anybody know a web search that is actually verbatim? A whole-word-only option would be nice too, and also a match case option. You know, like basic text search options from like 1970 or so).

The classic case for needing #StoreLoad is WFMO. The very simple scenario goes like this :


bool done1 = false;
bool done2 = false;

// I want to do X() when done1 & done2 are both set.

Thread1:

done1 = true;
if ( done1 && done2 )
    X();

Thread2:

done2 = true;
if ( done1 && done2 )
    X();

This doesn't work.

Obviously Thread1 and Thread2 can run in different orders, so done1 and done2 become set in random order. You would think that one thread or the other must then see both set - but neither may; the reason is that memory visibility can be reordered. This is a pretty clear illustration of the thing that trips up many people - threads can interleave both in execution order and in memory visibility order.

In particular the bad execution case goes like this :


done1 = false, done2 = false

T1 sets done1 = true
  T1 sees done1 = true (of course)
  T2 still sees done1 = false (store is not yet visible to him)

T2 sets done2 = true
  T2 sees done2 = true
  T1 still sees done2 = false

T1 checks done2 for (done1 && done2)
  still sees done2 = false
  doesn't call X()

T2 checks done1
  still sees done1 = false
  doesn't call X()

later
T1 sees done2=true
T2 sees done1=true

when you write it out it's obvious that the issue is the store visibility is not forced to occur before the load. So you can fix it with :

Thread1:

done1 = true;
#StoreLoad
if ( done1 && done2 )
    X();

As noted previously there is no nice way to make a StoreLoad barrier in C++0x. The best method I've found is to make the loads into fetch_add(0,acq_rel) ; that works by making the loads also be stores and using a #StoreStore barrier to get store ordering. (UPDATE : using atomic_thread_fence(seq_cst) also works).
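Here's a runnable version of the two-flag example using the atomic_thread_fence(seq_cst) fix from the UPDATE. With the fences, one of the two fences must come first in the seq_cst total order, so the thread whose fence comes second is guaranteed to see the other thread's store - at least one of the two threads must call X() :

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Two threads each set their flag, fence, then check both flags.
// With the #StoreLoad (seq_cst) fence, at least one thread must
// observe both flags set; without it, both can miss.
int run_trial() {
    std::atomic<bool> done1(false), done2(false);
    std::atomic<int> x_calls(0);

    auto side = [&](std::atomic<bool>& mine) {
        mine.store(true, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst); // #StoreLoad
        if (done1.load(std::memory_order_relaxed) &&
            done2.load(std::memory_order_relaxed))
            x_calls.fetch_add(1);   // "X()"
    };

    std::thread t1(side, std::ref(done1));
    std::thread t2(side, std::ref(done2));
    t1.join(); t2.join();
    return x_calls.load();   // 1 or 2, never 0
}
```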


The classic simple waitset that we have discussed previously is a bit difficult to use in more complex ways.

Refresher : A waitset works with a double-check pattern, like :


signalling thread :

set condition
waitset.notify();

waiting thread :

if ( ! condition )
{
    waitset.prepare_wait()

    // double check :
    if ( condition )
    {
        waitset.cancel_wait();
    }
    else
    {
        waitset.wait();
    }
}

we've seen in the past how you can easily build a condition var or an eventcount from waitset. In some sense waitset is a very low level primitive and handy for building higher level primitives from. Now on to new material.

You can easily use waitset to perform an "OR" WFMO. You simply add yourself to multiple waitsets. (you need a certain type of waitset for this which lets you pass in the primitive that you want to use for waiting). To do this we slightly extend the waitset API. The cleanest way is something like this :


instead of prepare_wait :

waiter create_waiter();
void add_waiter( waiter & w );

instead of wait/cancel_wait :

~waiter() does cancel/retire wait 
waiter.wait() does wait :

Then an OR wait is something like this :

signal thread 1 :

set condition1
waitset1.notify();

signal thread 2 :

set condition2
waitset2.notify();


waiting thread :

if ( condition1 ) // don't wait

waiter w = waitset1.create_waiter();

// double check condition1 and first check condition2 :

if ( condition1 || condition2 ) // don't wait
  // ~w will take you out of waitset1

waitset2.add_waiter(w);

// double check :

if ( condition2 ) // don't wait

// I'm now in both waitset1 and waitset2
w.wait();

Okay. This works fine. But there is a limitation which might not be entirely obvious.

I have intentionally not made it clear if the notify() in the signalling threads is a notify_one (signal) or notify_all (broadcast). Say you want it to be just notify_one , because you don't want to wake more threads than you need to. Say you have this scenario :


X = false;
Y = false;

Thread1:
X = true;
waitsetX.notify_one();

Thread2:
Y = true;
waitsetY.notify_one();

Thread3:
wait for X || Y

Thread4:
wait for X || Y

this is a deadlock. The problem is that both of the waiter threads can go to sleep, but the two notifies might both go to the same thread.

This is a general difficult problem with waitset and is why you generally have to use broadcast (for example eventcount is built on waitset broadcasting).

You may think this is an anomaly of trying to abuse waitset to do an OR, but it's quite common. For example you might try to do something seemingly simple like build semaphore from waitset.


class semaphore_from_waitset
{
    waitset_simple m_waitset;
    std::atomic<int> m_count;

public:
    semaphore_from_waitset(int count = 0)
    :   m_count(count), m_waitset()
    {
        RL_ASSERT(count >= 0);
    }

    ~semaphore_from_waitset()
    {
    }

public:
    void post()
    {
        m_count($).fetch_add(1,mo_acq_rel);
        // broadcast or signal :
        // (*1)
        //m_waitset.notify_all();
        m_waitset.notify_one();
    }

    bool try_wait()
    {
        // see if we can dec count before preparing the wait
        int c = m_count($).load(mo_acquire);
        while ( c > 0 )
        {
            if ( m_count($).compare_exchange_weak(c,c-1,mo_acq_rel) )
                return true;
            // c was reloaded
        }
        return false;
    }

    void wait(HANDLE h)
    {
        for(;;)
        {
            if ( try_wait() )
                return;
    
            // no count available, get ready to wait
            ResetEvent(h);
            m_waitset.prepare_wait(h);
            
            // double check :
            if ( try_wait() )
            {
                m_waitset.retire_wait(h);
                // (*2)
                // pass on the notify :
                m_waitset.notify_one();
                return;
            }
            
            m_waitset.wait(h);
            m_waitset.retire_wait(h);
            // loop and try again
        }
    }
};

it's totally straightforward in the waitset pattern, except for the broadcast issue. If *1 is just a notify_one, then at *2 you must pass on the notify. Alternatively if you don't have the re-signal at *2 then the notify at *1 must be a broadcast (notify_all).

Now obviously if you have 10 threads waiting on a semaphore and you inc the count by 1, you don't want all 10 threads to wake up so that just 1 of them can dec the count and get to execute. The re-signal method will wake 2 threads, so it's better than broadcast, but still not awesome.

(note that this is easy to fix if you just put a mutex around the whole thing; or you can implement semaphore without waitset; the point is not to reimplement semaphore in a bone-headed way, the point is just that even very simple uses of waitset can break if you use notify_one instead of notify_all).
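For completeness, the "mutex around the whole thing" fix is just a standard mutex/condvar semaphore (a generic sketch, not code from this post). Because the count check and the decision to sleep happen under the same lock as the increment, a notified thread re-checks under the lock, and a second notify_one cannot target a thread that has already been unblocked - so plain notify_one is safe here :

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// semaphore with the state and the wait decision under one lock;
// notify_one suffices because no thread can be "in between" the
// check and the sleep when a post happens.
class semaphore_locked {
    std::mutex m_mutex;
    std::condition_variable m_cv;
    int m_count = 0;
public:
    void post() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            ++m_count;
        }
        m_cv.notify_one();   // waking one waiter is enough
    }
    void wait() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this]{ return m_count > 0; });
        --m_count;
    }
};
```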

BTW the failure case for semaphore_from_waitset with only a notify_one and no resignal (eg. if you get the (*1) and (*2) points wrong) goes like this :



    T1 : sem.post , sem.post
    T2&T3 : sem.wait

    execution like this :

    T2&3 both check count and see zero
    T1 now does one inc and notify, no one to notify yet
    T2&3 do prepare_wait
    T2 does its double-check, sees a count and takes it (does not retire yet)
    T3 does its double-check, sees zero, and goes to sleep
    T1 now does the next inc and notify
    -> this is the key problem
    T2 can get the notify because it is still in the waiter list
        (not retired yet)
    but T3 needs the notify

The key point is this spot :

            // double check :
            if ( try_wait() )
            {
                // !! key !!
                m_waitset.retire_wait(h);

you have passed the double check and are not going to wait, but you are still in the waiter list. This means you can be the one thread chosen to receive the signal, but you don't need it. This is why resignal works.

11/25/2011

11-25-11 - Sustainability

I've always had a sour feeling about the "sustainability" movement, but I haven't been quite sure why exactly. As a knee-jerk reaction, I feel uneasy about any of the cultish movements where people get overly devoted to a narrow worldview, and tend to get into a dogma where adherence to the movement is more important than logically pursuing the original goals. So for example there are lots of current movements which I basically agree with, like "nose to tail" and "locavorism" and "minimalism" and so on, I think the basic ideas are great, but the movements themselves tend to be weird and actually kind of ruin the idea that I like so much by making it dogmatic.

(eg. if you eat pig's ears because you like pig's ears, that's cool. If you eat pig's ears because you got the whole pig and don't want to throw parts away, that's cool. If you eat pig's ears because you are trying to be a good "nose to tail"'er , that's fucking stupid.)

Anyhoo, I had a few realizations about what it is that bothers me so much about "sustainability". First the obvious ones that I've known for a while :

"sustainability" is so expensive that it's only accessible to 10% of the population. When the vast majority of the population can't afford those products, they are inherently unsustainable, as in they do not support human life and they do not significantly reduce the amount of factory farming , clear cutting , etc. A lifestyle which is only accessible to the rich cannot transform the earth.

The majority of "sustainable" products are unproven and may in fact not be sustainable; it's just a marketing word that doesn't correspond to any fact of actual low long-term impact on the earth. The fact is that the central valley of california and the fields of iowa have sustained "unsustainable" factory farming for the past 100 years or so, and despite predictions of imminent collapse, they are still feeding hundreds of millions of people for very low prices. On the other hand we get new coconut charcoal and bamboo or hemp or whatever, and we don't really know how those will affect the earth in mass production over the long term.

Buying a bunch of new products because they are "sustainable" is of course highly ironic. The most destructive thing that modern society does is buy new junk every time there's a new trend, and this appears to be just another new trend, people throw out their unfashionable "unsustainable" stuff to buy new approved stuff, and will throw that out for the next trend. (also ironically, "minimalism" generally tends to involve buying new stuff).

High-paid low-yield gentleman farmers are inherently unsustainable. You cannot support a 7+ billion person planet with a good quality of life if the cost of a piece of lumber or a piece of fruit is so high and takes so much labor. Our quality of life (per capita) is entirely based on the fact that those things are so cheap and easy, so that we can spend more time producing TV shows and iPods. Now the more extreme hippie-ish end of the sustainability movement might espouse a true back-to-the-land lifestyle change where in fact people do spend more time laboring and don't get TV shows and iPods, but that is a small fringe, the main thrust wants the spoils of civilization.

Now a little equivocation. Buying "sustainable" junk is obviously a form of charity. When you spend much more on a "sustainable" version of a product, you are essentially donating the difference. Where does that donation go? Some (I suspect small) portion of it actually goes to benefiting the earth. Most of the rest goes to profit of the product maker. On that level, it is a very bad form of charity; your charity dollars would have a much greater direct benefit on the earth if you just bought normal products and donated the difference to direct action.

But it is a bit more complicated than that. For the most part I'm not delighted by exercising political expression through purchasing (it's far too easy to manipulate and take advantage of, and in the end the only thing that They care about is that you keep buying things like a good little consumer, so you really aren't winning) - however I can't deny that it does sometimes work. When industry sees that lots of consumers are willing to waste their dollars on "green" products, they do sometimes change their practices for the better, and the net result can be a greater impact than the amount of charity dollars suggest. That is, there is a sort of leverage if businesses think that the "political buyers" will continue spending lots of money far into the future, the businesses will make a change based on lots of *future* dollars. Thus something like only a few tens of millions of dollars in charity spending can actually create a hundred-million dollar product line transformation.

As an aside, I should note that there are lots of small scale "sustainable" endeavors that are basically irrelevant because they are inherently small scale and cannot ever have a significant effect on the planet. For example the reclaimed lumber movement, it's okay, I have no major objection to it (though it's not entirely clear that it's the best value for your charity lumber eco dollars), but it's just irrelevant to any large scale analysis because it can't significantly reduce commercial lumber use. The only sustainable businesses that matter are the ones that have the possibility to go large scale.

Anyhoo, the thing that occurred to me last night was that the large scale sustainable industry is basically built on the back of unsustainable industry.

What I mean is, the large-scale mass produced "sustainable" industry (eg. bamboo flooring, "sustainable" chocolate, etc) is largely about making products in the 3rd world and exporting them to the 1st world. First of all this is sort of inherently unsustainable and hypocritical because it relies on a massive income gap for affordability, essentially you have to have people in subsistence living conditions to subsidize this product, and a good liberal who is spending their charity dollars to direct the world towards a better future should not include that in their better future. But more directly, the workers in those sustainable factories could not live a decent life on their low wages without unsustainable industry. The only way they can be paid so low is because they can get cheap factory farm corn to eat, and cheap sneakers and clothes and everything they need to live. If they had to buy the expensive sustainable junk, they would have to have huge wages, which would make the product even more expensive, which would make it impossible.

11/23/2011

11-23-11 - This is not okay

Fuck this shit. I'm going to Hawaii.

11/22/2011

11-22-11 - The Mature Programmer

1. The Mature Programmer

The mature programmer manages their own time and productivity well. The MP knows that maintenance is as much work as the initial writing and code always takes longer than you think. The MP knows that any changes to code can introduce bugs, no matter how seemingly trivial. The MP knows that premature optimization is foolish and dangerous. The MP knows that sexy coding like writing big complex systems from scratch is rarely the best way to go. The MP does not get into ego competitions about who has the prettiest code. The MP achieves the best final result in the minimum amount of time.

When I started at Oddworld, I was watching lots of game companies get into dick-waving contests about who had the greatest home-rolled graphics engine, and I was also watching lots of indie developers spend massive amounts of time on their "one true" product and never actually ship it. I resolved that we would not fall into those traps - we would be humble and not reach too far, we would not let our egos stop us from licensing code or using old fashioned solutions to problems, we would stay focused on the end product - any sexy code that didn't produce a visible benefit in the actual shipping game was nixed. For the most part I think we succeeded in that (there were a few digressions that were mainly due to me).

But the way of the Mature Programmer can be a trap which comes back to bite you.

The problem is that writing code in this way is not very much fun. Sure there's the fun of making the product - and if you're working on a game and believe in the game and the team, then just seeing the good product come out can give you motivation. But if you don't have that, it can be a real slog.

Most of us got into programming not for the end products that we create, but because the programming itself is a joy. Code can be beautiful. Code can be a clever, artistic, exciting creation, like a good mathematical proof. The Mature Programmer would say that "clever code is almost always dangerous code". But fuck him. The problem is that when you get carried away with being "mature" you suck the joy right out of coding.

You need to allow yourself a certain amount of indiscretions to keep yourself happy with your code. Sure those templates might not actually be a good idea, but you enjoy writing the code that way - fine, do it. Yes, you are optimizing early and it just makes the code harder to maintain and harder to read and more buggy - but you love to do that, fine, do it.

Obviously you can't go overboard with this, but I think that I (and many others) have gone overboard with being mature. Basically in the last ten years of my evolution as a coder I have become less of a wild card "hot shot" and more of a productivity manager, an efficient task analyzer and proactive coordinator of code-actualizing solutions. It's like a management bureaucracy of one inside my head. It's horrible.

I think there are two factors to consider : first is that being "mature" and productive can cause burnout which winds up hurting your productivity, or it can just make coding unpleasant so you spend fewer hours at it. Most "mature" coders brag about the fact that they can get as much done in 6 hours as they used to do in 14. But those 14 hours were FUN, you coded that long because you loved it, you couldn't get to sleep at night because you wanted to code more; now the 6 hours is all sort of unpleasant because instead of rolling your own solution you're just tying together some java and perl packages. Second is that being productive is not the only goal. We are coding to get some task done and to make money, but we're also coding because we enjoy it, and actually being less productive but enjoying your coding more may be a net +EV.

2. The healthy opposition of a producer

Many programmers in normal coding jobs hate having the interference of a producer (or corporate management, or the publisher, or whatever). This person enforces stupid schedules and won't let us do the features we want, and urrgh we hate them! These coders long to be able to make their own schedules and choose their own tasks and be free.

It's actually very healthy and much more relaxing in many ways to have that opposition. When you have to schedule yourself or make your own decisions about tasks, you become responsible for both the creative "reach for the sky" side and the responsible "stay in budget" side. It's almost impossible to do a good job of both sides. This can happen if you are an indie or even if you are a powerful lead with a weak producer.

Most creative industries know that there is a healthy opposition in having the unconstrained creative dreamer vs. the budget-enforcing producer. You don't want the dreamer to be too worried about thinking about schedules or what's possible - you just want them to make ideas and push hard to get more.

When you have to cut features or crunch or whatever, it's nice to have that come from outside - some other force makes you do it and you can hate them and get on with it. It's nice to have that external force to blame that's not on your team; it gives you a target of your frustration, helps bonding, and also gives you an excuse to get the job done (because they told you to).

When you have to balance dreams vs schedules on your own, it adds an intellectual burden to every task - as you do each task you have to consider "is this worth the time? is this the right task to do now? should I do a simpler version of this?" which greatly reduces your ability to focus just on the task itself.

3. Coding standards

It's kind of amazing to me how many experienced programmers still just don't understand programming. The big difficulty in programming is that the space of ways to write something is too large. We can get lost in that space.

One of the problems is simply the intellectual overload. Smart coders can mistakenly believe that they can handle it, but it is a burden on everyone. Every time you write a line of code, if you have to think "should I use lower case or mixed caps?" , or "should I make this a function or just write it in line?" , your brain is spending masses of energy on syntactic decisions and doesn't have its full power for the functionality. Strict coding standards are actually an intellectual relief because they remove all those decisions and give you a specific way to do the syntax. (The same of course goes for reading other people's code - your eyes can immediately start looking at the functionality, not try to figure out the current syntax)

The other big benefit of coding standards is creating a "meta language" which is smaller than the parent language and enforces certain invariants. By doing that you again reduce the space that the brain has to consider. For example you might require that all C macros behave like functions (eg. don't eat scopes and don't declare variables). Now when I see one I know I don't have to worry about those things. Or you might require that globals are never externed and only get accessed through functions called "GetGlobal_blah". It doesn't really matter what they are as long as they are simple, clear, uniform, and strictly enforced, because only if they are totally reliable can you stop thinking about them.
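As an example of the "macros behave like functions" rule : the standard idiom (a well-known C convention, not specific to this post) is the do { } while(0) wrapper, which makes a multi-statement macro act like a single statement - it neither eats scopes nor leaks variables, and it requires a trailing semicolon like a real call :

```cpp
#include <cassert>

// wrap multi-statement macros in do { } while(0) so they behave
// like a function call syntactically
#define SWAP_INT(a, b)       \
    do {                     \
        int swap_tmp_ = (a); \
        (a) = (b);           \
        (b) = swap_tmp_;     \
    } while (0)

int demo_swap() {
    int x = 1, y = 2;
    if (x < y)          // safe inside an unbraced if, like a function call
        SWAP_INT(x, y);
    return x;           // now 2
}
```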

4. The trap of "post-Mature Programmer" ism.

Many great coders of my generation have gone through the strict clean rules-following coder phase and have moved onto the "post" phase. The "post-mature programmer" knows the importance of following strict coding style rules and of not indulging themselves too much, but also sees the benefit of bending those rules and believes that they can be a bit more free about deciding what to do in each situation.

I believe that they/we mostly get this wrong.

The best analogy I can think of is poker. Most successful poker players go through several phases. First you think you're ever so clever and you can bluff and trap people and play all sorts of weird lines. Once you move up levels and start playing serious poker this delusion is quickly wiped out and you realize you need to go back to fundamentals. So then most people will go through the TAG "standard line" phase where they learn the right thing to do in each situation and the standard way to analyze hands, and they will be quite successful with this. (note that "standard line" doesn't mean nitty, it involves things like squeeze plays and even check-shove river bluffs, but it's based on playing a balanced range and studying EV). But then they are so successful with their solid play that they start to think they can get away with "mixing it up", playing hands that are almost certainly not profitable because they think they are good enough post-flop to make up for it (eg. Durrr style), or imagining that by playing some minus EV hands it helps their image and pays off later.

This is almost always wrong. Limping AA is almost always wrong, opening 72o UTG is almost always wrong - maybe you've done some analysis and you've decided it's the right thing at this table at this moment (for example limping AA because the people behind you attack limpers way too much and they think you would never limp AA so they will get stuck easily). It's wrong.

(telling yourself that your current bad play is made up for with later "image value" is one of the great rationalizations that poker players use as an excuse to justify their bad play. Programmers do the same with a set of excuses like "performance" that are really just rationalizing justifications for their bad practices. With poker, EV in the hand is worth more than EV in the bush; that is, the later image value you might win is so small and dubious and depends on various things working out just right that it's almost never correct to give up known current value for possible future value. (a particularly simple case of this is "implied odds", which bad players use as an excuse to chase hands they shouldn't))

The problem is that when you open yourself up to making any possible move at any moment, there is simply too much to consider. You can't possibly go through all those decisions from first principles and make the right choice. Even if you could, there's no way you can sustain it for thousands of hands. You're going to make mistakes.

The same is true in coding; the post-MP knows the value of encapsulating a bit of functionality into a struct + helpers (or a class), but they think "I'm smart enough, I can decide not to do that in this particular case". No! You are wrong. I mean, maybe you are in fact right in this particular case, but it's not a good use of your brain energy to make that decision, and you will get it wrong sometimes.

There is a great value in having simple rules. Like "any time I enter a pot preflop, I come in for a raise". It may not always be the best thing to do, but it's not bad, and it saves you from making possibly big mistakes, and most importantly it frees up your brain for other things.

The same thing happens with life decision making. There's a standard set of cliches :

Don't marry the first person you sleep with
Don't get in a serious relationship off a rebound
Don't buy anything if the salesman is pushing it really hard
Take a day to sleep on any big decision
Don't lend money to poor friends
etc.
you may think "I'm smart, I'm mature, I don't need these rules, I can make my own decision correctly based on the specifics of the current situation". But you are wrong. Sure, following the rules you might miss out on the truly optimum decision once in a while. But it's foolish arrogance to think that your mind is so strong that you don't need the protection and simplicity that the rules provide.

In poker the correct post-solid-player adjustment is very very small. You don't go off making wild plays all the time, that's over-confidence in your abilities and just "spew". A correctly evolved player basically sticks to the solid line and the standard way of evaluating, but knows how to identify situations where a very small correction is correct. Maybe the table is playing too tight preflop, so in the hijack position you start opening the top 35% of hands instead of the top 25% of hands. You don't just start opening every hand. You stay within the scope of the good play that you understand and can do without rethinking your whole approach.

The same is true in programming I believe; the correct adjustment for post-mature coding is very small; you don't have to be totally dogmatic about making every member variable private, but you also don't just stop encapsulating classes at all.

old rants