cbloom rants

4/30/2016

Oodle Kraken Pareto Frontier

The whole idea of Kraken is to excel in the size-decodespeed tradeoff. From the beginning of its design, every decision was considered in terms of the size vs decodespeed.

So how does it do?

To visualize the size-decodespeed Pareto frontier, I like to use an imaginary loading problem. You want to load compressed data and then decompress it. One option is the null compressor (or "memcpy") that just loads uncompressed data and then memcpys it. Or you can use compressors that give smaller file sizes, thus quicker loads, but take more time to decompress. By dialing the virtual disk speed, you can see which compressor is best at different tradeoffs of space vs. speed.

Of course you may not actually be just trying to optimize load time, but you can still use this imaginary loading problem as a way to study the space-speed tradeoff. If you care more about size, you look at lower imaginary disk speeds, if you core more about CPU use, you look at higher imaginary disk speeds.

I like to show "speedup" which is the factor of increase in speed using a certain compressor to do load+decomp vs. the baseline (memcpy). So the left hand y-intercepts (disk speed -> 0) show the compression ratio, and the right hand y-intercepts side show the decompression speed (disk speed -> inf), and in between shows the tradeoff. (By "show" I mean "is linearly proportional to, so you can actually look at the ratio of Y-intercepts in the Pareto curve and it tells you compression ratio on the left and decompression speed on the right).

03-02-15 - Oodle LZ Pareto Frontier and 05-13-15 - Skewed Pareto Chart

The chart showing "millis" shows time, so obviously lower is better. I show a few ways to combine load & decomp. Sum is the time to load then decomp sequentially. Max is the larger of the two, which is the time if they were perfectly overlapped (not usually possible). Personally I think "l2 sum" is most useful. This is the sqrt of sum of squares, which is a kind of sum that biases towards the larger; it's kind of like a mix of "sum" and "max", it means you want the sum to be small, but you also want them to similar times.

Kraken tops the Pareto curve for decodespeed vs size for a huge range of virtual disk speed; from around 2 mb/s up to around 300 mb/s.

Of course the Pareto curve doesn't tell the whole picture. For one thing I don't have encode speed in the equation here at all (and sometimes you need to look at the memory use tradeoff too, so it's really a size-decodespeed-encodespeed-memoryuse surface). For another you sometimes have strict requirements, like I must hit a certain file size, and then you just pick the fastest decoder that meets that requirement. One thing the curves do clearly tell you is when a curve just completely covers another (that is, is greater in Y for all values of X), then that compressor is just never needed (in space & decodespeed).


Silesia :


Game Test Set :


Just for comparison to earlier posts on my Blog about the other Oodle compressors, here's a run of lzt99 with the full Oodle compressor set.

I've included ZStd for reference here just to show that Kraken jumping way up is not because the previous Oodle compressors were bad - far from it, they were already Pareto optimal. (I like to compare against ZStd not because it's bad, but because it's excellent; it's the only other compressor in the world I know of that's even close to Oodle in terms of good compression ratio and high speed (ZStd also has nice super-fast encode speeds, and it's targetted slightly differently, it's better on text and Oodle is better on binary, it's not a direct apples comparison)).

You can see that LZHLW is just completely covered by Kraken, and so is now deprecated in Oodle. Even LZNIB and BitKnit are barely peeking out of the curve, so the range where they are the right answer is greatly reduced to more specialized needs. (for example BitKnit still romps on strongly structured data, and is useful if you just need a size smaller than Kraken)

4/28/2016

Performance of Oodle Kraken

A continuing series reporting on the performance of recently released Oodle Kraken

First some general notes about Oodle before the big dump of numbers.

Oodle is not intended to solve every problem in data compression. Oodle Data Compression is mainly designed for compress-once load-many usage patterns, where the compressed size and decode speed is important, but encode speed is not as important. Things like distribution, packaging, when you bake content into a compressed archive and then serve it to many people. I do consider it a requirement to keep the encoders faster than 1 MB/s on my ancient laptop, because less than that is just too slow even for content baking.

Like most data compressors, Oodle has various compression level options that trade off encoder speed for compressed size. So there are faster and slower encoders; some of the fast modes are good tradeoffs for space-speed. Kraken turns out to be pretty good at getting high compression even in the fast modes. I'll do a post that goes into this in the future.

Oodle is mainly intended for packing binary data. Because Oodle is built for games, we focus on the types of data that games ship, such as textures, models, animations, levels, and other binary structured data that often contains various types of packed numberic data. Oodle is good on text but is not really built for text. In general if you are focusing on a specific type of data (eg. text/html/xml) you will get the best results with domain-specific preprocessors, such as xwrt or wbpe. Oodle+wbpe is an excellent text compressor.

Oodle Kraken is intended for high compression use cases, where you care about size. There's a whole family of compressors now that live in the "just below lzma (7zip)" compression ratio domain, such LZHAM, Brotli, Oodle BitKnit, and ZStd. In general the goal of these is to get as close to lzma compression ratio as possible while providing much better speed. This is where Kraken achieves far more than anyone has before, far more than we thought possible.

Today I will be focusing on high compression modes, looking at decode speed vs. compression ratio. All compressors will generally be run in their max compression mode, without going into the "ridiculously slow" below 1 MB/s range. (for Oodle this means running -zl6 Optimal2 but not -zl7 or higher).

Okay, on to lots of numbers.


Rather than post an average on a test set, which can be misleading due to the selection of the test set or the way the results are averaged (total compressed size or average ratio?), I'll post results on a selection of individual files :

-------------------------------------------------------
"silesia_mozilla"

by ratio:
lzma        :  3.88:1 ,    2.0 enc mbps ,   63.7 dec mbps
lzham       :  3.56:1 ,    1.5 enc mbps ,  186.4 dec mbps
Oodle Kraken:  3.51:1 ,    1.2 enc mbps ,  928.0 dec mbps
zstdmax     :  3.24:1 ,    2.8 enc mbps ,  401.0 dec mbps
zlib9       :  2.51:1 ,   12.4 enc mbps ,  291.5 dec mbps
lz4hc       :  2.32:1 ,   36.4 enc mbps , 2351.6 dec mbps

by encode speed:
lz4hc       :  2.32:1 ,   36.4 enc mbps , 2351.6 dec mbps
zlib9       :  2.51:1 ,   12.4 enc mbps ,  291.5 dec mbps
zstdmax     :  3.24:1 ,    2.8 enc mbps ,  401.0 dec mbps
lzma        :  3.88:1 ,    2.0 enc mbps ,   63.7 dec mbps
lzham       :  3.56:1 ,    1.5 enc mbps ,  186.4 dec mbps
Oodle Kraken:  3.51:1 ,    1.2 enc mbps ,  928.0 dec mbps

by decode speed:
lz4hc       :  2.32:1 ,   36.4 enc mbps , 2351.6 dec mbps
Oodle Kraken:  3.51:1 ,    1.2 enc mbps ,  928.0 dec mbps
zstdmax     :  3.24:1 ,    2.8 enc mbps ,  401.0 dec mbps
zlib9       :  2.51:1 ,   12.4 enc mbps ,  291.5 dec mbps
lzham       :  3.56:1 ,    1.5 enc mbps ,  186.4 dec mbps
lzma        :  3.88:1 ,    2.0 enc mbps ,   63.7 dec mbps
-------------------------------------------------------
"lzt99"

by ratio:
lzma        :  2.65:1 ,    3.1 enc mbps ,   42.3 dec mbps
Oodle Kraken:  2.46:1 ,    2.3 enc mbps ,  957.1 dec mbps
lzham       :  2.44:1 ,    1.9 enc mbps ,  166.0 dec mbps
zstdmax     :  2.27:1 ,    3.8 enc mbps ,  482.3 dec mbps
zlib9       :  1.77:1 ,   13.3 enc mbps ,  286.2 dec mbps
lz4hc       :  1.67:1 ,   30.3 enc mbps , 2737.4 dec mbps

by encode speed:
lz4hc       :  1.67:1 ,   30.3 enc mbps , 2737.4 dec mbps
zlib9       :  1.77:1 ,   13.3 enc mbps ,  286.2 dec mbps
zstdmax     :  2.27:1 ,    3.8 enc mbps ,  482.3 dec mbps
lzma        :  2.65:1 ,    3.1 enc mbps ,   42.3 dec mbps
Oodle Kraken:  2.46:1 ,    2.3 enc mbps ,  957.1 dec mbps
lzham       :  2.44:1 ,    1.9 enc mbps ,  166.0 dec mbps

by decode speed:
lz4hc       :  1.67:1 ,   30.3 enc mbps , 2737.4 dec mbps
Oodle Kraken:  2.46:1 ,    2.3 enc mbps ,  957.1 dec mbps
zstdmax     :  2.27:1 ,    3.8 enc mbps ,  482.3 dec mbps
zlib9       :  1.77:1 ,   13.3 enc mbps ,  286.2 dec mbps
lzham       :  2.44:1 ,    1.9 enc mbps ,  166.0 dec mbps
lzma        :  2.65:1 ,    3.1 enc mbps ,   42.3 dec mbps
-------------------------------------------------------
"all_dds"

by ratio:
lzma        :  2.37:1 ,    2.1 enc mbps ,   40.8 dec mbps
Oodle Kraken:  2.18:1 ,    1.0 enc mbps ,  684.6 dec mbps
lzham       :  2.17:1 ,    1.3 enc mbps ,  127.7 dec mbps
zstdmax     :  2.02:1 ,    3.3 enc mbps ,  289.4 dec mbps
zlib9       :  1.83:1 ,   13.3 enc mbps ,  242.9 dec mbps
lz4hc       :  1.67:1 ,   20.4 enc mbps , 2226.9 dec mbps

by encode speed:
lz4hc       :  1.67:1 ,   20.4 enc mbps , 2226.9 dec mbps
zlib9       :  1.83:1 ,   13.3 enc mbps ,  242.9 dec mbps
zstdmax     :  2.02:1 ,    3.3 enc mbps ,  289.4 dec mbps
lzma        :  2.37:1 ,    2.1 enc mbps ,   40.8 dec mbps
lzham       :  2.17:1 ,    1.3 enc mbps ,  127.7 dec mbps
Oodle Kraken:  2.18:1 ,    1.0 enc mbps ,  684.6 dec mbps

by decode speed:
lz4hc       :  1.67:1 ,   20.4 enc mbps , 2226.9 dec mbps
Oodle Kraken:  2.18:1 ,    1.0 enc mbps ,  684.6 dec mbps
zstdmax     :  2.02:1 ,    3.3 enc mbps ,  289.4 dec mbps
zlib9       :  1.83:1 ,   13.3 enc mbps ,  242.9 dec mbps
lzham       :  2.17:1 ,    1.3 enc mbps ,  127.7 dec mbps
lzma        :  2.37:1 ,    2.1 enc mbps ,   40.8 dec mbps
-------------------------------------------------------
"baby_robot_shell.gr2"

by ratio:
lzma        :  4.35:1 ,    3.1 enc mbps ,   59.3 dec mbps
Oodle Kraken:  3.77:1 ,    1.5 enc mbps ,  878.3 dec mbps
lzham       :  3.77:1 ,    1.6 enc mbps ,  162.5 dec mbps
zstdmax     :  2.77:1 ,    5.7 enc mbps ,  405.7 dec mbps
zlib9       :  2.19:1 ,   13.9 enc mbps ,  332.9 dec mbps
lz4hc       :  1.78:1 ,   40.1 enc mbps , 2364.4 dec mbps

by encode speed:
lz4hc       :  1.78:1 ,   40.1 enc mbps , 2364.4 dec mbps
zlib9       :  2.19:1 ,   13.9 enc mbps ,  332.9 dec mbps
zstdmax     :  2.77:1 ,    5.7 enc mbps ,  405.7 dec mbps
lzma        :  4.35:1 ,    3.1 enc mbps ,   59.3 dec mbps
lzham       :  3.77:1 ,    1.6 enc mbps ,  162.5 dec mbps
Oodle Kraken:  3.77:1 ,    1.5 enc mbps ,  878.3 dec mbps

by decode speed:
lz4hc       :  1.78:1 ,   40.1 enc mbps , 2364.4 dec mbps
Oodle Kraken:  3.77:1 ,    1.5 enc mbps ,  878.3 dec mbps
zstdmax     :  2.77:1 ,    5.7 enc mbps ,  405.7 dec mbps
zlib9       :  2.19:1 ,   13.9 enc mbps ,  332.9 dec mbps
lzham       :  3.77:1 ,    1.6 enc mbps ,  162.5 dec mbps
lzma        :  4.35:1 ,    3.1 enc mbps ,   59.3 dec mbps
-------------------------------------------------------
"win81"

by ratio:
lzma        :  2.95:1 ,    2.5 enc mbps ,   51.9 dec mbps
lzham       :  2.77:1 ,    1.6 enc mbps ,  177.6 dec mbps
Oodle Kraken:  2.70:1 ,    1.0 enc mbps ,  877.0 dec mbps
zstdmax     :  2.64:1 ,    3.5 enc mbps ,  417.8 dec mbps
zlib9       :  2.07:1 ,   16.8 enc mbps ,  269.6 dec mbps
lz4hc       :  1.91:1 ,   28.8 enc mbps , 2297.6 dec mbps

by encode speed:
lz4hc       :  1.91:1 ,   28.8 enc mbps , 2297.6 dec mbps
zlib9       :  2.07:1 ,   16.8 enc mbps ,  269.6 dec mbps
zstdmax     :  2.64:1 ,    3.5 enc mbps ,  417.8 dec mbps
lzma        :  2.95:1 ,    2.5 enc mbps ,   51.9 dec mbps
lzham       :  2.77:1 ,    1.6 enc mbps ,  177.6 dec mbps
Oodle Kraken:  2.70:1 ,    1.0 enc mbps ,  877.0 dec mbps

by decode speed:
lz4hc       :  1.91:1 ,   28.8 enc mbps , 2297.6 dec mbps
Oodle Kraken:  2.70:1 ,    1.0 enc mbps ,  877.0 dec mbps
zstdmax     :  2.64:1 ,    3.5 enc mbps ,  417.8 dec mbps
zlib9       :  2.07:1 ,   16.8 enc mbps ,  269.6 dec mbps
lzham       :  2.77:1 ,    1.6 enc mbps ,  177.6 dec mbps
lzma        :  2.95:1 ,    2.5 enc mbps ,   51.9 dec mbps
-------------------------------------------------------
"enwik7"

by ratio:
lzma        :  3.64:1 ,    1.8 enc mbps ,   79.5 dec mbps
lzham       :  3.60:1 ,    1.4 enc mbps ,  196.5 dec mbps
zstdmax     :  3.56:1 ,    2.2 enc mbps ,  394.6 dec mbps
Oodle Kraken:  3.49:1 ,    1.5 enc mbps ,  789.7 dec mbps
zlib9       :  2.38:1 ,   22.2 enc mbps ,  234.3 dec mbps
lz4hc       :  2.35:1 ,   27.5 enc mbps , 2059.6 dec mbps

by encode speed:
lz4hc       :  2.35:1 ,   27.5 enc mbps , 2059.6 dec mbps
zlib9       :  2.38:1 ,   22.2 enc mbps ,  234.3 dec mbps
zstdmax     :  3.56:1 ,    2.2 enc mbps ,  394.6 dec mbps
lzma        :  3.64:1 ,    1.8 enc mbps ,   79.5 dec mbps
Oodle Kraken:  3.49:1 ,    1.5 enc mbps ,  789.7 dec mbps
lzham       :  3.60:1 ,    1.4 enc mbps ,  196.5 dec mbps

by decode speed:
lz4hc       :  2.35:1 ,   27.5 enc mbps , 2059.6 dec mbps
Oodle Kraken:  3.49:1 ,    1.5 enc mbps ,  789.7 dec mbps
zstdmax     :  3.56:1 ,    2.2 enc mbps ,  394.6 dec mbps
zlib9       :  2.38:1 ,   22.2 enc mbps ,  234.3 dec mbps
lzham       :  3.60:1 ,    1.4 enc mbps ,  196.5 dec mbps
lzma        :  3.64:1 ,    1.8 enc mbps ,   79.5 dec mbps
-------------------------------------------------------

In chart form :

(lz4 decode speed is off the top of the chart)

I'm not including the other Oodle compressors here just to keep things as simple as possible. If you do want more compression than Kraken, and care about decode speed, then Oodle LZNA or BitKnit are much faster to decode than lzma (7zip) at comparable or better compression ratios.

Visit radgametools.com to learn more about Kraken and Oodle.

4/27/2016

Release the Kraken!

Announcing Oodle Kraken! - a new compression algorithm in the Oodle Data Compression library.

Oodle Kraken offers high compression with incredible decode speed, the likes of which has never been seen before.

Oodle Kraken decodes 3X faster than ZLib, and gets way more compression. There's really nothing even close to Kraken, its performance is revolutionary - it's never been possible to get such high compression and blazing fast decode speed before.

On the "Silesia" test set, the performance is :

Kraken :  4.05 to 1 compression ratio,   919.8 decode mb/s
zlib9  :  2.74 to 1 compression ratio,   306.9 decode mb/s
lzma   :  4.37 to 1 compression ratio,    78.8 decode mb/s

(speed measured on a Core i7-3770 3.4 Ghz , x64, single threaded)

Oodle Kraken's high compression makes it great for downloadables and distribution. Kraken's fast decode speed makes it amazing for in-game loading, paging and streaming, with less CPU load taking time away from your game. There's no need to decompress Kraken data when you install - just load compressed data directly. Kraken is so fast that loading compressed data and then decompressing is faster than just loading uncompressed data off disk!

Visit radgametools.com to learn more about Kraken and Oodle.

I'll be writing more about Oodle Kraken in the coming weeks, such as performance analysis on other data sets and vs. other compressors.

The overall performance on Silesia :

And a chart of the performance on each file in Silesia. This is a loglog scatter plot, with log2 of decode speed on the x axis and log2 of compression ratio on the y axis. (so for example from LZMA to Kraken on "sao" is 4.68 steps on the x axis, that means 25.76X faster)

Oodle Kraken is superb on the AMD Jaguar chip that's in the PS4 and XBox One. Most decompressors really struggle to run fast on those platforms, but not Kraken.

It's hard to write about Oodle Kraken without sounding ridiculous. It's really a huge step in data compression, where huge steps almost never happen.

6/09/2015

Goodbye and Hello

Well, fuck me. Google has broken the older GData API for posting to blogger.

I understand progress has to happen and so on, and older APIs have to get retired sometimes. Well, no not really; that's not actually what happens in the modern world. I'm sick of the slap-dash upgrading and deprecating that has nothing to do with necessity and is just random chaos. You all can fuck around with it if you want. Not me. I love algorithms. I love programming when the problems are inherent, mathematical problems. Not problems like this fucking API doesn't do what it says it does, or this shit does different things in different versions so I have to detect that and hack it and, oh crap my platform sdk updated and nothing works any more and fuck me.

I think this blog (on Blogger) is probably dead. I can't be bothered to fix my poster (working in C# is a nightmare). Goodbye.

You can read the raw text blog at http://www.cbloom.com/rants.html

also Hello!

For a little while I've been writing a new blog.

It was inspired by el trastero | de IƱigo Quilez which is a fabulous blog. el trastero has some little technical thoughts some times, but also personal stuff, and lots of humanity. I love it. It's how my blog started, and somewhere along the way I lost the point. So I started writing a new blog, inspired by my old blog.

It's called "rambles", and it's here : http://www.cbloom.com/rambles.html

It's personal and inappropriate and you probably shouldn't read it.

Goodbye and hello.

5/25/2015

05-25-15 - The Anti-Patent Patent Pool

The idea of the Anti-Patent Patent Pool is to destroy the system using the system.

The Anti-Patent Patent Pool is an independent patent licensing organization. (Hence APPP)

One option would be to just allow anyone to use those patents free of charge.

A more aggressive option would be a viral licensing model. (like the GPL, which has completely failed, so hey, maybe not). The idea of the viral licensing model is like this :

Anyone who owns no patents may use any patent in the APPP for free (if you currently own patents, you may donate them to the APPP).

If you wish to own patents, then you must pay a fee to license from the APPP. That fee is used to fund the APPP's activities, the most expensive being legal defense of its own patents, and legal attacks on other patents that it deems to be illegal or too broad.

(* = we'd have to be aggressive about going after companies that make a subsidiary to use APPP patents while still owning patents in the parent corporation)

The tipping point for the APPP would be to get a few patents that are important enough that major players need to either join the APPP (donate all their patents) or pay a large license.

The APPP provides a way for people who want their work to be free to ensure that it is free. In the current system this is hard to do without owning a patent, and owning a patent and enforcing it is hard to do without money.

The APPP pro-actively watches all patent submissions and objects to ones that cover prior art, are obvious and trivial, or excessively broad. It greatly reduces the issuance of junk patents, and fights ones that are mistakenly issued. (the APPP maintains a public list of patents that it believes to be junk, which it will help you fight if you choose to use the covered algorithms). (Obviously some of these activities have to be phased in over time as the APPP gets more money).

The APPP provides a way for small companies and individuals that cannot afford the lawyers to defend their work to be protected. When some evil behemoth tries to stop you from using algorithms that you believe you have a legal right to, rather than fight it yourself, you simply donate your work to the APPP and they fight for you.

Anyone who simply wants to ensure that they can use their own inventions could use the APPP.

Once the APPP has enough money, we would employ a staff of patent writers. They would take idea donations from the groundswell of developers, open-source coders, hobbyists. Describe your idea, the patent writer would make it all formal and go through the whole process. This would let us tap into where the ideas are really happening, all the millions of coders that don't have the time or money to pursue getting patents on their own.

In the current system, if you just want to keep your idea free, you have to constantly keep an eye on all patent submissions to make sure noone is slipping in and patenting it. It's ridiculous. Really the only safe thing to do is to go ahead and patent it yourself and then donate it to the APPP. (the problem is if you let them get the patent, even if it's bogus it may be expensive to fight, and what's worse is it creates a situation where your idea has a nasty asterisk on it - oh, there's this patent that covers this idea, but we believe that patent to be invalid so we claim this idea is still public domain. That's a nasty situation that will scare off lots of users.)

Some previous posts :

cbloom rants 02-10-09 - How to fight patents
cbloom rants 12-07-10 - Patents
cbloom rants 04-27-11 - Things we need
cbloom rants 05-19-11 - Nathan Myhrvold


Some notes :

1. I am not interested in debating whether patents are good or not. I am interested in providing a mechanism for those of us who hate patents to pursue our software and algorithm development in a reasonable way.

2. If you are thinking about the patent or not argument, I encourage you to think not of some ideal theoretical argument, but rather the realities of the situation. I see this on both sides of the fence; those who are pro-patent because it "protects inventors" but choose to ignore the reality of the ridiculous patent system, and those on the anti-patent side who believe patents are evil and they won't touch them, even though that may be the best way to keep free ideas free.

3. I believe part of the problem with the anti-patent movement is that we are all too fixated on details of our idealism. Everybody has slightly different ideas of how it should be, so the movement fractures and can't agree on a unified thrust. We need to compromise. We need to coordinate. We need to just settle on something that is a reasonable solution; perhaps not the ideal that you would want, but some change is better than no change. (of course the other part of the problem is we are mostly selfish and lazy)

4. Basically I think that something like the "defensive patent license" is a good idea as a way to make sure your own inventions stay free. It's the safest way (as opposed to not patenting), and in the long run it's the least work and maintenance. Instead of constantly fighting and keeping aware of attempts to patent your idea, you just patent it yourself, do the work up front and then know it's safe long term. But it doesn't go far enough. Once you have that patent you can use it as a wedge to open up more ideas that should be free. That patent is leverage, against all the other evil. That's where the APPP comes in. Just making your one idea free is not enough, because on the other side there is massive machinery that's constantly trying to patent every trivial idea they can think of.

5. What we need is for the APPP to get enough money so that it can be stuffing a deluge of trivial patents down the patent office's throat, to head off all the crap coming from "Intellectual Ventures" and its many brothers. We need to be getting at least as many patents as them and making them all free under the APPP.


Some links :

en.swpat.org - The Software Patents Wiki
Patent Absurdity � How software patents broke the system
Home defensivepatentlicense
FOSS Patents U.S. patent reform movement lacks strategic leadership, fails to leverage the Internet
PUBPAT Home

5/21/2015

05-21-15 - Umm

I sent a lawyer an email yesterday.

Today they sent me back an email saying :

"I need your email address so I can send you the documents you need to sign"

Umm... you are not inspiring great confidence in your abilities.

Also, pursuant to my last post about spam - pretty much all my correspondence with lawyers over the past few months, Google decides to put in the spam folder. I keep thinking "WTF why didn't this lawyer get back to me - oh crap, go check the spam". Now, I'm totally down with the comic social commentary that Google is making ("ha ha, all email from lawyers is spam, amirite? lol"). But WTF your algorithms are insanely broken. I mean, fucking seriously you suck so bad.

05-21-15 - Software Patents are Fucking Awesome

Awesome. Spotted on encode.ru. It was inevitable I suppose :

"System and method for compressing data using asymmetric numeral systems with probability distributions"

By these tards :

Storleap

Someone in the UK go over and punch them in the balls.

For those not aware of the background, ANS is probably the biggest invention in data compression in the last 20 years. Its inventor (Jarek Duda) has explicitly tried to publish it openly and make it patent-free, because he's awesome.

In the next 10 years I'm sure we will get patents for "using ANS with string-matching data compression", "using ANS with block mocomp data compression", "using ANS as a replacement for Huffman coding", "deferred summation with ANS", etc. etc. Lots of brilliant inventions like that. Really stimulating for innovation.

(as has happened over and over in data compression, and software in general in the past; hey let's take two obvious previously existing things; LZ string matching + Huffman = patent. LZ + hash table = patent. JPEG + arithmetic = patent. Mocomp + Huffman = patent. etc. etc.)

(often glossed over in the famous Stac-Microsoft suit story is the question of WHAT THE FUCK the LZS patent was supposed to be for? What was the invention there exactly? Doing LZ with a certain fixed bit encoding? Umm, yeah, like everyone does?)

Our patent system is working great. It obviously protects and motivates the real inventors, and doesn't just act as a way for the richest companies to lock in semi-monopolies of technologies they didn't even invent. Nope.

Recently at RAD we've made a few innovations related to ANS that are mostly in the vein of small improvements or clever usages, things that I wouldn't even imagine to patent, but of course that's wrong.

I've also noticed in general a lot of these vaporware companies in the UK. We saw one at RAD a few years ago that claimed to use "multi-dimensional curve interpolation for data compression" or some crackpot nonsense. There was another one that used alternate numeral systems (not ANS, but p-adic or some such) for god knows what. A few years ago there were lots of fractal-image-compression and other fractal-nonsense startups that did ... nothing. (this was before the VC "pivot" ; hey we have a bunch of fractal image patents, let's make a text messaging app)

They generally get some PhD's from Cambridge or whatever to be founders. They bring a bunch of "industry luminaries" on the board. They patent a bunch of nonsense. And then ...

... profit? There's a step missing where they actually ever make anything that works. But I guess sometimes they get bought for their vapor, or they manage to get a bullshit patent that's overly-general on something they didn't actually invent, and then they're golden.

I wonder if these places are getting college-backed "incubation" incentives? Pretty fucking gross up and down and all around. Everyone involved is scum.

(In general, universities getting patents and incubating startups is fucking disgusting. You take public funding and student's tuition, and you use that to lock up ideas for private profit. Fucking rotten, you scum.)


On a more practical note, if anyone knows the process for objecting to a patent in the UK, chime in.

Also, shame on us all for not doing more to fight the system. All our work should be going in the Anti-Patent Patent Pool.


Under the current first-to-file systems, apparently we are supposed to sit around all day reading every patent that's been filed to see if it covers something that we have already invented or is "well known" / public domain / prior art.

It's really a system that's designed around patents. It assumes that all inventions are patented. It doesn't really work well with a prior invention that's just not patented.

Which makes something like the APPP even more important. We need a way to patent all the free ideas just as a way to keep them legally free and not have to worry about all the fuckers who will rush in and try to patent our inventions as soon as we stop looking.

05-21-15 - LZ-Sub

LZ-Sub decoder :

delta_literal = get_sub_literal();

if ( delta_literal != 0 )
{
    *ptr++ = delta_literal + ptr[-lastOffset];
}
else // delta_literal == 0
{
    if ( ! get_offset_flag() )
    {
        *ptr++ = ptr[-lastOffset];
    }
    else if ( get_lastoffset_flag() )
    {
        int lo_index = get_lo_index();
        lastOffset = last_offsets[lo_index];
        // do MTF or whatever using lo_index
        
        *ptr++ = ptr[-lastOffset];
        // extra 0 delta literal implied :
        *ptr++ = ptr[-lastOffset];
    }
    else
    {
        lastOffset = get_offset();
        // put offset in last_offsets set
        
        *ptr++ = ptr[-lastOffset];
        *ptr++ = ptr[-lastOffset];
        // some automatic zero deltas follow for larger offsets
        if ( lastOffset > 128 )
        {
            *ptr++ = ptr[-lastOffset];
            if ( lastOffset > 16384 )
            {
                *ptr++ = ptr[-lastOffset];
            }
        }   
    }

    // each single zero is followed by a zero runlen
    //  (this is just a speed optimization)
    int zrl = get_zero_runlen();
    while(zrl--)
        *ptr++ = ptr[-lastOffset];
}

This is basically LZMA. (sub literals instead of bitwise-LAM, but structurally the same) (also I've reversed the implied structure here; zero delta -> offset flag here, whereas in normal LZ you do offset flag -> zero delta)

This is what a modern LZ is. You're sending deltas from the prediction. The prediction is the source of the match. In the "match" range, the delta is zero.

The thing about modern LZ's (LZMA, etc.) is that the literals-after-match (LAMs) are very important too. These are the deltas after the zero run range. You can't really think of the match as just applying to the zero-run range. It applies until you send the next offset.

You can also of course do a simpler & more general variant :

Generalized-LZ-Sub decoder :


if ( get_offset_flag() )
{
    // also lastoffset LRU and so on not shown here
    lastOffset = get_offset();
}

delta_literal = get_sub_literal();

*ptr++ = delta_literal + ptr[-lastOffset];

Generalized-LZ-Sub just sends deltas from prediction. Matches are a bunch of zeros. I've removed the acceleration of sending zero's as a runlen for simplicity, but you could still do that.

The main difference is that you can send offsets anywhere, not just at certain spots where there are a bunch of zero deltas generated (aka "min match lengths").

This could be useful. For example when coding images/video/sound , there is often not an exact match that gives you a bunch of exact zero deltas, but there might be a very good match that gives you a bunch of small deltas. It would be worth sending that offset to get the small deltas, but normal LZ can't do it.

Generalized-LZ-Sub could also give you literal-before-match. That is, instead of sending the offset at the run of zero deltas, you could send it slightly *before* that, where the deltas are not zero but are small.

(when compressing text, "sub" should be replaced with some kind of smart lexicographical distance; for each character precompute a list of its most likely substitution character in order of probability.)

LZ is a bit like a BWT, but instead of the contexts being inferred by the prefix sort, you transmit them explicitly by sending offsets to prior strings. Weird.

5/17/2015

05-17-15 - The Google Spam Filter is Intentionally Bad

I'm convinced at this point that Google intentionally filters spam wrong.

Not in a nefarious way, like haha we're going to send your good mails to "spam" and let the crap through! Take that!

But actually in a sort of more deeply evil way. A capitalist way. They specifically *want* to allow through mass-mailings from corporations that are they do not consider spam.

In my opinion, those are all spam. There is not a single corporate mass-mailing that I ever intentionally subscribed to.

Basically there's a very very easy spam filtering problem :

Easy 1. Reject all mass-mailings. Reject all mailings about sales, products, offers. Reject all mailings about porn or penises or nigerian princes.

Easy 2. Allow through all mail that's hand-written by a human to me. Particularly to one that I have written to in the past.

That would be fine with me. That would get 99.99% of it right for me.

They don't want to solve that problem. Instead they try to solve the much-harder problem of allowing through viagra offers that are for some reason not spam. For the email user who *wants* to get mass-mail offers of 50% off your next order.

I just don't understand how "yeah, let's go out to dinner" from my friend, who is responding with quote to a fucking mail that I sent goes in the in the Spam box, but "get direct email mass-marketing secrets to double your business!" goes in my inbox. How can it be so bad, I just really don't understand it. Fucking the most basic keyword include/exclude type of filter could do better.

I should have just written my own, because it's the kind of problem that you want to be constantly tweaking on. Every time a mail is misclassified, I want to run it through my system and see why that happened and then try to fix it.

It would be SOOO fucking easy for them. Being in a position as a central mail processor, they can tell which mails are unique and which are mass-sent, and just FUCKING BLOCK ALL THE MASS-SENT MAIL. God dammit. You are fucking me up and I know you're doing it intentionally. I hate you.


I mean, fuck. It's ridiculous.

They are responding to a mail I sent. The mail I sent is fucking quoted right there. I sent the fucking mail from gmail so you can confirm it's for real. I sent to their address with gmail. AND YOU PUT THEIR REPLY IN SPAM. WTF WTF WTF

But this is not spam :

Report: creative teamwork is easier with cloud-based apps

Businesses Increase Screening of Facebook, Twitter Before Hiring

Trying to solve the Prospecting Paradox?

I'd like to add you to my professional network on LinkedIn


Maybe I'm being a bit overly simplistic and harsh. Maybe there are mass-mailings that look spammish, but you actually want to get? Like, your credit card bill is due?

I'm not sure. I'm not sure that I ever need to get any of that. I don't need those "shipping confirmation" emails from Amazon. If they just all got filed to the "mass mail" folder, I could go look for them when I need them.


I want to make my own private internet. And then not allow anyone else to use it because you'd all just fuck it up.

5/16/2015

05-16-15 - Threading Primitive - monitored semaphore

A monitored semaphore allows two-sided waiting :

The consumer side decs the semaphore, and waits on the count being positive.

The producer side incs the semaphore, and can wait on the count being a certain negative value (some number of waiting consumers).

Monitored semaphore solves a specific common problem :

In a worker thread system, you may need to wait on all work being done. This is hard to do in a race-free way using normal primitives. Typical ad-hoc solutions may miss work that is pushed during the wait-for-all-done phase. This is hard to enforce, ugly, and makes bugs. (it's particularly bad when work items may spawn new work items).

I've heard of many ad-hoc hacky ways of dealing with this. There's no need to muck around with that, because there's a simple and efficient way to just get it right.

The monitored semaphore also provides a race-free way to snapshot the state of the work system - how many work items are available, how many workers are sleeping. This allows you to wait on the joint condition - all workers are sleeping AND there is no work available. Any check of those two using separate primitives is likely a race.

The implementation is similar to the fastsemaphore I posted before.

"fastsemaphore" wraps some kind of underlying semaphore which actually provides the OS waits. The underlying semaphore is only used when the count goes negative. When count is positive, pops are done with simple atomic ops to avoid OS calls. eg. we only do an OS call when there's a possibility it will put our thread to sleep or wake a thread.

"fastsemaphore_monitored" uses the same kind atomic variable wrapping an underlying semaphore, but adds an eventcount for the waiter side to be triggered when enough workers are waiting. (see who ordered event count? )

Usage is like this :


To push a work item :

push item on your queue (MPMC FIFO or whatever)
fastsemaphore_monitored.post();

To pop a work item :

fastsemaphore_monitored.wait();
pop item from queue

To flush all work :

fastsemaphore_monitored.wait_for_waiters(num_worker_threads);

NOTE : in my implementation, post & wait can be called from any thread, but wait_for_waiters must be called from only one thread. This assumes you either have a "main thread" that does that wait, or that you wrap that call with a mutex.

template <typename t_base_sem>
class fastsemaphore_monitored
{
    atomic<S32> m_state;
    eventcount m_waiters_ec;
    t_base_sem m_sem;

    enum { FSM_COUNT_SHIFT = 8 };
    enum { FSM_COUNT_MASK = 0xFFFFFF00UL };
    enum { FSM_COUNT_MAX = ((U32)FSM_COUNT_MASK>>FSM_COUNT_SHIFT) };
    enum { FSM_WAIT_FOR_SHIFT = 0 };
    enum { FSM_WAIT_FOR_MASK = 0xFF };
    enum { FSM_WAIT_FOR_MAX = (FSM_WAIT_FOR_MASK>>FSM_WAIT_FOR_SHIFT) };

public:
    fastsemaphore_monitored(S32 count = 0)
    :   m_state(count<<FSM_COUNT_SHIFT)
    {
        RL_ASSERT(count >= 0);
    }

    ~fastsemaphore_monitored()
    {
    }

public:

    inline S32 state_fetch_add_count(S32 inc)
    {
        S32 prev = m_state($).fetch_add(inc<<FSM_COUNT_SHIFT,mo_acq_rel);
        S32 count = ( prev >> FSM_COUNT_SHIFT );
        RR_ASSERT( count < 0 || ( (U32)count < (FSM_COUNT_MAX-2) ) );
        return count;
    }

    // warning : wait_for_waiters can only be called from one thread!
    void wait_for_waiters(S32 wait_for_count)
    {
        RL_ASSERT( wait_for_count > 0 && wait_for_count < FSM_WAIT_FOR_MAX );
        
        S32 state = m_state($).load(mo_acquire);
        
        for(;;)
        {
            S32 cur_count = state >> FSM_COUNT_SHIFT;

            if ( (-cur_count) == wait_for_count )
                break; // got it
        
            S32 new_state = (cur_count<<FSM_COUNT_SHIFT) | (wait_for_count << FSM_WAIT_FOR_SHIFT);
            
            S32 ec = m_waiters_ec.prepare_wait();
            
            // double check and signal what we're waiting for :
            if ( ! m_state.compare_exchange_strong(state,new_state,mo_acq_rel) )
                continue; // retry ; state was reloaded
            
            m_waiters_ec.wait(ec);
            
            state = m_state($).load(mo_acquire);
        }
        
        // now turn off the mask :
        
        for(;;)
        {
            S32 new_state = state & FSM_COUNT_MASK;
            if ( state == new_state ) return;
        
            if ( m_state.compare_exchange_strong(state,new_state,mo_acq_rel) )
                return; 
                
            // retry ; state was reloaded
        }
    }

    void post()
    {
        if ( state_fetch_add_count(1) < 0 )
        {
            m_sem.post();
        }
    }

    void wait_no_spin()
    {
        S32 prev_state = m_state($).fetch_add((-1)<<FSM_COUNT_SHIFT,mo_acq_rel);
        S32 prev_count = prev_state>>FSM_COUNT_SHIFT;
        if ( prev_count <= 0 )
        {
            S32 waiters = (-prev_count) + 1;
            RR_ASSERT( waiters >= 1 );
            S32 wait_for = prev_state & FSM_WAIT_FOR_MASK;
            if ( waiters == wait_for )
            {
                RR_ASSERT( wait_for >= 1 );
                m_waiters_ec.notify_all();
            }
            
            m_sem.wait();
        }
    }
    
    void post(S32 n)
    {
        RR_ASSERT( n > 0 );
        for(S32 i=0;i<n;i++)
            post();
    }
       
    bool try_wait()
    {
        // see if we can dec count before preparing the wait
        S32 state = m_state($).load(mo_acquire);
        for(;;)
        {
            if ( state < (1<<FSM_COUNT_SHIFT) ) return false;
            // dec count and leave the rest the same :
            //S32 new_state = ((c-1)<<FSM_COUNT_SHIFT) | (state & FSM_WAIT_FOR_MASK);
            S32 new_state = state - (1<<FSM_COUNT_SHIFT);
            RR_ASSERT( (new_state>>FSM_COUNT_SHIFT) >= 0 );
            if ( m_state($).compare_exchange_strong(state,new_state,mo_acq_rel) )
                return true;
            // state was reloaded
            // loop
            // backoff here optional
        }
    }
     
       
    S32 try_wait_all()
    {
        // see if we can dec count before preparing the wait
        S32 state = m_state($).load(mo_acquire);
        for(;;)
        {
            S32 count = state >> FSM_COUNT_SHIFT;
            if ( count <= 0 ) return 0;
            // swap count to zero and leave the rest the same :
            S32 new_state = state & FSM_WAIT_FOR_MASK;
            if ( m_state($).compare_exchange_strong(state,new_state,mo_acq_rel) )
                return count;
            // state was reloaded
            // loop
            // backoff here optional
        }
    }
           
    void wait()
    {
        int spin_count = rrGetSpinCount();
        while(spin_count--)
        {
            if ( try_wait() ) 
                return;
        }
        
        wait_no_spin();
    }

};

old rants