William Edwards, Coder

Aug 21

The average of all photos is … Emergent Orange!

A long time ago, when the internet was young, Jim Bumgardner (a.k.a. KrazyDad) discovered that, if you average the pixels in enough images you find on the internet, the resultant picture is always much the same colour… and that colour is Orange!  Emergent orange, as its now called.

Of course when I heard about this yesterday I had to put it to the test!

My script is here, and here’s some results:

image

image

Now that’s a bit more orange!  A bit, anyway :)

From this image, the average of all Ludum Dare screen shots, we can infer that the average Ludum Dare contestant uses Windows 7!

(Don’t worry, we still respect them/you!)

Try it yourself!  If you have time before you enter this weekend’s Ludum Dare contest, that is…

Aug 14

Python annotations and type checking

Someone got at Guido!  He comes back from some talk with the idea of blessing mypy type checking annotations in Python!

Now those following along may recall my own type-checking library obiwan that also uses annotations.  Its on pypi so you can use it already today!  What are you waiting for?

I’m keen on static checkers, although I prefer static type inference and have ideas along those lines such as Static Single Type Assignment (SSTA).

Fundamentally, though, restricted Python isn’t Python.  We want a dynamic language, and this hits on the limits of static checking.  Obiwan’s runtime checking is a compliment to a static checker and its ability to check external data e.g. tainted JSON input from an API is a major plus too.

What we want is for static checkers, auto-completion in IDEs and runtime checkers like obiwan to all use the same syntax and conventions.

But I don’t like the mypy syntax; its baroque and it even overloads comments (yuck!).  I much prefer obiwan’s style and would like annotations to be supported in-block as well as on the boundaries i.e. declarations.

Consider this mypy example:

  from typing import List, Dict

  def word_count(input: List[str]) -> Dict[str, int]:
      ...

Obiwan would instead say:

  def word_count(input: [str]) -> {str, int}:
      ...

Which I consider far clearer and pythonic.

My reply to the list is here.

FWIW, I don’t imagine anything will come from this.  There is unlikely to be consensus, and although I’d like a solid PEP that puts annotations into blocks it would so risk slowing down programs that I think it unlikely to be adopted.  So we may be left with a clear message that people are to use mypy or whatever, but no actual solid move forwards.

Jun 27

Devoxx4Kids Inspiration!!

Roy van Rijn makes the really really great Devoxx and Devoxx4Kids videos.  Really really upbeat and inspiring!

Next time, send your kids too!

Jun 26

Lots of programmers make nests.

When you’re working alone on something, its very tempting to make a whole library of code to support yourself programming something.. to put off solving real problems and so on ;)

Here’s an example I read today of over-engining sprite sheets (been there, done that myself!)  That game looks absolutely stunningly awesome, of course!

But here’s a game engine I’ll call out as having potential: image

IceEntity by NicoM1; check it out!  A nest you might be comfortable sitting in yourself?

Seems from these links that I’m all nostalgic for a minimalist artsy XKCD-inspired platform puzzle adventure game… maybe an idea for LD30 August 22-25th 2014?

Jun 04

An Impartial Press

Well we all hear reports (on the BBC) about how the separatists in Ukraine are fed a distorted pro-Russian media diet… but generally the BBC tries to be objective.  However, this choice of “stock photo” jumped out at me:

Yes we know its just vapour, but I think we’re in the minority… the overall spin this picture gives the story for the general reader is rather different?

See also: Salami Tactics

Jun 03

Compressing Scrabble Dictionaries

The newest rec.math Al Zimmermann Programming Contest is a solitaire Scrabble puzzle called ‘Alphabet City’.

Wait, what do you mean you’re not entering that programming contest??  Its a really really nice clean problem and it’s addictive as 2048!

To lookup scrabble words quickly, people use a GADDAG.  The GADDAG is a tree data-structure.  Each node represents a character, and has pointers to all the child nodes that continue all possible words.

The key thing is that every rotation of each word is stored in the GADDAG.  To have a GADDAG of the words ALL, BALL and CALL, for each word we add all rotations.  This is LLA@, LA@L, A@LL for ALL, and LLAB@, LAB@L, AB@LL, B@ALL for BALL and so on.  (@ is my choice of ‘start-of-word’ marker; its a good choice because in ASCII its the character before A.  When you have arrays of child nodes [A..Z], this can easily be extended to have a slot pointing at the start-of-word node [@..Z].  And it looks nice enough in the graphs too.)

To speed up Scrabble searches, the prefix of each word is stored reversed which facilitates ‘hook’ searches.  To search for all words with one letter then A then another empty then L (?A?L), you start with the first known letter - A - and look for all those with one character in front (a child with a @ child), terminated by an L.

A naive sparse implementation of a GADDAG takes absolutely loads of memory: over a GB is not startling.  Almost all those nodes will be duplicates of other parts of the tree and almost all possible children will exist.  So we really want to get the memory usage down if we want to avoid our program waiting on main memory all the time: bottom cache is typically in the tens of nanoseconds away, whereas main memory is around a hundred nanoseconds away.  Getting the GADDAG small enough to fit into cache, even the bottom level cache (as in slowest and biggest), can mean an order of magnitude speedup!  So its well worth compressing your GADDAG.

The original paper describes compression and gives an example of partial compression.  I will talk these through with graphs:

image

Here’s the uncompressed tree.  Sorry it takes so much screen space .. it has a lot of nodes in it, which is a point I want to make :)

3 words, 11 rotations and 27 nodes.

Lets start compressing it:

The first thing to note is that, unlike the ALL, BALL, CALL example on NullWords blog, I am storing the terminating characters (the blue nodes) in a separate set off the node.  This is how the prefix LLA is followed by a single node with both B and C in it.

Read More

May 22

pycon 2014 Sweden: the bad bits

I’m just back from pycon 2014 Sweden.

And all the tweets and commentary will be positive, as they always are. If you want to say something negative its best to not say it at all?

Hmm, not me.  Someone has to give this feedback; we can’t sweep it under the carpet.

Read More

May 02

Making Image Mosaics

imageAs followers of the blog know (thx for the mails!), I’ve been making mosaics of Ludum Dare contest entries.  It started basic, and its got better.  Each of the ‘pixels’ in the Mona Lisa is actually a screenshot of a game!  Here’s how:

We have an target image, for example the Mona Lisa, and we have some large number of small input images which we want to arrange so that, when squinted at from a distance, it approximates the target image.

Read More

Apr 30

[video]

Apr 29

[video]

Apr 07

I’m giving a Mill CPU talk in Växjö, Sweden 2014-04-25

There are perhaps a few seats available to interested members of the public if you’re in or near Växjö at the end of the month :)

Announcing a presentation at 13.00h, Friday, April 25 at: 

Linnéuniversitetet (Linnaeus University) 
Vejdes Plats 6, Linnéuniversitetet 
Building B, Room 3021 
352 52 Växjö Sweden 

The new Mill CPU architecture 

A presentation by Will Edwards (Mill Computing) 

The new Mill CPU architecture brings DSP-like efficiency and performance to general purpose computing. Offering a 10x power/performance gain over conventional out-of-order superscalar architectures, the Mill family of CPUs scales from phones to supercomputers. 

The Mill is an extremely wide-issue VLIW design, able to issue up to 33 operations per cycle. The Mill is inherently a vector machine and can vectorize and pipeline almost all loops. The Mill is a belt machine (as distinct from a stack or register machine) and has a fine grained 
security model that facilitates microkernels without performance penalties. 

This talk will give a high-level introduction to the Mill programming model, with an opportunity for the audience to ask more detailed 
questions in areas of interest.

Mar 25

Mill Security Talk now online

The Mill CPU’s Security talk is now online:

Mar 12

Russia’s Salami Tactics in Crimea

Well worth a re-watch all the way through :)

Snowden was right about encryption - so what can programmers do about it?

Right now, the NSA - and China, France, Britain, Russia and doubtless all others - hoover up and store every byte.  And they go fishing.

We need to move back to a world where they have to break into your computer - something they’ll only do if they target you personally, not generally - to spy on you.

And we can do that by encrypting everything.  Everything.

Encrypted data cannot be compressed and is therefore much harder to store.  And if they do invest in storing it anyway, forward privacy can hopefully keep much of it private in the future too.

How to encrypt everything?  Normal websites won’t adopt TLS unless forced to; browsers should show very nasty banners on HTTP connections, perhaps?  Big brands won’t want that floating over them.

Browsers should only use encrypted comms to do DNS lookup too.  So much is in the hands of the browsers.

CDNs should rewrite everything to be TLS.

Also, the *ahem* porn industry ought to adopt TLS outright.  They don’t want their customers scared away, right?

BitTorrent has traditionally used encryption to prevent ISPs from traffic shaping.  But BitTorrent encryption is obfuscation at best, and the ciphers picked were picked to minimise CPU.  These days, we have more CPU.  BitTorrent needs strong encryption.

How else can big swathes of the internet be encrypted quickly?

Mar 05

Table-based Template Translation in C++

This is a great technique I devised for simplifying the optimization of translation functions.  Its broadly applicable to performance-critical code beyond software graphics.  I’d love to know if there’s a proper name for this pattern, and if anyone else has ever used it and for what?

At UIQ (a Symbian smartphone UI maker) we needed really fast bitmap blitting performance in the Symbian software graphics library and we just weren’t getting it.

If two bitmaps are the same format, you can just memcpy (or blend or whatever) pixels from one to the other.

If two bitmaps are different formats, you have to translate the source bitmap’s pixels to the destination bitmap’s format.

However, the Symbian libraries themselves were shooting us in the foot.  There were some special-cased blits for same-format source and destinations, and a hand-coded special-case for 16-bit colour with a separate 8-bit alpha mask bitmap into a 24-bit destination with 32-bit alignment, and that was it.

All other conversions went via two virtual method calls… one to convert from source to a 32-bit RGBA universal internal format, and the other to convert to destination format.

Who would use virtual inheritance per-pixel for critical-path colour conversion on a constrained device?  I know their name ;)

Read More