Tom Baker

Maybe you’re thinking: who is this Tom Baker, and what is he doing here on the blog? Hopefully not, but here is an explanation.

Tom Baker is an English actor who has voiced characters in animated films and computer games, and has appeared in, among other things, Blackadder. But most important of all, he was the 4th Doctor in the British series “Doctor Who”, and in fact the Doctor who lasted longest, namely 7 seasons. He is also the Doctor who was my way into the series, which I devoured as a child. To me, he is probably the only Doctor.

Tom Baker’s incarnation of The Doctor was reportedly a contributing reason that Anita Sengupta ended up at NASA’s Jet Propulsion Laboratory (yes, I know: collect underpants -> profit, but the point is that science fiction is inspiring). It was also science fiction that sparked my own interest in computers and software. In my case, it was a combination of Doctor Who, Star Trek, 2001: A Space Odyssey and Alien that gave me a vague dream of sending intelligent software into space.

The source of inspiration is comparable, and Sengupta and I are apparently more or less the same age. Sengupta is, however, considerably more badass than I am, which is why I’m now looking forward immensely to hearing her keynote at GOTO; in Copenhagen. There she will talk about a task she solved for NASA: on August 5, 2012, Curiosity landed on the surface of Mars. Sengupta’s work is the reason Curiosity landed on Mars and was not smashed against the surface, and I’m looking forward to hearing about it. She has done what I still only dream of. Until GOTO;, I will have to make do with this interview from the BBC.

Conferences this autumn

It’s no secret that I’m fond of conferences. Over the summer I have looked at the autumn’s conferences, and I have also looked at the budget, so I have picked two that I plan to attend.

GOTO; – October 5-6
This is a fixed, recurring conference on my schedule. It is important to me for several reasons. Partly there is the professional input, but meeting the other attendees and chatting with the speakers also matters a lot.
This year the conference is held in Copenhagen, and I’m very curious to see how it turns out. Trifork has previously tried holding GOTO in Copenhagen, but I understand it was with limited success. I attended, and I thought it was very different from the conference in Aarhus. I liked Aarhus best. That may also be why I’m a bit skeptical about the conference now moving to the capital. It also annoys me a little, because I’m very fond of the city of Aarhus, and it was nice to get a couple of days away, with everyday life kept at a proper distance. I’m afraid that won’t happen this time. I’m even considering booking a hotel, even though I could cycle home, just to get the feeling that everyday life and work are far away.

In my day job I work with Big Data, Predictive Analysis and Machine Learning, so naturally that is at the back of my mind when I look at this year’s program. There is no track explicitly named “Big Data”, but there is something similar: “The State of Data”. The description mentions “machine learning”, “data analytics” and “scalability techniques”, which I, in my naive, hopeful world, read as being exactly my area. Fortunately, all of Tuesday is set aside for this track.

My inner child can of course also be satisfied, I hope, by the track “Robotics and Drones” on Monday afternoon. Here I’m looking forward to hearing “The New Frontier of Robotics”, which I hope is about the intersection of robots and artificial intelligence. At least, that is the background of the speaker, Søren Tranberg Hansen.
…and then of course the keynote “Curiosity’s Entry Descent and Landing on Mars”, whose title presumably speaks for itself.

Normally I also look through the speaker list and pick some talks on that basis, but this year I don’t really think anyone jumps out. I will of course hear Dave Thomas, because he is Dave Thomas. It doesn’t matter much what he talks about; he is always worth listening to. This year he happens to be speaking on The State of Data, so I’m doubly lucky.
Another name that stands out is Janne Jul Jensen, who will talk about UX, and fortunately as a keynote, so it won’t collide with my other wishes.

Spark Summit, Amsterdam – October 27-29
Spark Summit is a new conference for me that, as the name indicates, is centered on the Apache Spark project. Since I haven’t been to this conference before, I don’t know what to expect. There are, however, some interesting items in the schedule that have caught my attention and are the reason I’m going:

“Building a REST Job Server for interactive Spark as a service”
At the moment we are running a lot of batch jobs, and I’m very interested to see if I could transform some of them into more interactive services. My hope is to get some pointers from this talk.

“A Scalable Implementation of Deep Learning on Spark”
I also work a lot with Machine Learning algorithms in my daily work, though not with any deep learning algorithms, but I am of course interested in learning about the possibilities in Spark.

“Using Natural Language Processing on Non-Textual Data with MLLib”
Last summer I coded a gizmo that could classify texts. It was quite good at it and could classify on 3 levels:

  • What you might call “the bigger picture”: whether it was a national, European or global topic.
  • The overall subject of the text, e.g. culture, economy, politics and a few more.
  • The content of the text, e.g. the football World Cup, Tour de France or a concert review, to name a few.

The algorithm was a Bayesian learning algorithm that I had hand-tweaked with a bit of ML-fu from my toolbox. It worked quite well. I have since used the same algorithm to classify data other than texts, and have had a fair amount of success with it. So I’m quite excited to hear what others have tried and what results they got.
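To make “Bayesian learning algorithm” a bit more concrete, here is a minimal sketch of a plain multinomial naive Bayes classifier with add-one smoothing. It is not the hand-tweaked algorithm described above, whose details are not given; the class name `NaiveBayes`, the labels and the toy sentences are all made up for illustration. The input is just a list of tokens, so the same idea could be tried on non-textual features as long as they can be expressed as discrete tokens.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing. Illustrative only."""

    def __init__(self):
        self.label_counts = Counter()             # number of training examples per label
        self.token_counts = defaultdict(Counter)  # token frequencies per label
        self.vocabulary = set()                   # every token seen during training

    def train(self, tokens, label):
        self.label_counts[label] += 1
        self.token_counts[label].update(tokens)
        self.vocabulary.update(tokens)

    def classify(self, tokens):
        total_examples = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, example_count in self.label_counts.items():
            counts = self.token_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # log P(label) + sum of log P(token | label); the +1 keeps unseen tokens from zeroing out
            score = math.log(example_count / total_examples)
            for token in tokens:
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data for the "overall subject" level.
model = NaiveBayes()
model.train("the striker scored in the world cup final".split(), "sport")
model.train("the riders climbed the mountain stage".split(), "sport")
model.train("the central bank raised interest rates".split(), "economy")
model.train("inflation and interest rates hit the markets".split(), "economy")

prediction = model.classify("interest rates and inflation".split())
print(prediction)
```

For the three levels listed above, one simple option would be to train one such model per level; that is an assumption about how it could be done, not a description of the original gizmo.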

“Combining the Strengths of MLlib, scikit-learn, and R”
I’m a heavy user of both MLlib and scikit-learn, though not R, so if they can somehow be combined in ways I haven’t thought of, I’m interested in hearing about it. If nothing else, I find it enormously motivating to hear that others have had thoughts similar to my own.

Legacy systems revisited

This post is actually not inspired by this year’s GOTO Conference, but by the one 2 years ago in Copenhagen. I can’t remember if it was in a talk or during my discussion with Dave Thomas in one of the breaks that he started talking about “Legacy systems”. I was kind of starstruck, so I didn’t question anything he said; well, not until I got home anyway. When I was looking through my old posts, I found this entry, where I asked how others would define the term “Legacy System”. Nobody really nailed it, and I’ve been meaning to follow up on that post for 2 years now but never got around to it, so here goes. This is my definition.

Technology needs to be dead or dying
If the system is built on a brand new technology, it is hard to argue that it has any kind of legacy to it. It has to be something that has been, something that was state of the art but no longer is. The technology was either adequate or simply the best available at the time the system was built, but the world/company moved on and the chosen technology had a hard time keeping up with new demands.

The system has to have substantial/significant value to the company
If the system does not have any value to the company, it can be turned off. The revenue stays the same and nobody will miss it. Therefore, the cost of keeping it running needs to be less than the loss from shutting it down.

The rate of new features added is going towards zero
If features are still added to the system at a high rate, I would argue that it is a system still under development. This, to me, indicates that its technology can keep up with demands, and thus it is not dying. The system needs to be in a state where it either sufficiently solves the business needs, or the pain and cost of adding extra features are higher than the revenue from adding them.

The time spent on new features is less than time spent on bugfixing and maintenance
Even though the rate of adding new features is going towards zero, I will still argue that if more time is spent adding features than maintaining the system, it is still under development. Every time a feature is added, the value of the system increases, and if it increases by more than the cost of running it, the company is investing; thus the system is still under development and therefore not a legacy system. It has to be in a state where the company invests less in adding new features than it spends on running the system.

* * *

So this basically sums up to an old system that has substantial/significant value to the company: a system that does not grow in value but needs to be maintained so that it does not lose value.
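The four criteria can also be written down as a checklist. The sketch below is only one way of reading the definition; the field names and the `SystemProfile` type are hypothetical, and in practice each input is a judgment call rather than a measured number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemProfile:
    """Hypothetical inputs, one group per criterion in the definition above."""
    technology_is_dying: bool         # criterion 1
    yearly_running_cost: float        # criterion 2
    yearly_loss_if_shut_down: float   # criterion 2
    feature_rate_is_falling: bool     # criterion 3
    hours_on_new_features: float      # criterion 4
    hours_on_maintenance: float       # criterion 4


def is_legacy(system: SystemProfile) -> bool:
    # Criterion 2: keeping it running must cost less than turning it off would lose.
    has_significant_value = system.yearly_running_cost < system.yearly_loss_if_shut_down
    # Criterion 4: less time goes into new features than into bug fixing and maintenance.
    mostly_maintained = system.hours_on_new_features < system.hours_on_maintenance
    return (system.technology_is_dying
            and has_significant_value
            and system.feature_rate_is_falling
            and mostly_maintained)


# Made-up example: an old, valuable billing system that is mostly being kept alive.
old_billing = SystemProfile(True, 200_000, 5_000_000, True, 100, 900)
print(is_legacy(old_billing))
```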

What is “simple”

So I was all high on simplicity yesterday, and of course Frank had to ask me the question that made me crash-land again: “So, what do you mean by ‘simple’?” I almost didn’t hear this morning’s talk from Brian Goetz about lambdas in Java, since my head was crunching on that question. What is “simple”? What does it all mean?

It struck me that I need to be able to measure complexity to answer that question. I need some way of comparing two things and saying that one is more complex than the other. Looking at code, at a system, at a framework, we have an intuitive understanding of simple. But it is not only intuitive, it is subjective. If something is easy to do or easy to understand, does that make it simple?

I’ve been doing karate for years, and it is pretty easy for me to roundhouse kick someone and kill them 3 times before they hit the ground. Does that make it easy to do, and thus simple? Not really. Karate is hard and complex; I just practiced a lot. So easy to do does not mean simple.

What about easy to understand? I understood most of Brian Goetz’s talk about lambdas in Java without a big effort. Of course I had to think really hard about some of the stuff, but overall it was pretty straightforward. But I’ve been programming for 25 years now in anything from assembler to Prolog, and I’ve been a full-time Clojure programmer for almost 2 years now. I should know about lambdas and low-level stuff by now. That gives me an edge. It is only easy because of all the other stuff I know about the subject.

That still leaves the question unanswered. Should complexity then be measured by LOC? That’s at least an objective measurement, and maybe a better one. If I can express the same algorithm in half the lines of code, it is simpler, right?

Here’s the Adler checksum algorithm in C:

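#include <stdint.h>

const int MOD_ADLER = 65521; /* largest prime below 2^16 */

/* a: 1 plus the sum of all bytes; b: the sum of every intermediate a (both mod 65521) */
uint32_t adler32(unsigned char *data, int32_t len)
{
    uint32_t a = 1, b = 0;
    int32_t index;
    for (index = 0; index < len; ++index)
    {
        a = (a + data[index]) % MOD_ADLER;
        b = (b + a) % MOD_ADLER;
    }
    return (b << 16) | a; /* b in the high 16 bits, a in the low 16 bits */
}

12 lines of code (not counting the include and comments)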

and here it is in Clojure:

(def base 65521)
;; One Adler-32 step: fold byte x into the running sums [a b], both kept modulo base.
(defn cumulate [[a b] x]
    (let [a-prim (rem (+ a (bit-and x 255)) base)]
         [a-prim (rem (+ b a-prim) base)]))
;; Dispatch on class: a String is turned into a lazy seq of its bytes first.
(derive clojure.lang.LazySeq ::collection)
(defmulti checksum class)
(defmethod checksum String [data]
    (checksum  (lazy-seq (.getBytes data))))
(defmethod checksum ::collection [data]
    (let [[a b] (reduce cumulate [1 0] data)]
          (bit-or (bit-shift-left b 16) a)))

11 lines of code

The latter should then be less complex, right?
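Counting lines only makes sense if the versions being compared really compute the same thing. As a sanity check, here is a sketch of the same algorithm in Python (chosen purely for illustration, it is not one of the two versions being compared), cross-checked against the `zlib.adler32` function from the Python standard library. Note that both running sums are reduced modulo 65521 on every step; leaving that out for `b` goes unnoticed on short inputs but gives wrong results on long ones.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16


def adler32(data: bytes) -> int:
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER
        b = (b + a) % MOD_ADLER   # b must be reduced too, or it outgrows its 16 bits
    return (b << 16) | a


sample = b"Wikipedia"
long_input = bytes(range(256)) * 1000   # long enough for b to wrap many times

print(hex(adler32(sample)))                                 # 0x11e60398
print(adler32(sample) == zlib.adler32(sample))              # True
print(adler32(long_input) == zlib.adler32(long_input))      # True
```

That is 7 lines for the function itself, which says more about what each language gives you for free than about which version is simplest.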

I’ll leave this one open and hope for a discussion in the comments.

Keep it simple

To sum up the first part of day 1 of GOTO Conference, I must say there’s a pattern in otherwise unrelated talks: simplicity.

“Programming is hard”, says Donald Knuth. It seems the speakers all agree that we should stop making it even harder. We tend to, says Russell Miles: “We do it out of boredom”. He argues that if the task at hand is boring, doesn’t inspire us or is simply too simple, we tend to try to make it interesting by making it more complicated. “Uh, I always wanted to look into Python. Maybe I can solve the task using that”. And, oh, lo and behold: trouble down the road. But Russell also has a suggestion on how to avoid the complexity we tend to introduce ourselves. He calls it O.R.E., or Organize, Reduce and Encapsulate.

He argues that organizing is nothing more than identifying whether a part of a module has to know about the outside world or not. If it does, it is integration; if it doesn’t, it is core. Database access? Integration. REST services? Integration. Business rule? Well, if all data is supplied by the integration parts, I guess it’s core.
He also argues that integrations should know as little as possible about the other parts, to avoid entanglement. Just pass simple data documents and parse whatever you can understand. “Be liberal in what you accept”. Deciding that data should be immutable, he argues, also helps in reducing complexity. He says that, in his experience, it will end up looking like functional programming and have reduced complexity and entanglement. I’m already doing functional programming in Clojure, so if it is true that it ends up looking like functional programming, I would say that he is more or less correct in the rest of his arguments.
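A small sketch of how the core/integration split could look. This is one interpretation of the idea as retold above, not code from the talk; the `Order` type, the field names and the discount rule are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Immutable data document handed from integration to core."""
    customer_id: str
    amount_cents: int


def discount_cents(order: Order) -> int:
    """Core: a business rule that only sees the data it is given."""
    return order.amount_cents // 10 if order.amount_cents >= 100_000 else 0


def parse_order(document: dict) -> Order:
    """Integration: pick out what we understand from the outside world, ignore the rest."""
    return Order(customer_id=str(document["customer_id"]),
                 amount_cents=int(document.get("amount_cents", 0)))


# An incoming document with a field we know nothing about; it is simply ignored.
incoming = {"customer_id": 42, "amount_cents": "150000", "unknown_field": "ignored"}
order = parse_order(incoming)
rebate_cents = discount_cents(order)
print(order, rebate_cents)
```

The core function never touches a database or a REST client, so it can be tested with plain values, and because `Order` is frozen nothing downstream can change it behind your back.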

Simplicity also seems to be the keyword in the next talk, by Mathias Meyer, about Travis CI, a hosted continuous integration platform. In the talk, he tells the story of how they redesigned the system from a monolithic architecture to a scalable architecture built from small, distributed parts. But more about that in a later post. I still have some digesting to do.

Looking forward to it

I’m going to a conference next week and I’m looking forward to it: 3 days with the very best in my industry. I have even been invited, so I have a VIP ticket with what would correspond to backstage access. I’m of course extra happy about that, but it is not the primary reason.

Most people intuitively understand that to stay in shape, you have to exercise regularly. If you sit still and inactive, the body decays, because it doesn’t waste energy on maintaining something that isn’t being used. Soon, the biggest challenge of everyday life also becomes the most you can manage.

The same goes for the brain…

If you don’t train your brain and give it some challenges beyond the usual, the ability to grasp complex problems decays. If you don’t go to conferences or pursue further training, you are in effect sitting there getting dumber.

See you at GOTO?

On Ticket goto GOTO;

Whoop, whoop. I’m in luck. It seems I have found a sponsor for a ticket for GOTO;!

Now that I dare hope to attend, I have started to study the details of the program. One of the tracks has a day called “Distributed Systems Renaissance”, and that caught my interest, because distributed systems are a part of my day job.

One talk in particular caught my interest: “The Smallest Distributed System”. In distributed systems, one of the challenges is being robust to systems dropping out and coming back online. This could be due to errors like network problems or due to planned downtime. No matter what, the rest of the system should continue to run, and when the missing part is back online, it should be able to catch up with what has happened while it was down. Think of cash machines that can run without a connection to the bank. The machine happily hands out money, and when it reconnects to the bank, all of the transactions are transmitted. This also implies that the account balance might not reflect how much money you actually have left in the account, as some transactions might still exist only in the cash machine. But eventually all transactions will be transmitted, so eventually the balance will be correct. This is more or less what eventual consistency is about, and this talk promises to address these issues.
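The cash machine example can be sketched in a few lines. This is a toy model under heavy assumptions (a single machine, a single account, no failures during the sync itself), and the `Bank` and `CashMachine` classes are made up for the illustration.

```python
class Bank:
    def __init__(self, balance):
        self.balance = balance

    def apply(self, amount):
        self.balance -= amount


class CashMachine:
    def __init__(self, bank):
        self.bank = bank
        self.online = True
        self.pending = []   # withdrawals the bank has not heard about yet

    def withdraw(self, amount):
        # Always hand out the money and record the transaction locally first.
        self.pending.append(amount)
        if self.online:
            self.sync()

    def sync(self):
        # Transmit the recorded transactions in the order they happened.
        while self.pending:
            self.bank.apply(self.pending.pop(0))

    def reconnect(self):
        self.online = True
        self.sync()


bank = Bank(balance=1000)
atm = CashMachine(bank)

atm.online = False              # the connection to the bank drops
atm.withdraw(200)
atm.withdraw(300)
stale_balance = bank.balance    # still 1000: the bank is behind

atm.reconnect()                 # pending transactions are transmitted
final_balance = bank.balance    # now 500: the balance has caught up
print(stale_balance, final_balance)
```

Between the two reads the bank’s view is wrong but the system keeps working, and once the queue is drained the views agree again. Everything hard about real systems (duplicates, ordering across several machines, overdrafts) is deliberately left out.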

Where I work, we have a production system holding about 1.2 million customers. Most of it is built from the ground up, and for the newest and most central parts we have used things like Clojure and Datomic, and it’s distributed. Though I have been interested in and have been building distributed systems for years now, there is still loads of stuff I should and could learn. I’m really hoping this talk can give me some more insight.

But man, when I glance through the program: it’s like a candy store and I’m starving.