Category Archives: Uncategorized

Switching Gears

New Horizons

So I started a job at a new company, a famous search engine! One consequence is that I am no longer using Windows and the Microsoft Stack, instead, Linux, shells, and emacs.

What do I miss from Windows development

F#

This is a really cool functional language that has some nice facilities for parallel and async programming. Unfortunately, I wasn’t able to use it at my work.

Visual Studio Intellisense

Eclipse has this and Emacs has some completion and search facilities. But, Visual Studio’s blows both of these away in it’s speed. With Eclipse I often find it faster to type out the whole token instead of waiting for it to complete something.

var keyword in C#

Using java now, I am frustrated by its verboseness. This is especially a problem when one makes heavy use of Generics.

lambda expressions

Many classes here require function objects. The workaround is to usually create an anonymous class that implements one method on an interface. The syntax is much bulkier than C#, let alone F#.

C# generics

Java’s generics feel clumsy and less powerful. The main source of this clumsiness is the constant compiler warnings one gets and has to supress when using generics in java.

What do I love about Linux development

Emacs

Ok, this is an insanely powerful and addictive tool. Many people hate it but once you start to use it you want to spend more and more time using it. It has all these nooks and crannies to discover. For instance.

org-mode

Org-Mode is a combination of a note-taking mode, to-do list, and publishing framework. People have found many ways to use it. In fact, this blog post is written within org-mode and published using org2blog without ever leaving emacs. org-mode has tags to make it super easy to embed source by bracketing it with tags.

Other Emacs stuff

There is a killer twitter application built into emacs for reading and publishing all your tweets. And the Calc application is a very advanced RPN Calculator with symbolic computation capabilities. The power of using these different tools from within emacs is the ability to use emacs text editing and scripting capabilities in a consistent way in each of these. Also, it makes integration of input and output from these different things very easy.

Bash Shell

Windows shell is really sad and the paradigm there is to do everything with the mouse. There is something liberating about being able to do so many things without lifting your hands off the keyboard, through, emacs, the shell, and all the utilities on Linux. Little things like the history and grep and the ease with which one can whip together shell scripts to automate things. I know one can install things like cygwin but it is all a bit of hack on windows.

Effective Java

Yes, its just a book, but I haven’t seen many books of this depth on programming. It illuminates many pitfalls and gotchas and emphasizes immutability, correctness, and efficiency. I think any programmer would benefit from reading this.

Intangibles

The feel of programming with the Linux stack is different, IMO. With Linux there are so many choices for tools and how to combine them. In contrast, on the windows platform, almost the entire stack comes from Microsoft from the OS, to the language and tools. While this simplifies certain things it feels limiting in a way. In Linux, I feel like I am exploring a vast ocean of possibilities.

Advertisement

Permutation Generation In F-Sharp

Someone asks you to generate all the perumtations from a set of objects and you might think that this is an easy problem  And it isn’t that challenging but I had to think about it for a bit to come up with away to do this.  I thought it would be fun to code this up in F# to see if my algorithm was correct and to learn about the language a bit more.

The algorithm I came up with is essentially a depth first search like approach.  Taking the first element and printing it and then moving it to the ancestor list and printing it with the next one and so on.  This will print the permutations in lexicographic order providing the input elements are sorted.  Here is the source code below.

#light
/// List printing function need a generic version
let ListPrinter format a n =
 printf  "%A. " n
 a|> List.iter (fun x -> (printf format x))
 printf "\n"

/// Print all the permutation of a list of elements
/// Algorithm Pseudo Code
/// PermutationGenerator(list items)
///    PermuationHelper([], items)
/// end
///
/// PermutationHelper(list ancestors, list descendants)
///  if descandants is null return
///  foreach descendant
///    newAncestors  = ancestors + descendant
///    print(newAncestors)
///    newDescandants = descendants - descendant
///    PernumationHelper(newAncestors, newDescandants)
///  end
/// end
let PermutationGenerator printer items =
 ///internal counter function
 let counter =
  let count = ref 0I
  fun () -> count.contents <- count.contents + 1I; !count
 /// Helper recursive function that does the work
 let rec PermutationHelper ancestors descendants =
  if ancestors <> [] then printer ancestors (counter())

  match descendants with
  | [] -> ()
  | y::ys -> let fn = (fun (front, back : 'a list) curr ->
                 let newFront = front @ [curr]
                 let newBack = back.Tail
                 let newAncestors = ancestors @ [curr]
                 let newDescendants = front @ newBack
                 PermutationHelper newAncestors newDescendants
                 (newFront, newBack))
   in List.fold_left fn ([],descendants) descendants |> ignore
 // call the nested function
 PermutationHelper [] items

let items = ["apple";"banana"; "canteloupe";]
 //"cherry"; "grape";"orange";
 //"peach";"pear"; "strawberry"]
let numbers = [ 1..5 ]

let chars = ['a'..'e']

let StringPrinter = ListPrinter "%s "
let NumericPrinter = ListPrinter "%d"
let CharPrinter = ListPrinter "%c"

let StringPermutations = PermutationGenerator StringPrinter
let NumericPermutations = PermutationGenerator NumericPrinter
let CharPermutations = PermutationGenerator CharPrinter

You can load the above into the F# interactive window and run the following command and see the following results.

> StringPermutations items;;
1I. apple
2I. apple banana
3I. apple banana canteloupe
4I. apple canteloupe
5I. apple canteloupe banana
6I. banana
7I. banana apple
8I. banana apple canteloupe
9I. banana canteloupe
10I. banana canteloupe apple
11I. canteloupe
12I. canteloupe apple
13I. canteloupe apple banana
14I. canteloupe banana
15I. canteloupe banana apple

Some of the more interesting parts as someone learning functional programming include the use of curried functions, the fold operator, and pattern matching.

PermutationGenerator is a generic function that can work with any list of items of  type a’ list as long as a corresponding printer for the items is supplied of type  a’ list -> bigint -> unit.  The basic algorithm is listed above PermutationGeneration in the comments.  Although, F# supports loops the functional way to iterate over elements is to use a function like map or fold, so the foreach loop is replaced by List.fold_left with fn essentially being the body of the loop.

I then build a ListPrinter function that takes a format string.  From this I create printers for three different types using partial application or currying, StringPrinter, NumericPrinter, and CharPrinter.  This allows the creation of three new more specialized functions from the ListPrinter.  Currying is also used to create three specific Permutation Generators, StringPermutations, NumericPermutations, CharPermutations from PermutationGenerator by applying the particular printer.

There are other techniques for generating permutations.  Some involve shuffling of positions of elements after selecting the number of items.  Another technique is to use factoradics which give a way to select elements or permutations based on the factorial number system.

If you can’t beat them…

I had previously tried to write and host my own blogging engine in asp.net mvc.  However, I learned that writing a great blog engine is a lot harder than it seems.  Some of the obstacles I ran into were,

  1. Time consuming – between work and grad school I didn’t have time to pull it off
  2. Tons of features – having not blogged much I thought I only needed to build something that displayed posts and comments but the sheer number of features blog engines support today is daunting
  3. Need great themes – even if you do build a great blog engine it is only as good as the themes available for it
  4. Learning while building it – asp.net mvc was a beta product at the time and many of the patterns and best practices were not fully formed

It wasn’t all for naught as I learned a ton about asp.net mvc which was helpful as I went on to do several projects at work with this technology.  And I think if I was starting something like this again I would have a better idea on how I would approach such a project.

That being said, I came away with a great respect for people who do this kind of open source work and actually produce great tools.

And for blogging wordpress is hard to beat.  I haven’t seen anything else that comes close.  So, I will be importing posts from my previous blog and I hope to post more frequently than once a year.

Birthday Paradox and Probability of Collisions

The birthday paradox is the phenomenon where even with a small fraction of the number of days in the year the probability of two people having the same birthday becomes large.  For instance if there are 36 people in a room (10% of 365) there is a 83% probability that two of them have the same birthday.  With only 23 people we are already over 50%.

This same logic doesn’t only apply, though, to birthdays.  One can calculate the probability of collision for any set of elements given the number of samples.  For instance, assume you want to assign a unique number to each of your customers.  And let’s say you were going to choose the number at random from a huge pool of numbers say 999,999,999. If you anticipate having 100,000 customers then one would expect with probability 99.3% to sample a number that was previously sampled for another customer, even though you have only selected .01% of the available numbers. I thought it would be fun to write an F# program to calculate this.

The probability is calculated as follows P = 1 – Probability of no collisions

It is easier to calculate the probability that there aren’t collisions and subtract that from one. We can use simple counting principles to calculate the probability of choosing n samples without any collisions. For example if we have 365 days as the elements and we have 3 people (samples), then the first person has 365 possibble days he could choose without collision. The second person has 364 possible days out of 365 and the third 363 possible days. So

Probabiliby of No Collisions = (365/365)*(364/365)*(363/365)
and
Probabiliby of Collision = 1 – (365/365)*(364/365)*(363/365)

Here is the following code which implements the above algorithm in F#

let collisionProbability  elements samples =
 let probNthSample n =
  let a = (float  elements)
  let b = (float  (elements - n + 1))
  b / a
 let rec probNoCollision  counter answer =
  match counter with
  | 0 -> answer
  | _ -> probNoCollision (counter - 1) (answer *  (probNthSample counter))
 in
  1.0 - (probNoCollision samples 1.0)

Some things I like about F# are its terseness. You don’t have to provide type annotations for everything unlike in C# or Java. The compiler infers them based on how they are used. Here, I am also using nested functions, which are great for breaking up an algorithm into smaller bits while still maintaining the scope of the functions to where they will be used. Also, nested functions can use values in scope. For instance elements is not passed into probNthSample. This algorithm was coded using tail-recursion to iterate over the different samples. Typically one would use a loop for this in imperative programming, but in a functional style recursion is used more often. (One could also make a list of the samples and fold_left.) Tail-recursion prevents the stack size from growing by not keeping previously called procedures on the stack if there is no additional work to be done. In the inner probNoCollision method, this is the case. The last thing that is used here is pattern matching. Stylistically, I like pattern matching better than if statements.

The probNoCollision function calculates what it name says. It multiplies together the probability of all the samples not colliding.

probNthSample is a helper function which just gets a float of the probability of the nth sample not colliding. For instance, in the birthday problem for the 2nd sample this method would return 364/365.

The last step just returns 1-probNoCollision.

To run this you could load this in the F# interactive window. And then run the command
collisionProbability 365 23;;
collisionProbability 999999999 100000;;

I am really enjoying F# and OCAML programming so far. I will be posting an article soon on permutations that I also coded in F#. This will also be in a functional style and will show more techniques of code reuse that are powerful, that one doesn’t typically enjoy in languages like C#.

Update –

I have rewritten the above function in a more functional style.


let collisionProb2 elements samples =
 let probNoCollision answer n =
  let a = (float elements)
  let b = (float (elements - n + 1))
  answer * b / a
 let answer = seq { 1 .. samples } |> Seq.fold probNoCollision 1.0
 1.0 - answer

This version of the code is more succinct and uses the fold function, as opposed to explicitly recursing over the elements.

A simple example of the Hole In the Middle Pattern

I mentioned the Hole in the Middle pattern in my previous blog article, What’s In a Language? So, I wanted to actually use the pattern in C# with anonymous delegates, because that is the platform that I use at work primarily. One of the uses that jumped out was in a database layer. Typically when you are writing a database layer you have boilerplate code in every method to open and close connections and possibly handle exceptions and log exceptions. This is repetitive code that often gets pasted into every method. One approach for doing this is to create a code snippet that you can use from a snippet library. Another is to use the Hole in the Middle pattern.Here is a simple example of the idea:

public delegate T DatabaseCodeBlock<T>(SqlConnection conn);

public class HoleInTheMiddle
{
  ConnectionHelper connHelper =new
  ConnectionHelper();

  public T ExecuteProcedure<T>(DatabaseCodeBlock<T> block)
  {
    SqlConnection conn = connHelper.GetSQLConnection();

    try
    {
      return block(conn);
    }
    catch(Exception ex)
    {
      LogError(ex.ToString());
      throw;
    }
    finally
    {
      conn.Close();
    }
  }
}

public class DbLayer
{
  HoleInTheMiddle holeInTheMiddle = new HoleInTheMiddle();

  public DataTable GetCustomerByCustomerKey(int customerKey)
  {
    return holeInTheMiddle.ExecuteProcedure<DataTable>( conn =>
      {
        RunSpParmsClass parm = new RunSpParmsClass();

        parm.AppendEx("@CustomerID", SqlDbType. Int, ParameterDirection.Input, 4, customerKey);
        DataSet ds= dbHelper.RunSpGetDataset("GetCustomer", parm, conn);

        if (ds !=null && ds.Tables[0].Rows.Count > 0)
        {
          return ds.Tables[0];
        }
        else
        {
          return null;
        }
       }
     );
   }
}

Now some of you are wondering why go through this trouble when you can use the Microsoft Data Access Application Block. Good question, but the point of this sample is to illustrate one way of using this design pattern. Now let’s take a look at the code.  I have created a method ExecuteProcedure that has code to open and close a connection and to handle exceptions.  It then fires a delegate of type DatabaseCodeBlock.  This is the code block that we supply to fill the hole in this helper function.  I am using generics to strongly type the return type of the DatabaseCodeBlock and the ExecuteProcedure method.  This is not really neccessary as you could just set the value of a local variable in your anonymous delegate that you could return from the method that is calling ExecuteProcedure, but I think using generics makes for cleaner code.

Now the intersting part of this is the power of the anonymous delegate we are using for the DatabaseCodeBlock. Anonymous Delegates are what are known as closures and they can capture local variables and reference them in the code block.  Even if this code block is executed in a another context.  This allows one to not have to define different delegate interfaces if you want to use different patterns or to build some parameter array of inputs.  Instead, you can just pass in your anonymous delegate and use any local variable in it magically.  So in our example we are getting a Customer record by using the local customerKey variable.  Now imagine we have a different stored procedure that we want to invoke that requires three or four parameters.  If these are in the local scope of where the delegate is created then we can access them.  Easy as pie!  The last point is a subtle one.  I suggest reading up on closures to get a better feel for some of the power of this way of doing things.