Posts tagged correctness

Typed embedding of STLC into Haskell

5 February 2015 (programming haskell language correctness)

Someone posted to the Haskell subreddit this blogpost of Lennart where he goes step-by-step through implementing an evaluator and type checker for CoC. I don't know why this post from 2007 showed up on Reddit this week, but it's a very good post, pedagogically speaking. Go and read it.

In this post, I'd like to elaborate on the simply-typed lambda calculus part of his blogpost. His typechecker defines the following types for representing STLC types, terms, and environments:

data Type = Base
          | Arrow Type Type
          deriving (Eq, Show)

type Sym = String

data Expr = Var Sym
          | App Expr Expr
          | Lam Sym Type Expr
          deriving (Eq, Show)

The signature of the typechecker presented in his post is as follows:

type ErrorMsg = String
type TC a = Either ErrorMsg a
newtype Env = Env [(Sym, Type)] deriving (Show)

tCheck :: Env -> Expr -> TC Type

My approach is to instead create a representation of terms of STLC in such a way that only well-scoped, well-typed terms can be represented. So let's turn on a couple of heavy-weight language extensions from GHC 7.8 (we'll see how each of them is used), and define a typed representation of STLC terms:

{-# LANGUAGE GADTs, StandaloneDeriving #-}
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, TypeOperators #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TemplateHaskell #-} -- sigh...

import Data.Singletons.Prelude
import Data.Singletons.TH
import Data.Type.Equality

-- | A (typed) variable is an index into a context of types
data TVar (ts :: [Type]) (a :: Type) where
    Here :: TVar (t ': ts) t
    There :: TVar ts a -> TVar (t ': ts) a
deriving instance Show (TVar ctx a)

-- | Typed representation of STLC: well-scoped and well-typed by construction
data TTerm (ctx :: [Type]) (a :: Type) where
    TConst :: TTerm ctx Base
    TVar :: TVar ctx a -> TTerm ctx a
    TLam :: TTerm (a ': ctx) b -> TTerm ctx (Arrow a b)
    TApp :: TTerm ctx (Arrow a b) -> TTerm ctx a -> TTerm ctx b
deriving instance Show (TTerm ctx a)

The idea is to represent the context of a term as a list of types of variables in scope, and index into that list, de Bruijn-style, to refer to variables. This indexing operation maintains the necessary connection between the pointer and the type that it points to. Note the type of the TLam constructor, where we extend the context at the front for the inductive step.

To give a taste of how convenient it is to work with this representation programmatically, here's a total evaluator:

-- | Interpretation (semantics) of our types
type family Interp (t :: Type) where
    Interp Base = ()
    Interp (Arrow t1 t2) = Interp t1 -> Interp t2

-- | An environment gives the value of all variables in scope in a given context
data Env (ts :: [Type]) where
    Nil :: Env '[]
    Cons :: Interp t -> Env ts -> Env (t ': ts)

lookupVar :: TVar ts a -> Env ts -> Interp a
lookupVar Here      (Cons x _)  = x
lookupVar (There v) (Cons _ xs) = lookupVar v xs

-- | Evaluate a term of STLC. This function is total!
eval :: Env ctx -> TTerm ctx a -> Interp a
eval env TConst = ()
eval env (TVar v) = lookupVar v env
eval env (TLam lam) = \x -> eval (Cons x env) lam
eval env (TApp f e) = eval env f $ eval env e

Of course, the problem is that this representation is not at all convenient for other purposes. For starters, it is certainly not how we would expect human beings to type in their programs.

My version of the typechecker is such that instead of giving the type of a term (when it is well-typed), it instead transforms the loose representation (Term) into the tight one (TTerm). A Term is well-scoped and well-typed (under some binders) iff there is a TTerm corresponding to it. Let's use singletons to store type information in existential positions:

$(genSingletons [''Type])
$(singDecideInstance ''Type)

-- | Existential version of 'TTerm'
data SomeTerm (ctx :: [Type]) where
    TheTerm :: Sing a -> TTerm ctx a -> SomeTerm ctx

-- | Existential version of 'TVar'
data SomeVar (ctx :: [Type]) where
    TheVar :: Sing a -> TVar ctx a -> SomeVar ctx

-- | A typed binder of variable names
data Binder (ctx :: [Type]) where
    BNil :: Binder '[]
    BCons :: Sym -> Sing t -> Binder ts -> Binder (t ': ts)

Armed with these definitions, we can finally define the type inferer. I would argue that it is no more complicated than Lennart's version. In fact, it has the exact same shape, with value-level equality tests replaced with Data.Type.Equality-based checks.

-- | Type inference for STLC
infer :: Binder ctx -> Term -> Maybe (SomeTerm ctx)
infer bs (Var v) = do
    TheVar t v' <- inferVar bs v
    return $ TheTerm t $ TVar v'
infer bs (App f e) = do
    TheTerm (SArrow t0 t) f' <- infer bs f
    TheTerm t0' e' <- infer bs e
    Refl <- testEquality t0 t0'
    return $ TheTerm t $ TApp f' e'
infer bs (Lam v ty e) = case toSing ty of
    SomeSing t0 -> do
        TheTerm t e' <- infer (BCons v t0 bs) e
        return $ TheTerm (SArrow t0 t) $ TLam e'

inferVar :: Binder ctx -> Sym -> Maybe (SomeVar ctx)
inferVar (BCons u t bs) v
  | v == u = return $ TheVar t Here
  | otherwise = do
      TheVar t' v' <- inferVar bs u
      return $ TheVar t' $ There v'
inferVar _ _ = Nothing

Note that pattern matching on Refl in the App case brings in scope type equalities that are crucial to making infer well-typed.

Of course, because of the existential nature of SomeVar, we should provide a typechecker as well which is a much more convenient interface to work with:

-- | Typechecker for STLC
check :: forall ctx a. (SingI a) => Binder ctx -> Term -> Maybe (TTerm ctx a)
check bs e = do
    TheTerm t' e' <- infer bs e
    Refl <- testEquality t t'
    return e'
  where
    t = singByProxy (Proxy :: Proxy a)

-- | Typechecker for closed terms of STLC
check_ :: (SingI a) => Term -> Maybe (TTerm '[] a)
check_ = check BNil

(The SingI a constraint is an unfortunate implementation detail; the kind of a is Type, which is closed, so GHC should be able to know there is always going to be a SingI a instance).

To review, we've written a typed embedding of STLC into Haskell, with a total evaluator and a typechecker, in about 110 lines of code.

If we were doing this in something more like Agda, one possible improvement would be to define a function untype :: TTerm ctx a -> Term and use that to give check basically a type of Binder ctx -> (e :: Term) -> Either ((e' :: TTerm ctx a) -> untype e' == e -> Void) (TTerm ctx a), i.e. to give a proof in the non-well-typed case as well.

Full code as a gist on Github

Simply Typed Lambda Calculus in Agda, Without Shortcuts

1 May 2013 (programming agda correctness language)

There are many ways to introduce dependent type systems, depending on which side of the Curry-Howard lens you look through. But if your interest is more in programming languages than proof assistants, then length-indexed vectors is your Hello, World!, and an interpreter for the simply-typed lambda calculus is your FizzBuzz. I think the usual presentation of the latter is a bit of a cheat, for reasons which I will explain below. I'll also show a "non-cheating" version.

Mod-N counters in Agda

19 February 2012 (programming agda correctness haskell) (6 comments)

First of all, before I start on the actual blog post, let me put this in context. I rembember a couple of years ago when I developed an interest in functional programming languages, and Haskell in particular. There was a phase when I was able to use Haskell to solve problems in the small. I understood most of the basics of pure functional programming; then there were things I regarded as magic; and of course there was a lot of things I didn't even know that I didn't know about. But none of it did I grok.

I feel like I'm starting to get to the same level with Agda now. So this is going to be one of those "look at this cool thing I made" posts where the actual result is probably going to be trivial for actual experts of the field; but it's an important milestone for my own understanding of the subject.

I wanted to play around with simple but Turing-complete languages, and I started implementing an interpreter for a counter machine. More on that in a later post; this present post describes just the representation of register values. In the model that I implemented, values of registers are byte counters, meaning they have 256 different values, and two operations +1 and -1 that the inverses of each other. Incrementing/decrementing should roll over: 255 +1 = 0 and 0 -1 = 255.

Unary arithmetics is even slower than you'd expect

19 April 2010 (haskell programming math agda correctness)

My coworker Encsé posted a challange use the 9-digit number problem to demonstrate interesting programming techniques. So I sat down and wrote my first practical program using dependant types, in Agda. I've been playing around with Agda previously, but this seemed like a good opportunity to try to write an actual, self-contained, provably correct program with it.

I'm not going to get into details now about the resulting Agda code, because I'm planning to present it in detail in a later post. In its current form, my program is formally proven to produce only good nine-digit numbers; the only property that still needs proving is that it finds all of them.

But the sad surprise came when I tried to actually run it. It was unbearably slow to just get to all possible combinations for the first three digits. I'm talking about 12 minutes just to list them from 123 to 423 (at which point I killed the process). For comparison, the following Haskell program, which is an implementation of the same naïve algorithm, finds the (unique) solution in 4 milliseconds:

import Control.Monad.List

fromDigits = foldr shift 0
    where shift d s = 10 * s + d

p `divides` q = q `mod` p == 0

encse :: [Int]
encse = map fromDigits $ encse' 0 []
    where encse' 9 ds = return ds
          encse' n ds = do d <- [1..9]
                           let ds' = d:ds
                               n' = n + 1
                           guard $ not (d `elem` ds)
                           guard $ n' `divides` fromDigits ds'
                           encse' n' ds'

So where's that slowdown coming from?

The first intuition would be that the code generated by Agda is slow because in parallel to the actual computation, it is also evaluating all kinds of proofs. But the proofs exist only in the world of types, so they shouldn't matter once the program is compiled.

The real answer is that calculating in unary representation is slow. Very, very slow. Even slower than you'd imagine.

The B Method for Programmers (Part 2)

22 February 2010 (programming language math correctness)

In my previous post, I introduced the B method and showed the steps of writing a simple program for finding the n^th element of a sequence satisfying a given predicate p. While you may think the resulting program is correct, we can't just say so and be done with it. The whole point of the B method is that the resulting program can be formally proven correct.

The B software generates 69 so-called proof obligations for the code from the first part. These are assertions about the program actually behaving as specified. For example, let's look at PO69 which asserts that ll is correctly set. Recall first the relevant portion of the specification:

ll = bool(card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE}) ≥ nn)

And the invariant of the implementation:

ll = bool(card({kk | kk ∈ aa..ii ∧ pp(kk) = TRUE}) = nn)

So what we have to prove is that given the preconditions, by the time the loop in the implementation terminates, the invariant makes sure ll is equal to its specified value. This is what's described (somewhat more verbosely) by the actual proof obligation below.

Blog tags

Blog tags

Posts tagged correctness

Typed embedding of STLC into Haskell

Simply Typed Lambda Calculus in Agda, Without Shortcuts

Mod-N counters in Agda

Unary arithmetics is even slower than you'd expect

So where's that slowdown coming from?

The B Method for Programmers (Part 2)

Older posts:

Posts from all tags