Erik Derohanian


A Gentle Introduction to MicroKanren

Part 1 / ?

Contents

Overview

MicroKanren Hemann & Friedman, 2013 is a minimal relational programming language in the miniKanren minikanren.org family. It is designed to be simple and easy to extend, but more importantly, easy to understand. This post is going over a python implementation of the interpreter microKanren-py with lots of type signatures and descriptive variable names that are (usually) longer than a single character.

What This Post Is & Isn’t

This is an explanation of the mechanics of a microKanren implementation. It’s going to outline the code of an implementation step by step with examples.

This post isn’t an introduction to relational programming Introduction to Logic Programming with Clojure - Ambrose Bonnaire-Sergeant , a tutorial on any type of Kanren minikanren.org/#writtenTutorials , or an explanation of why one might choose relational programming If you’ve ever written SQL, you’ve already sort of done some relational programming without realizing it. .

Example

An example we’d like to be able to compute in our microKanren implementation is as follows:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
    goal = fresh(lambda a, b: conj(
        eq(a, 1),
        disj(
            eq(b, 2),
            eq(b, 3))))
    result = goal(State.empty())
    assert result == [
        State({'a#1': 1, 'b#1': 2}, Counter(a=1, b=1)),
        State({'a#1': 1, 'b#1': 3}, Counter(a=1, b=1)),
    ]

This example covers the four primitives available in microKanren: fresh, eq, conj, and disj.

  • fresh is used to create new variables
  • eq is used to compare variables
  • conj is conjunction - the “and” operation
  • disj is disjunction - aka “or”

Here, we are asking microKanren to compute all the possible states where: the variable a is equal to 1 AND (the variable b is equal to 2 OR the variable b is equal to 3). This has two possible results, the first where a is 1 and b is 2, and the second where a is 1 and b is 3. These are listed out on lines 8 & 9 in the result assertion.

The code

Variables, Terms, Substitutions & States

The first things to define are variables, terms, substitutions, and streams. Variables are a lot like the variables you’re probably used to, except that here, once a variable is “defined” as something, that value can never change. New variables are created by calls to fresh. We will represent our variables as strings. In the above example, the two variables are a and b.

Variables can be “defined” as pointing to terms: this can be either another variable, or an arbitrary value. This collection of mappings of Vars to terms is stored in a dictionary of type sub_dict. Above, we’re only using values: 1, 2, and 3, but we also could have defined a variable as being equal to another. In our example, the result is a list of states.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
class Var(str):
    pass

term = Union[Var, Any]
sub_dict = dict[Var, term]

class State():
    def __init__(self, subs: sub_dict, var_counter: Counter):
        self.subs = subs
        self.var_counter = var_counter

    @classmethod
    def empty(cls):
        return cls({}, Counter())

    def __eq__(self, other):
        return isinstance(other, State) and \
            self.subs == other.subs and \
            self.var_counter == other.var_counter

    def __repr__(self):
        return f"State({self.subs}, {self.var_counter})"

    def __str__(self):
        return self.__repr__(self)

State contains a a sub_dict to keep track of substitutions and a Counter to keep track of what variables have already been created. This Counter is to resolve collisions - remember, fresh always returns a new variable, even if the variable name has already been used. That’s why there a #1 after our variable names in the result. For convenience, we add a classmethod empty to create an empty State, __eq__ to compare States to each other, and __repr__ and __str__ for better representations for debugging.

The usage of a dictionary for substitutions deviates from the original paper, but it makes the job of extending and walking substitutions much easier as well as making solutions easier to read. Extending a substitution is exactly what is sounds like - returning a new dictionary where we’ve added a new mapping of a Var to a term to the existing dictionary of substitutions:

1
2
def extend_subs(source: Var, target: Any, subs: sub_dict) -> sub_dict:
    return subs | {source: target}

Walk

The next thing we need is a way to walk variables. This means looking up variables in the substitution dictionary. If it points to another variable, look up that variable and repeat until you either get to a term or it no longer points to anything.

Examples:

1
2
3
4
walk('a', {}) == 'a'  # empty, return the original Var
walk('a', {'a': 5}) == 5  # return the term it points to
walk('a', {'a': 'b', 'b': 7}) == 7  # follow the pointers and return the term it points to
walk('a', {'a': 'b', 'b': 'c'}) == 'c'  # this one points to another Var

Implementation:

1
2
3
4
def walk(t: term, subs: sub_dict) -> term:
    if isinstance(t, Var) and t in subs:
        return walk(subs[t], subs)
    return t

Streams

We’ve already seen that a problem could have multiple solutions, where each solution is represented as a State object. A stream is that list of solutions:

stream = list[State]

Goals

With all that out of the way, we’re ready to start talking about the structure of the four primitives we saw earlier: fresh, eq, conj, and disj. We want all four of these to be composable so we can arrange them however we’d like. The output of each of these functions also needs to be callable, so they can be called with different States. The output of that callable should be a stream of results. A possible definition of the goal type is:

goal = Callable[[State], stream]

This will hopefully make sense as we get into definitions of the four primitives and try to compose them.

Eq & Unify

The first primitive to look at is eq. The job of eq is to tell us if we can unify two terms, that is to say if we can declare that they can be equal given our current knowlege of the substitutions. For example Don’t worry about the calls to fresh and conj for now, just remember fresh creates variables and conj is the conjuntion, aka “AND” :

1
2
3
4
5
6
7
8
9
    g = fresh(lambda q: eq(q, 'hello'))
    result = g(State.empty())
    assert result == [State({'q#1': 'hello'}, Counter(q=1))]

    g2 = fresh(lambda q: conj(
        eq(q, 'hello'),
        eq(q, 'fail me')))
    result2 = g2(State.empty())
    assert result2 == []

On lines 1-2 we asked for the stream of results where the variable q is equal to 'hello'. This has one solution, where we declare that q = 'hello'.

On lines 5-7 we ask for the stream of results where q is equal to 'hello', but also to 'fail me'. In this case, it is impossible to reconcile the two; q cannot be equal to 'fail me' when it is already equal to 'hello'. There are no possible substitutions that satisfy the constraints we’ve given, so we expect the empty stream as a result.

Our definition of eq is as follows:

1
2
3
4
5
def eq(u: term, v: term) -> goal:
    def _eq(state: State) -> stream:
        subs = unify(u, v, state.subs)
        return [State(subs, state.var_counter)] if subs else []
    return _eq

Remember, we want our primitives to return goals, which are functions that accept a State and return a stream. eq returns its inner function _eq that accepts a State object as an argument, and returns a new stream containing the State with the result of attempting to unify the two terms with the sub_dict. The real comparison happens in the unify function.

We are going to use a very simple implementation of unify. For a more in depth explanation, Peter Norvig norvig.com has written about a more sophisticated Prolog swi-prolog.org implementation in PAIP “Paradigms of Artificial Intelligence Programming”, 1992 - see chapter 11.2 for unification . Others have elaborated on it inaimathi.ca/posts/how-unification-works as well.

Here is our definition of unify, followed by an explanation of what’s going on:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
def unify(u: term, v: term, s: sub_dict) -> Optional[sub_dict]:
    u_target = walk(u, s)
    v_target = walk(v, s)

    if isinstance(u_target, Var) and isinstance(v_target, Var) and u_target == v_target:
        return s
    elif isinstance(u_target, Var):
        return extend_subs(u_target, v_target, s)
    elif isinstance(v_target, Var):
        return extend_subs(v_target, u_target, s)
    elif u_target == v_target:
        return s

    return None

The first thing we do is walk both terms. If they have already been defined as being equal to something else, this will tell us. We now want to see if the terms they walked to can be equal, so we check the the 5 possible cases:

  1. u and v both point to the same Var. This is fine, our sub_dict is good as it is and we don’t need to extend it. Return the sub_dict unchanged.
  2. u points to a Var, v points to either a different Var or some arbitrary term. This is also fine, but we need to extend the sub_dict with the knowledge that u and v are now equal.
  3. This is the same as (2) above, but with u and v swapped.
  4. u and v both walk to the same arbitrary term. Like (1), this is fine and we can again return the sub_dict as it is.
  5. If none of the above are true, it means that u and v both walked to different terms. This is a failure, and means that there is no possible way to equate u and v. In this case, return None as there are no solutions.

Fresh

fresh is our second primitive, it is how we create new variables. Our definition of fresh is another place where we deviate from the original paper; in the paper, fresh only creates one variable at a time and stores them as numeric IDs. This implementation allows you to define multiple variables at once and stores them as human readable names for ease of understanding the outputted solutions.

We’ve seen fresh before, but here’s a new example:

1
2
3
4
5
6
    g = fresh(lambda x, y:
              conj(eq(x, 'hello'),
                   eq(y, x)))

    result = g(State.empty())
    assert result == [State({'x#1': 'hello', 'y#1': 'hello'}, Counter(x=1, y=1))]

Here, we’ve asked fresh to create two variables for us, x and y, so it’s extracted the variable names from the lambda and used them as strings. One thing to note is that fresh returns new variables everytime it’s called, so what happens if you use the same variable name multiple times? In the papers implementation, this isn’t a problem because it’s uses an auto incrementing counter as the variable, but we might have name collisions. We’d like our implementation of fresh to be able to account for this by tweaking the variable name to avoid collisions:

1
2
3
4
5
g = conj(
    fresh(lambda x: eq(x, 'hello')),
    fresh(lambda x: eq(x, 'bye')))
result = g(State.empty())
assert result == [State({'x#1': 'hello', 'x#2': 'bye'}, Counter({'x': 2}))]

There were two calls to fresh asking to create a variable x, but the second call should see that there’s already an x and update the name to x#2.

Here is the implementation of fresh:

1
2
3
4
5
6
7
8
def fresh(f: Callable[[VarArg(term)], goal]) -> goal:
    def _fresh(state: State) -> stream:
        term_names = [str(term) for term in inspect.signature(f).parameters]
        state.var_counter.update(term_names)
        terms = [Var(f"{term}#{state.var_counter[term]}") for term in term_names]

        return f(*terms)(state)
    return _fresh

Line 3 uses pythons inspect library inspect documentation to get the names of the arguments of the lambda to use as variable names. Lines 3 & 4 check every variable name for collisions with existing variables in our State, update the counter for the new variables, and create the new Vars with updated names. We then call the lambda with our new terms. The body of the lambda should return a goal. We then call that goal with the updated State, and that completes the job of fresh.

Disj

disj is the third of our primitives. It is the disjunction, the logical OR of two or more goals. Unlike the usual boolean OR we’re used to, we can’t have falsy states, we can only have valid states that meet the constraints we’ve set up or we can have no states at all. Since any state that goes into a disj is valid, all we need to do is return all the states that disj computes by applying the state to the goals it received.

The original paper only accepted two arguments for disj and conj, we support an arbitrary number here that we reduce.

Example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
def test_disj():
    g = fresh(lambda a: disj(eq(a, 'hello'),
                             eq(a, 'hi'),
                             eq(a, 12)))
    result = g(State.empty())
    assert result == [
        State({'a#1': 'hello'}, Counter(a=1)),
        State({'a#1': 'hi'}, Counter(a=1)),
        State({'a#1': 12}, Counter(a=1))
    ]

Implementation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
def disj(*goals: goal) -> goal:
    if len(goals) < 2:
        raise Exception("disj needs 2 or more goals")

    def _disj(state: State) -> stream:
        streams = [g(state) for g in goals]
        return reduce(mplus, streams)
    return _disj

def mplus(s1: stream, s2: stream) -> stream:
    return s1 + s2

disj takes two or more goals as arguments. First it checks that it really did receive enough goals as arguments, then it returns its inner goal function. That inner functio evaluates all the goals it’s been passed with the current state, then combines all the lists of solutions using the mplus function.

Conj

The last primitive is conj, the conjunction of two of more goals This is the part of this paper that confused me the most by far. Hopefully I can do a better job explaining this, but it didn’t really make sense to me until I sat down and stepped through a bunch of examples on paper until I could convince myself this really works so if it’s not clear by the end of this explanation I’d encourage you to do the same. . What we want this to mean with streams of results is that the conj of several streams succeeds if there is a State in each stream such that they can all be valid at the same time. We’re interested in finding every possible set of States where that holds true.

Example:

1
2
3
4
5
6
7
8
def test_conj():
    g = conj(
        fresh(lambda a: eq(a, 'hi')),
        fresh(lambda b: eq(b, 42)),
        fresh(lambda c: eq(c, 'hello')))
    result = g(State.empty())
    assert result == [State({'a#1': 'hi', 'b#1': 42, 'c#1': 'hello'},
                            Counter(a=1, b=1, c=1))]

The way we want to accomplish this is by taking our first goal and computing all of it’s results. We then want iterate over those results, and try to evaluate our second goal using each result as the state.

Implementation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
def conj(*goals: goal) -> goal:
    if len(goals) < 2:
        raise Exception("conj needs 2 or more goals")

    def _conj(state: State) -> stream:
        return reduce(bind, goals[1:], goals[0](state))
    return _conj

def bind(s: stream, g: goal) -> stream:
    if not s:
        return []
    else:
        return mplus(g(s[0]), bind(s[1:], g))

This implemenatation is a bit trickier. conj starts off like disj by making sure it has enough goals. The inner function is again reducing over all the goals, but this time we start by taking the first goal and computing its output stream (that’s the third argument of reduce on line 6). We then apply bind to that stream of results and the next goal, and reduce all the goals over this until we’re left with a single stream of results.

bind is where the actual conjing happens. First we check if the stream has any values in it. You can think of conjunction like a set intersection here, the conjunction / intersection of an empty set of results with any other set of results is still the empty set, so we return an empty list []

Then on line 13, we calculate g(s[0]), the goal applied to the first result in the stream, and bind(s[1:], g) which recursively calls bind again with all the results in stream except the first one and the same goal g. We then combine these lists of results with mplus. This way, we run goal g against all the results in stream s to get all the possible pairs of results.

If we had three or more goals to conj, the reduce will take that stream of results and continue to bind them with the rest of the goals, and return the final stream.

Conclusion & Future Posts

With those four primitives, you should be able to get some very basic examples running. Unfortunately, we’ve only covered about half the paper so far - I was hoping to go over the entire thing but this was getting a little long and if I kept going I was afraid I’d have to take “Gentle” out of the title.

In an eventual part two, I’d like to go over the rest of the paper and cover the changes you have to make to able to handle infinite streams and stream interleaving, as well as modifying this interpreter to use pythons iterators Iterator Type documentation so the entire interpreter can be lazy.