PatternSynonyms for expressive code
27 Aug 2019Patterns
Patterns are ubiquitous in Haskell code; it is one of the first things you learn as you get familiar with the language. Almost every piece of Haskell code will have a pattern match construct in a function definition or case expression. However, they have some limitations as well.
As an example, let us look at how to represent identifiers:
import qualified Data.Text as Text
import Data.Text (Text)
newtype Identifier = Identifier { unIdentifier :: Text }
deriving Eq
We can construct values of type Identifier
with the Identifier
data constructor (e.g. Identifier "foo"
). You can
deconstruct an identifier in two ways:
identifierLength1 :: Identifier -> Int
identifierLength1 = Text.length . unIdentifier
identifierLength2 :: Identifier -> Int
identifierLength2 (Identifier s) = Text.length s
Having both a record field accessor and a pattern matching syntax is very useful. We can pick the best one depending on the usage.
A curious case
How would you implement case-insensitive identifiers? In such a case, Identifier "foo"
and Identifier "Foo"
are
considered equal. We must retain the original case in the data type though. Many other use cases - such as displaying
the identifier with the original input case - require that.
Well, it turns out not too complicated. We can do this:
import Data.Function(on)
newtype Identifier = Identifier { unIdentifier :: Text }
instance Eq Identifier where
(==) = (==) `on` (Text.toCaseFold . unIdentifier)
This is as expressive as the previous implementation. We still have the same data constructor and pattern matching at
the usage sites. It has one serious drawback though - performance. Text.toCaseFold
is evaluated for each equality
check which makes this a slow operation.
A performance fix
Here is one way to solve the performance problem:
data Identifier = Identifier
{ unIdentifier :: Text
, folded :: Text
}
makeIdentifier :: Text -> Identifier
makeIdentifier s = Identifier
{ unIdentifier = s
, folded = Text.toCaseFold s
}
instance Eq Identifier where
(==) = (==) `on` folded
This moves the expensive Text.toCaseFold
to record construction so that we don’t pay a penalty on each ==
evaluation. This covers the common use case where the identifier is constructed once and then the equality check is
performed many times on it. In fact, this is the implementation used by the case-insensitive
package.
However, we lost the nice symmetric data constructor and pattern match we had in the previous solution. We want to hide
the internal details of Identifier
record from the usage sites. So we are forced to provide a smart constructor
makeIdentifier
instead and hide the Identifier
data constructor. This also means we cannot pattern match on the
Identifier
data constructor anymore. The only way to deconstruct an identifier is via the unIdentifier
function
which is less expressive than the previous solution.
PatternSynonyms to the rescue
This is where pattern synonyms come in. We can use a pattern synonym to define this “missing” pattern. Here is the syntax:
{-# LANGUAGE PatternSynonyms #-}
data Identifier = MkIdentifier
{ unIdentifier :: Text
, folded :: Text
}
instance Eq Identifier where
(==) = (==) `on` folded
pattern Identifier :: Text -> Identifier
pattern Identifier s <- MkIdentifier s _ where
Identifier s = MkIdentifier s (Text.toCaseFold s)
There are two components to this pattern synonym. First, we define how to use Identifier
in pattern matches. The
part:
pattern Identifier s <- MkIdentifier s _
tells GHC to replace a pattern match on Identifier s
with MkIdentifier s _
. For e.g., a case expression
like this:
case ident of
Identifier s -> ...
gets translated to:
case ident of
MkIdentifier s _ -> ...
The second part:
Identifier s = MkIdentifier s (Text.toCaseFold s)
tells GHC to replace all occurrences of Identifier s
in expressions with the corresponding MkIdentifier s
(Text.toCaseFold s)
expression. For e.g., a let expression like this:
let ident = Identifier s
in ...
gets translated to:
let ident = MkIdentifier s (Text.toCaseFold s)
in ...
Here is how the complete solution:
module Identifiers
( Identifier
, pattern Identifier
) where
import Data.Text (Text)
import qualified Data.Text as Text
import Data.Function (on)
data Identifier = MkIdentifier
{ unIdentifier :: Text
, folded :: Text
}
instance Eq Identifier where
(==) = (==) `on` folded
pattern Identifier :: Text -> Identifier
pattern Identifier s <- MkIdentifier s _ where
Identifier s = MkIdentifier s (Text.toCaseFold s)
The MkIdentifier
data constructor and all its internal implementation details are hidden away but the usage sites
still can use the nice Identifier s
patterns for data construction and pattern matching. Clearly, this is a good
improvement from the previous solution and retains all the performance benefits of it.
Note that there are other ways to declare pattern synonyms. See GHC documentation for more details. You might also be interested in the Data.Sequence module that makes good use of pattern synonyms along with the ViewPatterns extension.