Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to encode *NoMatch* after consumed some input in a parser #429

Closed
complyue opened this issue Oct 25, 2020 · 12 comments
Closed

How to encode *NoMatch* after consumed some input in a parser #429

complyue opened this issue Oct 25, 2020 · 12 comments
Labels

Comments

@complyue
Copy link

As I observed (have no idea where a canonical definition could live for this) that for the contract of parser combinators:

  • fail with no input consumed will have next alternative to be tried with possible success
  • fail with some input consumed will err out immediately regardless rest alternatives

Am I right about these rules? Where is the official specification of such rules?

Then I guess that empty result of a parser could mean NoMatch even after consumed some input, and want to leverage this semantic, but encountered error with current implementation https://github.com/complyue/dcp/blob/5be688396b7e2bda00ea80fd99d2a7b3ec5c788d/src/Parser.hs#L138-L146

artifactDecl :: Parser ArtDecl
artifactDecl = do
  artCmt <- immediateDocComments
  (eof >> empty) <|> do
    artBody <- takeWhileP (Just "artifact body")
                          (not . flip elem (";{" :: [Char]))
    if T.null $ T.strip artBody
      then empty -- this is not possible in real cases
      else return (artCmt, artBody)
$ cabal run dcp < samples/full.txt
Up to date
dcp: 73:1:
   |
73 | <empty line>
   | ^
expecting "{#", "{##", or artifact body

CallStack (from HasCallStack):
  error, called at src/Parser.hs:151:14 in main:Parser

So how can I achieve that?

Background is I'm trying to prototype an implementation of doc comment parsing as described at #428

@noughtmare
Copy link

noughtmare commented Oct 25, 2020

There is some documentation about the semantics of <|> here: https://hackage.haskell.org/package/megaparsec-9.0.0/docs/Text-Megaparsec.html#t:ParsecT. Next to the Alternative instance it says:

empty is a parser that fails without consuming input.

From my own experiments it seems that once you consume some input you have committed to the branch. If you want to back out you have to use the backtracking try function.

But you could use withRecovery. E.g.

> either (error . errorBundlePretty @_ @Void) id $ parse (char 'a' *> char 'b' <|> char 'c') "" "ac"
*** Exception: 1:2:
  |
1 | ac
  |  ^
unexpected 'c'
expecting 'b'

CallStack (from HasCallStack):
  error, called at <interactive>:12:9 in interactive:Ghci2

Can be "fixed" by:

> either (error . errorBundlePretty @_ @Void) id $ parse (withRecovery (\_ -> char 'c') (char 'a' *> char 'b')) "" "ac"
'c'

@complyue
Copy link
Author

@noughtmare Thanks, and is there already some version of many to just ignore some failures (better being discriminative to not hide e.g. syntax errors) ?

moduleDecl :: Parser ModuleDecl
moduleDecl = do
  moduCmt <- docComments
  arts    <- manyTill artifactDecl eof
  return (moduCmt, arts)

So I can simply replace manyTill artifactDecl eof as in above?

@complyue
Copy link
Author

complyue commented Oct 25, 2020

An uneducated guess: is it possible for empty of ParsecT's Alternative instance to have a separate implementation from mzero, which doesn't fail but silently contributes nothing when combined with others?

-- | 'empty' is a parser that __fails__ without consuming input.
instance (Ord e, Stream s) => Alternative (ParsecT e s m) where
empty = mzero
(<|>) = mplus

Failing semantic can still be obtained by using mzero I'd think.

It is MonadPlus defines failure semantic while Alternative (or Monoid) doesn't. So far as I understand it, whether empty / mempty should trigger destined failure, is up to the combinator library implementation to decide, is it?

@noughtmare
Copy link

I'm not quite sure what kind of semantics you want this manyTill replacement to have. You say you want to ignore some failures, but which exactly?

@complyue
Copy link
Author

@noughtmare

You say you want to ignore some failures, but which exactly?

I mean whitespaces found to be associated with no syntactic element. I know Megaparsec's idiom is to let some lexeme to consume (discard) any whitespace following it, but in my case as described in #428, the doc comment is meaningful when found immediately before a lexeme, but should be treated as block comments if all following it is whitespaces till eof.

@mrkkrp
Copy link
Owner

mrkkrp commented Oct 30, 2020

@complyue You control backtracking with try. I have the impression that you have not read the tutorial:

https://markkarpov.com/tutorial/megaparsec.html

It should be helpful for understanding how backtracking works in Megaparsec.

@mrkkrp
Copy link
Owner

mrkkrp commented Oct 30, 2020

You could always encode "no match" by doing something like this:

myParser' = try myParser <|> return NoMatch

@mrkkrp mrkkrp closed this as completed in 7391c09 Oct 30, 2020
@complyue
Copy link
Author

complyue commented Nov 1, 2020

@mrkkrp I realize that NoMatch is not accurate for my intent, I actually want the semantic to be expressible that:

  • no fail (but subsequent parsers should be tried in sequence)
  • no backtracking (leave consumed input consumed)
  • contributing empty result to outer many or a similar combinator

A concrete example is here https://github.com/complyue/dcp/blob/498916f73afbe2b7bbf6105da352a9c99fe88707/src/Parser.hs#L142-L145

artifactDecl :: Parser ArtDecl
artifactDecl = lexeme $ do
  artCmt <- optional immediateDocComment
  atEnd >>= \case
    True  -> empty
    False -> do
      artBody <- takeWhileP (Just "artifact body")
                            (not . flip elem (";{" :: [Char]))
      if T.null $ T.strip artBody
        then empty -- this is not possible in real cases
        else return (artCmt, artBody)

I want True -> empty to let artifactDecl evaluate to sth with the semantic said above, and obviously empty is not the correct device here.

@mrkkrp
Copy link
Owner

mrkkrp commented Nov 1, 2020

So why don’t you just return some value instead of using empty?

@complyue
Copy link
Author

complyue commented Nov 2, 2020

I would prefer to have the parser type (Parser ArtDecl) simple, i.e. without adding another layer of monad transformer or similar abstraction beyond ArtDecl. As I sense it, using mempty as Monoid is designed can achieve that, can I anticipate that? Or what is making that impossible?

@complyue
Copy link
Author

complyue commented Nov 2, 2020

I just come to better understanding of the issue, and seems it can be addressed as well as https://github.com/complyue/dcp/pull/3/files , like complyue/dcp@f0c12d5 or complyue/dcp@2600179 , they are very similar.

But I'm still wondering, this really unaddressable from the combinator lib's design space? Why?

@complyue
Copy link
Author

complyue commented Nov 5, 2020

With help from Olaf, I think I just cracked my confusion now, that I failed
to distinguish between the 2 monoid structures of concatenation and
choice, and have wrongly assumed somehow Alternative can do
concatenation at the same time as choice making.

Now I understand it clearly that parsers can be concatenated only if
their result type is concatenatable, so if the combinator lib is to
address my demand, it can define maynTill' with the parser type
fixed like:

manyTill' :: Monoid a => Parser a -> Parser end -> Parser a
manyTill' p end = go where go = (mempty <$ end) <|> liftA2 mappend p go

Then I can tune my result types, and implement my parsers like:

type ArtDecl = (Maybe DocComment, Text)
data ModuleDecl m = (Applicative m, Monoid (m ArtDecl)) =>
  ModuleDecl (Maybe DocComment) (m ArtDecl)

moduleDecl :: (Applicative m, Monoid (m ArtDecl)) => Parser (ModuleDecl m)
moduleDecl = lexeme $ do
  sc
  moduCmt <- optional docComment
  arts    <- manyTill' (scWithSemiColons >> artifactDecl) eof
  return $ ModuleDecl moduCmt arts

artifactDecl :: (Applicative m, Monoid (m ArtDecl)) => Parser (m ArtDecl)
artifactDecl = lexeme $ do
  artCmt <- optional immediateDocComment
  choice
    [ eof >> return mempty
    , do
      artBody <- takeWhileP (Just "artifact body")
                            (not . flip elem (";{" :: [Char]))
      return $ pure (artCmt, artBody)
    ]

https://github.com/complyue/dcp/blob/39039e62db5f63e61814383ffc2c5f4fba240806/src/Parser.hs#L128-L152

This approach clears my doubts, though doesn't seem more ergonomic compared to approaches like

complyue/dcp@2600179

or

complyue/dcp@f0c12d5

Well, then I know my options now.

tomjaguarpaw pushed a commit to tomjaguarpaw/megaparsec that referenced this issue Sep 29, 2022
Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 2.3.4.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v2...v2.3.4)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants