How to encode *NoMatch* after consumed some input in a parser #429
Comments
There is some documentation about the semantics of
From my own experiments it seems that once you consume some input you have committed to the branch. If you want to back out you have to use the backtracking But you could use
> either (error . errorBundlePretty @_ @Void) id $ parse (char 'a' *> char 'b' <|> char 'c') "" "ac"
*** Exception: 1:2:
  |
1 | ac
  |  ^
unexpected 'c'
expecting 'b'
CallStack (from HasCallStack):
  error, called at <interactive>:12:9 in interactive:Ghci2
Can be "fixed" by:
> either (error . errorBundlePretty @_ @Void) id $ parse (withRecovery (\_ -> char 'c') (char 'a' *> char 'b')) "" "ac"
'c'
@noughtmare Thanks, and is there already some version of
moduleDecl :: Parser ModuleDecl
moduleDecl = do
  moduCmt <- docComments
  arts <- manyTill artifactDecl eof
  return (moduCmt, arts)
So I can simply replace
An uneducated guess: is it possible for megaparsec/Text/Megaparsec/Internal.hs Lines 202 to 205 in ccf314b
Failing semantic can still be obtained by using It is
I'm not quite sure what kind of semantics you want this
I mean whitespace that is not associated with any syntactic element. I know Megaparsec's idiom is to let each lexeme consume (discard) any whitespace following it, but in my case, as described in #428, a doc comment is meaningful when found immediately before a lexeme, and should be treated as a block comment if everything following it is whitespace until eof.
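For readers unfamiliar with the lexeme idiom mentioned here, it is usually set up roughly as follows. This is only a minimal sketch: the comment delimiters `--` and `{- -}` and the names `sc`, `lexeme` and `word` are assumptions chosen for illustration, not taken from #428 or from the dcp repository.

```haskell
import Control.Applicative (many, some)
import Data.Void (Void)
import Text.Megaparsec (Parsec, eof, parseMaybe)
import Text.Megaparsec.Char (letterChar, space1)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Space consumer: skips whitespace, line comments and block comments.
-- The comment syntax here is an assumption made for this example.
sc :: Parser ()
sc = L.space space1 (L.skipLineComment "--") (L.skipBlockComment "{-" "-}")

-- A lexeme parses its payload and then discards whatever sc accepts after it.
lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc

word :: Parser String
word = lexeme (some letterChar)

main :: IO ()
main =
  -- Comments after a lexeme are silently dropped together with the whitespace.
  print (parseMaybe (sc *> many word <* eof) "foo {- note -} bar  -- tail\n")
  -- Just ["foo","bar"]
```

With this setup every comment that follows a lexeme is thrown away, which is why a doc comment that must stay attached to the next lexeme needs separate handling.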
@complyue You control backtracking with https://markkarpov.com/tutorial/megaparsec.html It should be helpful for understanding how backtracking works in Megaparsec.
You could always encode "no match" by doing something like this:
myParser' = try myParser <|> return NoMatch
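A self-contained version of that suggestion might look like the sketch below. The `Result` type, its `Matched` constructor and the concrete parser `ab` are hypothetical stand-ins for `myParser` and its result type, which the thread does not show.

```haskell
import Control.Applicative ((<|>))
import Data.Void (Void)
import Text.Megaparsec (Parsec, parse, try)
import Text.Megaparsec.Char (char)

type Parser = Parsec Void String

-- Hypothetical result type; only the NoMatch name comes from the thread.
data Result = Matched Char | NoMatch
  deriving (Eq, Show)

-- Stand-in for myParser: consumes 'a' and then requires 'b'.
ab :: Parser Result
ab = Matched <$> (char 'a' *> char 'b')

-- try rewinds the input when ab fails, so the fallback branch can run
-- even though ab had already consumed the 'a'.
ab' :: Parser Result
ab' = try ab <|> return NoMatch

main :: IO ()
main = do
  print (parse ab' "" "ab") -- Right (Matched 'b')
  print (parse ab' "" "ac") -- Right NoMatch
```

Note that after `NoMatch` the input position is back where `ab` started, so a following parser sees the `a` again.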
@mrkkrp I realize that
A concrete example is here https://github.com/complyue/dcp/blob/498916f73afbe2b7bbf6105da352a9c99fe88707/src/Parser.hs#L142-L145
artifactDecl :: Parser ArtDecl
artifactDecl = lexeme $ do
  artCmt <- optional immediateDocComment
  atEnd >>= \case
    True -> empty
    False -> do
      artBody <- takeWhileP (Just "artifact body")
                            (not . flip elem (";{" :: [Char]))
      if T.null $ T.strip artBody
        then empty -- this is not possible in real cases
        else return (artCmt, artBody)
I want
So why don’t you just return some value instead of using empty?
I would prefer to have the parser type (
I just came to a better understanding of the issue, and it seems it can be addressed as in https://github.com/complyue/dcp/pull/3/files , like complyue/dcp@f0c12d5 or complyue/dcp@2600179 ; they are very similar. But I'm still wondering: is this really unaddressable within the combinator library's design space? Why?
With help from Olaf, I think I just cracked my confusion now, that I failed Now I understand it clearly that parsers can be concatenated only if
manyTill' :: Monoid a => Parser a -> Parser end -> Parser a
manyTill' p end = go where go = (mempty <$ end) <|> liftA2 mappend p go
Then I can tune my result types, and implement my parsers like:
type ArtDecl = (Maybe DocComment, Text)
data ModuleDecl m = (Applicative m, Monoid (m ArtDecl)) =>
  ModuleDecl (Maybe DocComment) (m ArtDecl)
moduleDecl :: (Applicative m, Monoid (m ArtDecl)) => Parser (ModuleDecl m)
moduleDecl = lexeme $ do
  sc
  moduCmt <- optional docComment
  arts <- manyTill' (scWithSemiColons >> artifactDecl) eof
  return $ ModuleDecl moduCmt arts
artifactDecl :: (Applicative m, Monoid (m ArtDecl)) => Parser (m ArtDecl)
artifactDecl = lexeme $ do
  artCmt <- optional immediateDocComment
  choice
    [ eof >> return mempty
    , do
        artBody <- takeWhileP (Just "artifact body")
                              (not . flip elem (";{" :: [Char]))
        return $ pure (artCmt, artBody)
    ]
This approach clears my doubts, though doesn't seem more ergonomic compared to approaches like or Well, then I know my options now.
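To see the monoidal `manyTill'` in isolation, here is a small runnable sketch. The `item` parser and its toy grammar (words separated by `;`) are assumptions made up for this example and are unrelated to the dcp grammar; the point is only that an element parser may contribute `mempty` after consuming input.

```haskell
import Control.Applicative (liftA2, some, (<|>))
import Data.Void (Void)
import Text.Megaparsec (Parsec, eof, parseMaybe)
import Text.Megaparsec.Char (char, letterChar)

type Parser = Parsec Void String

-- Same definition as in the comment above: results are combined with
-- mappend instead of being collected into a list.
manyTill' :: Monoid a => Parser a -> Parser end -> Parser a
manyTill' p end = go
  where
    go = (mempty <$ end) <|> liftA2 mappend p go

-- Hypothetical element parser: a word contributes one entry,
-- a lone ';' consumes input but contributes nothing.
item :: Parser [String]
item = (pure <$> some letterChar) <|> (mempty <$ char ';')

main :: IO ()
main = print (parseMaybe (manyTill' item eof) "ab;cd;;")
-- Just ["ab","cd"]
```

The "no match" case is expressed here as a successful parse with an empty result, so no backtracking is involved.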
As I observed (I have no idea where a canonical definition of this could live), the contract of parser combinators is:
fail with no input consumed: the next alternative is tried, and may succeed
fail with some input consumed: the parse errs out immediately, regardless of the remaining alternatives
Am I right about these rules? Where is the official specification of such rules?
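The two rules can be checked against Megaparsec with a small experiment. This is a sketch, not a specification: the parser names are made up for illustration, and `parseMaybe` is used only to keep the output short.

```haskell
import Control.Applicative ((<|>))
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe, try)
import Text.Megaparsec.Char (char)

type Parser = Parsec Void String

-- Rule 1: on "c" the first branch fails without consuming anything,
-- so the second branch is tried.
noConsume :: Parser Char
noConsume = (char 'a' *> char 'b') <|> char 'c'

-- Rule 2: on "ac" the first branch consumes 'a' before failing,
-- so the second branch is never tried.
committed :: Parser Char
committed = (char 'a' *> char 'b') <|> (char 'a' *> char 'c')

-- Wrapping the first branch in try rewinds the input on failure,
-- which turns the consumed failure back into an unconsumed one.
backtracking :: Parser Char
backtracking = try (char 'a' *> char 'b') <|> (char 'a' *> char 'c')

main :: IO ()
main = do
  print (parseMaybe noConsume "c")     -- Just 'c'
  print (parseMaybe committed "ac")    -- Nothing
  print (parseMaybe backtracking "ac") -- Just 'c'
```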
Then I guessed that an empty result from a parser could mean NoMatch even after it has consumed some input, and I wanted to leverage this semantics, but I encountered an error with the current implementation: https://github.com/complyue/dcp/blob/5be688396b7e2bda00ea80fd99d2a7b3ec5c788d/src/Parser.hs#L138-L146
So how can I achieve that?
Background: I'm trying to prototype an implementation of doc comment parsing as described at #428