Skip to content

Conversation

@msftrncs
Copy link
Contributor

@msftrncs msftrncs commented Feb 4, 2019

This should fix #82 (the first half of microsoft/vscode#65983) and #49, and maybe others.

I save a hasAdvanced flag to the stack (StackElement) when pushing a rule and when popping the rule I use this flag to reset the anchorPos (in _tokenizeLine) back to -1 to disable further anchor matching.

I have been running in to similar issues trying to change the PowerShell/EditorSyntax grammer, and I believe that this is the correct behavior. I have only so far 'inspect'ed my PowerShell with this change and it seems to work how I think it should. I will test at another location the pug, and if it still exists, the HTML issue as well.

I do not know what affect this might have on projects dependent on this one.

@msftrncs
Copy link
Contributor Author

I have tested my grammar using Atom, and I have no problems with it there, indicating their tokenizer must reset the match anchor point after a rule pop.

I have also 'inspect'ed both the #82 and the #49 issues and it resolves them the same as it resolves my issues in my PowerShell grammar.

@msftrncs
Copy link
Contributor Author

msftrncs commented Feb 19, 2019

@egamma, @alexandrudima, any chance this could get looked at yet during this update cycle?

The reason being, in addition to the issues detailed above and those detailed as duplicated in them, as I keep working on a rewrite of the PowerShell grammar, I keep running in to this. In many cases I can work around it, but in all the cases I go over to Atom, and the issue does not occur.

Current issue:

{
	"comment": "could be a numeric constant or a limited list of unary operators, that we should switch to expression mode",
	"begin": "(?=[\\d.!+\\x{2013}-\\x{2015}-])",
	"end": "$(?=\\n)|(?=[;|)}\\]])",
	"patterns": [
		{
			"include": "#numericConstant"
		},
		{
			"include": "#operators_preUnary"
		},
		{
			"comment": "if numeric constant advanced, switch to expression mode",
			"begin": "(?!\\G)(?!$(?=\\n)|(?=[;|)}\\]]))",
			"end": "$(?=\\n)|(?=[;|)}\\]])",
			"patterns": [
				{
					"include": "#expression_mode"
				}
			]
		},
		{
			"comment": "if numeric constant didn't advanced, try finishing command mode with a command name",
			"begin": "\\G(?!$(?=\\n)|(?=[;|)}\\]]))",
			"end": "$(?=\\n)|(?=[;|)}\\]])",
			"patterns": [
				{
					"include": "#command_name"
				}
			]
		}
	]
},

The purpose of this is to select between what might be a numeric value, which in PowerShell is a fairly complicated set of regex, or the beginning of a command name that just looks like a numeric value. The problem is that a numeric value also uses a begin-end rule, so that it can also check for special operator circumstances, so when it returns, the anchor is left sitting at the end of the numeric value, even though it 'popped' to return. Instead of the rule with the negative lookahead for \G matching, the non-negative \G rule matches and sends the grammar to the command name path.

JSON of this grammar can be found: https://github.com/msftrncs/PowerShell.tmLanguage/tree/argumentmode_2ndtry

Pattern that scopes erroneously:

($i=-5kb, $c, ${true} )
($i=-5kb-$c, ${true} )

In both lines, the $c should scope as a variable, but instead scopes as a function name (hence command name). In the first line , should have scoped as a separator, but instead is glanced over by command name since its an invalid character for a command name, but command name doesn't return (end) and has no other matches so it finds the first thing valid (since the original point should have matched a command name had it not matched a numeric value). In the second line, the - gets caught by a begin-end rule in the numeric value rule, that looks for some other possibilities, but finds nothing and pops, but the anchor point is now set at its end, and so the same thing happens when we return back to the rule shown above.

I am not sure I can find any other reliable workaround in this particular case, though I have ultimately worked around every single other places with this sort of thing has occurred, but that's just making the regex more complicated, in each case that a work around can be made reliable, and I don't think that is the route that should be taken, as it seems the other TextMate compatible editors do not require these workarounds.

If not in this update cycle, maybe shortly afterwards? The wildcard I am unclear about is if this solution, which modifies the constructor for StackElement would potentially upset any downstream projects, and maybe needs to be implemented differently.

@alexdima
Copy link
Member

alexdima commented Apr 4, 2019

@msftrncs Thank you for the great analysis and the PR!

I have decided to go in a similar direction, but with a simplification:

  • capture the anchor position value when a stack element is pushed.
  • when a stack element is popped, restore the anchor position value from the popped stack element.

This has as a "side effect" the desired behaviour that the anchor position is not matched if tokenization has advanced. IMHO, this capturing/restoring is easier to comprehend.

@msftrncs
Copy link
Contributor Author

msftrncs commented Apr 5, 2019

Thanks for getting to this. I am sure there are lots more ways to tackle the problem. I was worrying about a rule pop occurring on a different line than the push did, and how to detect that, so I stayed away from pushing the actual anchor position.

@msftrncs
Copy link
Contributor Author

msftrncs commented Apr 5, 2019

Ah, I had looked over the stackelement.reset() method being used to reset the previous element each time a new line is tokenized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Grammar parser does not correctly recognize pug rule

2 participants