If you're actually looking for examples of how to do this in a game, look up Event[0], which has a text interface to an AI.
Also, if you're parsing an English sentence, you don't want to read it from left to right; you want to analyze its syntactic dependencies or its phrase structure. The most basic form of this is working out the verb, the subject, and the object, which is why the earliest text adventure parsers worked with two-word commands: an action, and an object to perform the action on. (The subject was implied: you were the one doing the thing.)
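To make the two-word idea concrete, here is a minimal sketch of a verb-object parser in Python. The vocabulary (`VERBS`, `NOUNS`, `STOP_WORDS`) and the `parse_command` function are hypothetical, invented for illustration; real games used much larger word lists. It maps synonyms onto one canonical verb, drops filler words like "the", and rejects anything that doesn't fit the action-plus-object pattern.

```python
import re

# Hypothetical vocabulary: synonyms map onto one canonical verb.
VERBS = {"take": "take", "get": "take", "open": "open",
         "look": "look", "examine": "examine", "x": "examine"}
NOUNS = {"lamp", "door", "key"}
# Filler words that carry no meaning for a two-word parser.
STOP_WORDS = {"the", "a", "an", "at"}

def parse_command(text):
    """Return (verb, noun), (verb, None) for a bare verb, or None if unparseable."""
    # Lowercase, keep only letter runs (this also strips punctuation), drop filler words.
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    if not words or words[0] not in VERBS:
        return None  # first word must be a known action
    verb = VERBS[words[0]]
    if len(words) == 1:
        return (verb, None)  # e.g. "look"
    if len(words) > 2 or words[1] not in NOUNS:
        return None  # more than one object, or an unknown object
    return (verb, words[1])
```

So `parse_command("Take the lamp.")` and `parse_command("get lamp")` both come back as `("take", "lamp")`, while anything longer is rejected, which is exactly the limitation the dependency parsers mentioned below get around.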
NLTK, spaCy, OpenNLP, or the equivalent libraries for your programming environment have already implemented solutions for this that handle more complex sentences, as have the open-source chatbot platforms. They can even work without punctuation, if that's what you really want. (Or, going the other way, there are word-classification networks trained on Twitter and Reddit that recognize emoji.)
And as has been said above, remembering what happened last session just needs serialization: save what you've got in memory to disk, load it from disk next time.
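As a rough sketch of that save/load cycle, assuming your bot's memory fits in a plain dict of JSON-friendly values (strings, numbers, lists, nested dicts), Python's standard `json` module is enough. The function names and the `memory.json` file name are hypothetical.

```python
import json
from pathlib import Path

def save_state(state, path):
    """Serialize the in-memory state dict to disk as JSON."""
    Path(path).write_text(json.dumps(state), encoding="utf-8")

def load_state(path, default=None):
    """Load state saved by a previous session, or start fresh if there is none."""
    p = Path(path)
    if not p.exists():
        # First run: nothing saved yet.
        return {} if default is None else default
    return json.loads(p.read_text(encoding="utf-8"))

# Typical session: load at startup, update while running, save on exit.
# state = load_state("memory.json")
# state.setdefault("facts", []).append("player likes tea")
# save_state(state, "memory.json")
```

If your state holds arbitrary objects rather than plain data, `pickle` can serialize those too, but never unpickle files you don't trust, since loading a pickle can execute code.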