What action space can the model handle? Is there a demo showing it predicting a type-text action or a scroll-down action?