Workshop on Interactive Learning for Natural Language Processing

Background & References

What is interactive NLP?

Interactive machine learning [IML; 7] studies algorithms that learn from data collected through interaction with a computational agent or a human in a shared environment, typically in the form of feedback on the model's decisions. In contrast to the common paradigm of supervised learning, IML does not assume access to pre-collected labeled data, which lowers data costs. Instead, systems improve over time as they are used, and even non-expert users can supply the feedback that drives learning. IML has seen wide success in areas such as game playing [24] and recommendation systems [13].
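
To make the contrast with supervised learning concrete, the following is a minimal sketch of an interactive learning loop: the model commits to a decision, receives feedback on that decision alone, and updates immediately. The simulated user, feature dimension, learning rate, and logistic policy are illustrative assumptions, not taken from any cited system.

```python
import numpy as np

# Minimal interactive-learning loop (illustrative sketch): the model acts,
# receives feedback on its own decision, and updates online, instead of
# training once on a pre-collected labeled dataset.

rng = np.random.default_rng(0)
dim = 8                       # hypothetical feature dimension
w = np.zeros(dim)             # weights of a simple logistic policy

def simulated_user_feedback(x, action):
    """Stand-in for a human: approves the action iff it matches a hidden rule."""
    correct = int(x.sum() > 0)                 # hidden rule known only to the "user"
    return 1.0 if action == correct else 0.0   # binary reward on the model's decision

for step in range(1000):
    x = rng.normal(size=dim)                   # context, e.g. an encoded input
    p = 1.0 / (1.0 + np.exp(-(w @ x)))         # model's probability of choosing action 1
    action = int(rng.random() < p)             # sample an action from the current policy
    reward = simulated_user_feedback(x, action)

    # REINFORCE-style update: push the policy toward actions that got positive feedback.
    grad_log_pi = (action - p) * x             # gradient of log pi(action | x) w.r.t. w
    w += 0.1 * (reward - 0.5) * grad_log_pi    # 0.5 acts as a simple reward baseline
```

The key property is that every update depends only on the feedback the system receives for the action it actually took, rather than on a fully labeled example.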

What are current approaches?

Although most downstream applications of NLP involve interaction with humans (e.g., via labels, demonstrations, corrections, or evaluations), common NLP models are not built to learn from or adapt to users through that interaction. A large research gap must be closed before NLP systems can adapt on the fly to the changing needs of humans and to dynamic environments.
While still understudied, there is growing interest in NLP models that learn through interaction, especially in light of recent developments in large language models [8]. Some systems learn from computational-agent or human feedback in the form of low-level labels, using techniques such as active learning [12], imitation learning [3, 23], pairwise feedback [23, 5], preference learning [25], contextual bandits [10, 9, 26], and reinforcement learning [17, 27, 22]. The most prominent recent approach is reinforcement learning from human feedback (RLHF) [21, 1, 2, 20, 17, 30, 27, 31], which has led to the development of ChatGPT [20]. In contrast to these low-level forms of feedback, natural language feedback systems aim to leverage the full expressivity of natural language to handle nuanced and expressive feedback [14]. Types of natural language feedback studied include explanations [16], advice [15, 18, 28], instructions [29, 11], descriptions [6, 19], and prompts [4].
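
As a concrete illustration of the pairwise-feedback setting that underlies RLHF, the sketch below fits a scalar reward model from preference comparisons with a Bradley-Terry-style loss. The fixed feature vectors, simulated annotator, and learning rate are assumptions made for illustration; real systems score full text with a language model rather than linear features.

```python
import numpy as np

# Sketch: fit a scalar reward model from pairwise preferences, the form of human
# feedback used in RLHF before the policy-optimization stage.
# Bradley-Terry model: P(a preferred over b) = sigmoid(r(a) - r(b)).

rng = np.random.default_rng(0)
dim = 16
w = np.zeros(dim)                              # linear reward model r(x) = w @ x

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Simulated annotator with a hidden reward function (for illustration only).
true_w = rng.normal(size=dim)
pairs = []
for _ in range(500):
    a, b = rng.normal(size=dim), rng.normal(size=dim)
    chosen, rejected = (a, b) if true_w @ a > true_w @ b else (b, a)
    pairs.append((chosen, rejected))

lr = 0.05
for epoch in range(20):
    for chosen, rejected in pairs:
        margin = w @ chosen - w @ rejected
        p = sigmoid(margin)                      # predicted prob. that `chosen` is preferred
        grad = -(1.0 - p) * (chosen - rejected)  # gradient of -log p w.r.t. w
        w -= lr * grad                           # step on the Bradley-Terry loss
```

In a full RLHF pipeline, the learned reward model would then be used to optimize a policy (e.g., with a reinforcement learning algorithm), but the preference-fitting step above is where the human feedback enters.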

References

  1. Y. Bai, A. Jones, K. Ndousse, A. Askell, A. Chen, N. DasSarma, D. Drain, S. Fort, D. Ganguli, T. Henighan, et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862, 2022.
  2. M. Bakker, M. Chadwick, H. Sheahan, M. Tessler, L. Campbell-Gillingham, J. Balaguer, N. McAleese, A. Glaese, J. Aslanides, M. Botvinick, et al. Fine-tuning language models to find agreement among humans with diverse preferences. In Proceedings of Neural Information Processing Systems, 2022.
  3. K. Brantley, A. Sharaf, and H. Daumé III. Active imitation learning with noisy guidance. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020.
  4. T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. arXiv preprint arXiv:2005.14165, 2020.
  5. A. Chen, J. Scheurer, T. Korbak, J. A. Campos, J. S. Chan, S. R. Bowman, K. Cho, and E. Perez. Improving code generation by training with natural language feedback. arXiv preprint arXiv:2303.16749, 2023.
  6. C. Colas, T. Karch, N. Lair, J.-M. Dussoux, C. Moulin-Frier, P. Dominey, and P.-Y. Oudeyer. Language as a cognitive tool to imagine goals in curiosity-driven exploration. NeurIPS, 33:3761–3774, 2020.
  7. J. A. Fails and D. R. Olsen Jr. Interactive machine learning. In Proceedings of the 8th International Conference on Intelligent User Interfaces, pages 39–45, 2003.
  8. P. Fernandes, A. Madaan, E. Liu, A. Farinhas, P. H. Martins, A. Bertsch, J. G. de Souza, S. Zhou, T. Wu, G. Neubig, et al. Bridging the gap: A survey on integrating (human) feedback for natural language generation. arXiv preprint arXiv:2305.00955, 2023.
  9. G. Gao, H.-T. Chen, Y. Artzi, and E. Choi. Continually improving extractive QA via human feedback. arXiv preprint arXiv:2305.12473, 2023.
  10. G. Gao, E. Choi, and Y. Artzi. Simulating bandit learning from user feedback for extractive question answering. arXiv preprint arXiv:2203.10079, 2022.
  11. N. Kojima, A. Suhr, and Y. Artzi. Continual learning for grounded instruction generation by observing human following behavior. TACL, 9:1303–1319, 2021.
  12. J.-U. Lee, C. M. Meyer, and I. Gurevych. Empowering active learning to jointly optimize system and user demands. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4233–4247, 2020.
  13. L. Li, W. Chu, J. Langford, and R. E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, pages 661–670, 2010.
  14. A. A. Lipnevich and J. K. Smith. Effects of differential feedback on students' examination performance. Journal of Experimental Psychology: Applied, 15(4):319, 2009.
  15. N. Mehta and D. Goldwasser. Improving natural language interaction with robots using advice. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1962–1967, 2019.
  16. S. Murty, P. W. Koh, and P. Liang. ExpBERT: Representation engineering with natural language explanations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020.
  17. R. Nakano, J. Hilton, S. Balaji, J. Wu, L. Ouyang, C. Kim, C. Hesse, S. Jain, V. Kosaraju, W. Saunders, et al. WebGPT: Browser-assisted question-answering with human feedback. arXiv preprint arXiv:2112.09332, 2021.
  18. K. Nguyen and H. Daumé III. Help, Anna! Visual navigation with natural multimodal assistance via retrospective curiosity-encouraging imitation learning. arXiv preprint arXiv:1909.01871, 2019.
  19. K. X. Nguyen, D. Misra, R. Schapire, M. Dudík, and P. Shafto. Interactive learning from activity description. In International Conference on Machine Learning, pages 8096–8108. PMLR, 2021.
  20. OpenAI. ChatGPT. https://openai.com/blog/chatgpt, 2023.
  21. L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al. Training language models to follow instructions with human feedback. In Proceedings of Neural Information Processing Systems, 2022.
  22. J. S. Park, J. C. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein. Generative agents: Interactive simulacra of human behavior. arXiv preprint arXiv:2304.03442, 2023.
  23. J. Scheurer, J. A. Campos, T. Korbak, J. S. Chan, A. Chen, K. Cho, and E. Perez. Training language models with language feedback at scale. arXiv preprint arXiv:2303.16755, 2023.
  24. D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
  25. E. Simpson, Y. Gao, and I. Gurevych. Interactive text ranking with Bayesian optimisation: A case study on community QA and summarisation. arXiv preprint arXiv:1911.10183, 2019.
  26. A. Sokolov, S. Riezler, and T. Urvoy. Bandit structured prediction for learning from partial feedback in statistical machine translation. arXiv preprint arXiv:1601.04468, 2016.
  27. N. Stiennon, L. Ouyang, J. Wu, D. Ziegler, R. Lowe, C. Voss, A. Radford, D. Amodei, and P. F. Christiano. Learning to summarize with human feedback. NeurIPS, 33:3008–3021, 2020.
  28. J. Thomason, M. Murray, M. Cakmak, and L. Zettlemoyer. Vision-and-dialog navigation. In Conference on Robot Learning, pages 394–406. PMLR, 2020.
  29. O. Watkins, A. Gupta, T. Darrell, P. Abbeel, and J. Andreas. Teachable reinforcement learning via advice distillation. NeurIPS, 34, 2021.
  30. J. Wu, L. Ouyang, D. M. Ziegler, N. Stiennon, R. Lowe, J. Leike, and P. Christiano. Recursively summarizing books with human feedback. arXiv preprint arXiv:2109.10862, 2021.
  31. D. M. Ziegler, N. Stiennon, J. Wu, T. B. Brown, A. Radford, D. Amodei, P. Christiano, and G. Irving. Fine-tuning language models from human preferences. arXiv preprint arXiv:1909.08593, 2019.