University of California, Berkeley and
Lawrence Berkeley National Laboratory
The majority of materials data is currently scattered across the text, tables, and figures of millions of scientific publications. In my talk, I will discuss the work of our team at Lawrence Berkeley National Laboratory on using natural language processing (NLP) to extract and discover scientific knowledge through textual analysis of the abstracts of several million journal articles. With this data we are exploring new avenues for materials discovery and design, for example identifying functional materials such as thermoelectrics using only unsupervised word embeddings for materials. To date, we have used advanced named entity recognition techniques to extract more than 100 million mentions of materials, structures, properties, applications, synthesis methods, and characterization techniques from our database of over 3 million materials science abstracts. Our most recent work uses GPT-3, a large language model from the same family as the model behind OpenAI's ChatGPT, for joint named entity recognition and relation extraction, which lets us extract complex hierarchical information from research articles. Finally, I will give an overview of how we are making all of this data freely available to the materials research community through our public-facing website matscholar.com and upcoming APIs.
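As a rough illustration of the embedding-based screening described above, the sketch below ranks candidate materials by cosine similarity to the word "thermoelectric". It is not the team's implementation: the three-dimensional vectors, the material list, and the function names `cosine_similarity` and `rank_materials` are all invented for this example, whereas real embeddings would be learned from the abstracts and have far more dimensions.

```python
import math

# Hypothetical toy embeddings, invented for illustration only.
# Real word embeddings would be learned from millions of abstracts.
EMBEDDINGS = {
    "thermoelectric": [0.9, 0.1, 0.2],
    "Bi2Te3": [0.8, 0.2, 0.1],
    "PbTe": [0.7, 0.3, 0.2],
    "NaCl": [0.1, 0.9, 0.3],
    "SiO2": [0.2, 0.3, 0.9],
}
MATERIALS = ["Bi2Te3", "PbTe", "NaCl", "SiO2"]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_materials(query, materials, embeddings):
    """Sort materials by similarity to the query word, most similar first."""
    q = embeddings[query]
    scored = [(m, cosine_similarity(q, embeddings[m])) for m in materials]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_materials("thermoelectric", MATERIALS, EMBEDDINGS)
for material, score in ranking:
    print(f"{material}: {score:.3f}")
```

With these made-up vectors the two tellurides rank first, which only reflects how the toy data was chosen. The point is the mechanism: no labels are needed, only the geometry of the embedding space.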
This is a Zoom seminar.