I'd like to search the plant science literature (full text) and return only articles in which the word "three" appears four or more times in the Methods section. PubMed Central is presumably the best source for this, since it holds full-text articles.

I've looked at scite, which accepts JSON and regex, but it apparently searches only citation statements rather than full-text Methods sections. Perhaps it's possible to write a query that uses JSON and regex against Europe PMC (https://europepmc.org/) to do this?

An alternative would be to use informatics tools to download large amounts of data from PubMed Central and then write R (or other) code to search the downloaded dataset.

What is the most effective way forward? I have extensive R experience, and Python shouldn't be a problem. I've been advised that I would first need an API, and then NLP (natural language processing) techniques, particularly tokenising and stemming, or perhaps LangChain (a framework for building applications on large language models).
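A minimal sketch of the API route in Python (standard library only). It assumes that Europe PMC's REST `search` endpoint (with `cursorMark` paging), its `METHODS:` section field, the `OPEN_ACCESS:y` filter and the `/{PMCID}/fullTextXML` endpoint behave as I understand them; check these against the Europe PMC documentation. The search only narrows candidates to articles whose Methods mention "three". Because a search query cannot express "four or more times", each article's JATS XML is downloaded and the count is done locally. The Methods detection (a `sec-type` or section title containing "method"), the example query terms and all function names here are my own hypothetical choices, not part of Europe PMC.

```python
import json
import re
import time
import urllib.error
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Europe PMC REST API base URL.
BASE = "https://www.ebi.ac.uk/europepmc/webservices/rest"

# Hypothetical example query; the field names are assumptions to verify
# against the Europe PMC search syntax reference.
EXAMPLE_QUERY = 'METHODS:"three" AND OPEN_ACCESS:y AND (plant OR Arabidopsis)'


def _local(tag):
    """Drop any XML namespace prefix, e.g. '{ns}sec' -> 'sec'."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def is_methods_section(sec):
    """Heuristic: sec-type or the section title mentions 'method'."""
    if "method" in (sec.get("sec-type") or "").lower():
        return True
    for child in sec:
        if _local(child.tag) == "title":
            return "method" in " ".join(child.itertext()).lower()
    return False


def methods_text(jats_xml):
    """Join the text of the outermost Methods <sec> elements in <body>."""
    root = ET.fromstring(jats_xml)
    chunks = []

    def walk(elem):
        for child in elem:
            if _local(child.tag) == "sec" and is_methods_section(child):
                # Take the whole section and stop descending, so nested
                # subsections are not counted twice.
                chunks.append(" ".join(child.itertext()))
            else:
                walk(child)

    for elem in root.iter():
        if _local(elem.tag) == "body":
            walk(elem)
    return " ".join(chunks)


def count_word(text, word="three"):
    """Case-insensitive whole-word count (simple regex tokenisation)."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def search_pmcids(query, max_pages=5, page_size=100):
    """Yield PMCIDs for a query, paging with Europe PMC's cursorMark."""
    cursor = "*"
    for _ in range(max_pages):
        params = urllib.parse.urlencode({
            "query": query, "format": "json", "resultType": "lite",
            "pageSize": page_size, "cursorMark": cursor,
        })
        with urllib.request.urlopen(f"{BASE}/search?{params}") as resp:
            data = json.load(resp)
        for hit in data.get("resultList", {}).get("result", []):
            if hit.get("pmcid"):
                yield hit["pmcid"]
        next_cursor = data.get("nextCursorMark")
        if not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor


def fetch_full_text(pmcid):
    """Download the JATS XML full text (generally open-access articles only)."""
    with urllib.request.urlopen(f"{BASE}/{pmcid}/fullTextXML") as resp:
        return resp.read()


def matching_articles(query, word="three", min_count=4):
    """Yield (pmcid, count) where `word` occurs >= min_count times in Methods."""
    for pmcid in search_pmcids(query):
        time.sleep(0.2)  # pause before each full-text request to be polite
        try:
            n = count_word(methods_text(fetch_full_text(pmcid)), word)
        except (urllib.error.URLError, ET.ParseError):
            continue  # no full text available, or unparsable XML
        if n >= min_count:
            yield pmcid, n
```

Usage (needs network access): `for pmcid, n in matching_articles(EXAMPLE_QUERY): print(pmcid, n)`. Whole-word regex matching is the tokenising step here; stemming adds little for a fixed word like "three". Note that numerals such as "3" and compounds such as "threefold" are not counted, while hyphenated forms such as "three-way" are.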