Selection of most relevant organic molecules
The first step was to select an appropriate search engine for collecting bibliographic resources. Three search engines were available at the Université de Lorraine: Web of Science (WOS), Science Direct and Ex Libris. Both when BAPPOP was created in 2015 and when it was updated in 2023, WOS was selected because it returned the highest number of references in a test query combining one of the five sub-families of organic pollutants (polycyclic aromatic hydrocarbons, PAH; polychlorinated biphenyls, PCB; polychlorinated dibenzodioxins, PCDD; polychlorinated dibenzofurans, PCDF; and benzene, toluene, ethylbenzene and xylene, BTEX) with four target keywords (plant/transfer/uptake/soil).
When the BAPPOP dataset was created in 2015, it appeared essential to define a reasonable list of target molecules to be included, so that the dataset would both meet end-users’ expectations and be adapted to existing literature resources. This list was updated in 2023 following the same decision-making methodology. The decision-making process combined three criteria: i) the existence of regulations, ii) the existence of toxicological reference values, and iii) the availability of published data.
In 2015 a preliminary list of 576 organic molecules was established, based on existing European legislation [6,7,8], scientific reports by the European Food Safety Authority (EFSA 2008 [9] and EFSA 2010 [10]) and the French monitoring and control plans for animal and plant foodstuffs and animal feedstuffs issued by the French directorate for food in 2004 [11] and 2010 [12]. To be considered further, a molecule had to be present in at least three of the above-cited regulations, which led in 2015 to a reduced list of 169 molecules. Organic molecules considered as priority substances by environmental and health risk agencies were then added to this list on expert opinion. This led to a list of 235 organic molecules of interest.
Given the way the dataset was to be implemented, it seemed pointless to include molecules that had yet to be studied and for which the literature was poor or non-existent. A bibliographic criterion, based on the potential availability of data, was therefore defined. The request [“plant” AND “transfer” OR “uptake” AND “name of the organic molecule”] was run in WOS, with “name of the organic molecule” replaced by the name of each of the 235 pre-selected molecules in turn, initially over the period 1950 to June 19th 2014. The Bibliometric score was set equal to the number of publications returned by the search, plus 1.
A Toxicological score combining hazard and dose scoring was defined as follows. The Hazard score was based on the classification of carcinogenic, mutagenic and reprotoxic (CMR) substances by the French National Institute of Research and Security (INRS) [13] and on the directive 1272/2008/CEE [14], distinguishing three cases: class 1A or 1B, class 2, and no CMR classification. If the organic molecule is:
Class 1A or 1B, the hazard score is equal to 6;
Class 2, the hazard score is equal to 3;
Not classified as CMR, the hazard score is equal to 1.
The Dose score is given according to the existence of a toxicity reference value (TRV) in the main toxicological databases, such as US EPA [15], ATSDR [16] and RIVM [17]. A score of 2 is attributed to a substance that has at least one TRV, and a score of 1 is attributed when no TRV exists for the substance. The global toxicological score is then given by:
$${Toxicological\; score}={Hazard\; score}\ast {Dose\; score}$$
(1)
A Global score was then calculated with:
$${Global\; score}={Bibliometric\; score}\ast {Toxicological\; score}$$
(2)
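The scoring in Eqs. (1) and (2) can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above: the `Molecule` class, its field names and the example values are hypothetical and are not taken from the BAPPOP dataset.

```python
from dataclasses import dataclass
from typing import Optional

# Hazard scores for the CMR classes described in the text;
# None stands for "not classified as CMR".
HAZARD_SCORES = {"1A": 6, "1B": 6, "2": 3, None: 1}


@dataclass
class Molecule:
    """Hypothetical record for one candidate molecule."""
    name: str
    cmr_class: Optional[str]  # "1A", "1B", "2" or None
    has_trv: bool             # at least one TRV exists in the databases
    n_publications: int       # number of hits returned by the WOS request


def bibliometric_score(m: Molecule) -> int:
    # Number of publications returned by the search, plus 1.
    return m.n_publications + 1


def toxicological_score(m: Molecule) -> int:
    # Eq. (1): Hazard score * Dose score.
    dose_score = 2 if m.has_trv else 1
    return HAZARD_SCORES[m.cmr_class] * dose_score


def global_score(m: Molecule) -> int:
    # Eq. (2): Bibliometric score * Toxicological score.
    return bibliometric_score(m) * toxicological_score(m)


# Made-up examples, for illustration only.
well_studied = Molecule("molecule A", "1B", True, 120)
unstudied = Molecule("molecule B", None, False, 0)
print(global_score(well_studied))  # (120 + 1) * (6 * 2) = 1452
print(global_score(unstudied))     # (0 + 1) * (1 * 1) = 1
```

One consequence of the +1 in the Bibliometric score is that a molecule with no publications still receives a non-zero Global score, equal to its Toxicological score.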
In 2015, this analysis led to the selection of 47 molecules to be considered in the dataset. To relate the quantity of available publications to the number of molecules to consider, the cumulative number of publications was plotted as a function of the number of molecules (Fig. 1). This makes it possible to assess the gain in publications against the number of molecules to be considered. It was suggested that a threshold of 30 be set, given that the gain in publications beyond this is small (Fig. 1). So as not to exclude other non-dioxin-like PCBs, 7 non-dioxin-like PCBs were added, resulting in a list of 54 organic molecules of interest.
Fig. 1
Cumulative number of publications as a function of the number of molecules to consider.
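A curve of the kind shown in Fig. 1 can be computed as follows. This is a minimal sketch under stated assumptions: the molecules are assumed to be ranked by decreasing number of publications (the ranking actually used for Fig. 1 is not specified here), and the counts in the example are made up.

```python
from itertools import accumulate


def cumulative_publications(counts):
    """Cumulative publication count over molecules ranked by decreasing count.

    Assumption: ranking by publication count; the text does not state
    which ranking was used for Fig. 1.
    """
    ranked = sorted(counts, reverse=True)
    return list(accumulate(ranked))


def marginal_gain(cumulative):
    """Share of all publications contributed by each successive molecule."""
    if not cumulative or cumulative[-1] == 0:
        return [0.0 for _ in cumulative]
    total = cumulative[-1]
    previous = [0] + cumulative[:-1]
    return [(c - p) / total for c, p in zip(cumulative, previous)]


# Hypothetical publication counts for four molecules.
curve = cumulative_publications([50, 5, 30, 15])
print(curve)                 # [50, 80, 95, 100]
print(marginal_gain(curve))  # shares shrink as further molecules are added
```

Where the marginal gain becomes small, adding further molecules contributes few additional publications, which is the reasoning behind setting a threshold.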
In 2023, the same methodology was followed, considering the period 2015 to 2022 for the update. Updated legislation was used when applying the filter, leading to the addition of 18 new organic molecules to the BAPPOP dataset. Based on the same strategy, 36 molecules that are now prioritised by environmental and health risk agencies were added in the BAPPOP dataset update, covering PCDD, PCDF and per- and polyfluoroalkyl substances (PFAS): 7 PCDD and 10 PCDF that are conventionally assessed in the environment, and 19 PFAS chosen among those commonly studied in the scientific literature. Thus, 108 organic molecules of interest are now considered in the BAPPOP dataset.
Plant selection criteria
The publications, of whatever origin, from which data were registered in BAPPOP were selected following a specific decision-making methodology detailed in the flowchart of Fig. 2. The first selection criterion was that experiments had to concern kitchen garden crops exclusively. Kitchen garden crops not cultivated in temperate climates were not included.
Fig. 2
Decision-making flowchart for data implementation in BAPPOP dataset. (BCF stands for bioconcentration factor).
With regard to plant species, only the consumed organs of plants were taken into account, to the exclusion of all others. For example, a study analysing organic molecule concentrations in tomato roots was excluded. Similarly, studies involving model plant species (e.g. Arabidopsis thaliana) or cereals other than sweet corn grain were excluded. The molecules studied in the publications had to be among the 108 selected molecules of interest. In addition, all experiments in the cited studies had to be carried out on soils, which eliminated hydroponic experiments and the direct use of substrates (e.g. compost, sewage sludge…) without mixing with soil. Organic molecule concentrations in soils and plants had to be supplied by the publication. Experiments conducted with unrealistically high doses of organic molecules in soils, or leading to significant phytotoxic responses of the plant, were also excluded. When bioconcentration factors (BCF) were used to express plant uptake, these had to correspond to only one species and one edible part of the plant. Data obtained by averaging concentrations across different plant species or across different organs of different plant species, or calculated from models, were also excluded.
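The selection criteria above can be read as a sequence of pass/fail checks on each candidate data point. The sketch below is one possible encoding of them, not the authors' implementation: the `StudyRecord` class, its field names and the example records are hypothetical, and the order of the checks is arbitrary rather than that of the Fig. 2 flowchart.

```python
from dataclasses import dataclass


@dataclass
class StudyRecord:
    """One candidate data point; all field names are hypothetical."""
    species: str
    organ: str
    molecule: str
    kitchen_garden_crop: bool       # temperate kitchen garden crop
    organ_is_consumed: bool         # e.g. tomato fruit yes, tomato root no
    grown_on_soil: bool             # False for hydroponics or pure substrate
    concentrations_reported: bool   # soil and plant concentrations both given
    realistic_dose: bool            # no unrealistic dose, no marked phytotoxicity
    single_species_and_organ: bool  # not averaged over species or organs
    measured: bool                  # False when the value comes from a model


def exclusion_reasons(record, molecules_of_interest):
    """Return the list of criteria the record fails (empty if it is eligible)."""
    checks = [
        (record.kitchen_garden_crop, "not a temperate kitchen garden crop"),
        (record.organ_is_consumed, "organ is not consumed"),
        (record.molecule in molecules_of_interest, "molecule not of interest"),
        (record.grown_on_soil, "experiment not carried out on soil"),
        (record.concentrations_reported, "soil or plant concentration missing"),
        (record.realistic_dose, "unrealistic dose or phytotoxic response"),
        (record.single_species_and_organ, "averaged over species or organs"),
        (record.measured, "value calculated from a model"),
    ]
    return [reason for passed, reason in checks if not passed]


def is_eligible(record, molecules_of_interest):
    return not exclusion_reasons(record, molecules_of_interest)


# Made-up examples mirroring the tomato root case mentioned in the text.
molecules = {"molecule A"}
tomato_root = StudyRecord("tomato", "root", "molecule A",
                          True, False, True, True, True, True, True)
tomato_fruit = StudyRecord("tomato", "fruit", "molecule A",
                           True, True, True, True, True, True, True)
print(exclusion_reasons(tomato_root, molecules))  # ['organ is not consumed']
print(is_eligible(tomato_fruit, molecules))       # True
```

Returning the list of failed criteria, rather than a single boolean, makes it easier to document why a given publication was left out.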
Data collection
Relevant international published scientific literature was collected with a search request in the WOS database, performed on title, abstract and keywords, using the following equation:
$$BAPPOP\,update=(Plant\,\mathrm{AND}\,(Transfer\,OR\,Uptake)\,\mathrm{AND}\, \mbox{“} Name\,of\,the\,organic\,molecule\,of\,interest\mbox{”})$$
(3)
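Since Eq. (3) is applied once per molecule of interest, the search strings can be generated programmatically. The sketch below only builds the Boolean expression as text. The function name and the example molecule names are illustrative assumptions, no WOS field tags or API calls are involved, and straight quotes are used in place of the typographic ones of Eq. (3).

```python
def build_query(molecule_name: str) -> str:
    """Build the Boolean search string of Eq. (3) for one molecule.

    The molecule name is wrapped in double quotes so that it is searched
    as an exact phrase; any double quotes inside the name are dropped.
    """
    cleaned = molecule_name.replace('"', "").strip()
    if not cleaned:
        raise ValueError("molecule name must not be empty")
    return f'(Plant AND (Transfer OR Uptake) AND "{cleaned}")'


# Example names chosen for illustration only; they are not taken from
# the list of 108 molecules.
for name in ["benzo[a]pyrene", "toluene"]:
    print(build_query(name))
# (Plant AND (Transfer OR Uptake) AND "benzo[a]pyrene")
# (Plant AND (Transfer OR Uptake) AND "toluene")
```

The parentheses around `Transfer OR Uptake` matter: without them, the usual precedence of AND over OR would change which records match.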
At the time of BAPPOP creation in 2015, 1213 occurrences were identified for the period spanning 1950 to June 19th 2014. When updating the dataset in 2022, 2677 occurrences were identified for the subsequent period from 2015 to August 1st 2022. Following the methods described above, 90 references were recorded in BAPPOP: 87 references collected via the search request on WOS and 3 experimental reports from French institutes.