Distribution modeling applied to deficient data species assessment: A case study with Pithecopus nordestinus (Anura, Phyllomedusidae)

Neotropical Biology and Conservation 15(2): 165–175 (2020), doi: 10.3897/neotropical.15.e47426. Research Article. Copyright Felipe Pessoa Da Silva et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

The arboreal frog Pithecopus nordestinus occurs across almost the entire Northeast region of Brazil and in Minas Gerais State. It is currently classified as Data Deficient (DD) on the IUCN Red List of Threatened Species, so further knowledge about its geographic distribution and population status is required. Species distribution modeling can help here, because it combines species occurrence records with environmental variables describing bioclimatic and landscape features to predict the environmental suitability for an organism across geographic space. We obtained 159 P. nordestinus occurrence records covering the entire previously known distribution of the species. These records were collected through direct field sampling and from the scientific literature, museum collections, and online databases. We used four species distribution modeling algorithms to estimate the potential range (extent of occurrence) of this frog, and the area of habitat analysis proposed by the IUCN to estimate its available habitat. The generated models can be considered excellent, with a mean AUC value of 0.981. Environmental variables related to temperature and radiation were the most important for the construction of this distribution model. Our results indicate that the forested areas of the Atlantic Forest domain and the forest patches inside the Caatinga biome present the highest suitability values for the species and contain most of its available habitat, a pattern possibly related to the known arboreal habit of this amphibian.
We thus provide a distribution area for P. nordestinus that is broader than previously known, a new polygon for conservation purposes based on the extent of occurrence, and an increased estimate of occupancy based on the area of habitat analysis. Direct applications of our study include identifying additional areas where the occurrence of P. nordestinus was not yet well known, new habitats for possible dispersal or recolonization, and conservation hotspots for this species. In addition, the methodological procedures used here may serve as a baseline tool for new investigations focused on Data Deficient species and their ecological and conservation planning requirements.

Keywords
Amphibia, conservation planning, ecological niche, ensemble modeling

Introduction
The IUCN Red List of Threatened Species is one of the most important and relevant tools for biodiversity conservation, as well as a source of information on the status of the world's fauna and flora that supports decision-making in conservation planning. Each species is assigned to only one of several categories, ranging from low to high extinction risk. When there is insufficient or inadequate information to assess a taxon's extinction risk, directly or indirectly, from its distribution and population status, the taxon is classified as Data Deficient (DD). Taxa classified as DD may require urgent conservation action, yet DD is not a threat category, so conservation actions may not be directed at prioritizing, or focusing on, these species (IUCN 2001, 2017). The most recent valid IUCN assessment of Anura evaluated the conservation status of more than 6,000 species, and 21.32% of them are currently categorized as DD. Of these DD species, 11% occur in the Neotropics. Phyllomedusidae is endemic to the Neotropics. In this family, Pithecopus Cope, 1866 is represented by five species (Faivovich et al. 2010; Frost 2016; IUCN 2019).
The small anuran Pithecopus nordestinus (Caramaschi, 2006) has predominantly arboreal habits and can be distinguished from similar species by its color pattern, notably black vertical bars on a red-orange background on the hidden parts of the flanks and limbs. The species is distributed throughout northeastern Brazil, occupying most of the Caatinga biome and adjacent regions in the states of Alagoas, Bahia, Ceará, Minas Gerais, Paraíba, Pernambuco, Piauí, Rio Grande do Norte, and Sergipe (Caramaschi 2006; Faivovich et al. 2010; Duellman et al. 2016; Frost 2016). The species is currently classified as DD on the IUCN Red List, so its current status, and therefore its extinction risk, cannot be determined. More information and studies on its distribution and population status are therefore needed (Angulo 2016).
Knowledge of the geographical distribution of a species is fundamental to evolutionary and ecological studies. Species distribution models use the ecological niche as a baseline (Peterson 2011), combining occurrence data and environmental variables to infer the biological requirements for the persistence of a target species (Elith and Leathwick 2009). Several algorithms are used to build species distribution models for conservation planning (Carroll 2010; Yang et al. 2013). This type of approach also allows a detailed analysis of the environmental variables closely associated with the species' actual occurrence (Phillips et al. 2006, 2017). Brooks et al. (2019) proposed a new method, the 'area of habitat' (AOH), to fill information gaps on the extinction risk of a species, based on its geographic distribution and other criteria such as extreme population fluctuations and continuing decline. The AOH is the habitat available to a species within its distribution limits, which makes it more realistic than, and a refinement of, the analyses previously considered by the IUCN. Previous IUCN assessments considered only the extent of occurrence (EOO; the area of the minimum convex polygon enclosing all confirmed location records of a taxon) and the area of occupancy (AOO; the area of the grid cells occupied by the species, using a grid of 2 × 2 km cells) (IUCN 2013).
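The two IUCN metrics can be sketched in a few lines. This is a minimal illustration, not the implementation used by the IUCN or by the red package: it assumes the records have already been projected into an equal-area coordinate system in kilometers, computes the EOO as the area of the minimum convex polygon (Andrew's monotone chain plus the shoelace formula), and computes the AOO by counting occupied 2 × 2 km grid cells. The function names and the toy coordinates are hypothetical.

```python
import math
from typing import List, Tuple

# (x, y) in kilometers in an equal-area projection (assumed; not part of the study)
Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Minimum convex polygon via Andrew's monotone chain (counter-clockwise)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula; returns 0 for degenerate polygons."""
    if len(poly) < 3:
        return 0.0
    s = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def eoo_km2(points: List[Point]) -> float:
    """Extent of occurrence: area of the minimum convex polygon."""
    return polygon_area(convex_hull(points))


def aoo_km2(points: List[Point], cell_km: float = 2.0) -> float:
    """Area of occupancy: number of occupied grid cells times cell area."""
    cells = {(math.floor(x / cell_km), math.floor(y / cell_km)) for x, y in points}
    return len(cells) * cell_km ** 2


# Hypothetical records (km); (1, 1) falls in the same 2 x 2 km cell as (0, 0).
records = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (5.0, 5.0), (1.0, 1.0)]
print(eoo_km2(records), aoo_km2(records))  # 100.0 20.0
```

Because the AOO counts cells on a fixed grid, moving the grid origin can change the result; this sketch anchors the grid at the coordinate origin, which is an arbitrary choice.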
New occurrences of P. nordestinus have recently been recorded (Dal Vechio et al. 2016; Castro et al. 2019; Freitas et al. 2019), which may in turn change the IUCN conservation status of the species. Here, we provide further new occurrence records for the species and update its distribution in order to: (1) characterize the potential distribution of P. nordestinus in Brazil; (2) identify environmental factors associated with the occurrence of its natural populations; and (3) suggest baseline tools to improve conservation strategies for DD species based on IUCN assessment criteria.

Database compilation
The database of P. nordestinus occurrence records was compiled from field surveys, a search of the scientific literature, museum vouchers, and online databases such as GBIF (http://www.gbif.org) and SpeciesLink (http://splink.cria.org). The current name of the species, "Pithecopus nordestinus", and its previous name, "Phyllomedusa nordestina", were used as search keywords in all databases. Only georeferenced occurrences of the target species were used in the analyses. We obtained a total of 159 unique and representative records covering the entire range of P. nordestinus to build the model (Figure 1) (Suppl. material 1: Table S1). These records were entered into ArcGIS 10.1 (ESRI 2010) to extract the biome (based on Olson et al. [2001]) in which each P. nordestinus sample was recorded (Suppl. material 1: Table S1).
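A minimal sketch of the filtering step, assuming each record is a dictionary with hypothetical `name`, `lat`, `lon`, and `source` fields. It keeps records filed under either of the two names used as search keywords and discards records without coordinates. As an assumed definition of "unique", it also keeps only one record per 2.5-arc-minute cell (the modeling resolution); the text does not state how duplicates were identified, so this thinning rule is illustrative only, and the coordinates are invented.

```python
import math
from typing import Any, Dict, List

# Both names were used as search keywords in the text.
ACCEPTED_NAMES = {"Pithecopus nordestinus", "Phyllomedusa nordestina"}
CELL_DEG = 2.5 / 60.0  # 2.5 arc-minutes, the modeling resolution

Record = Dict[str, Any]


def clean_records(records: List[Record]) -> List[Record]:
    """Keep georeferenced records of the target species, one per grid cell."""
    seen = set()
    kept: List[Record] = []
    for rec in records:
        if rec.get("name") not in ACCEPTED_NAMES:
            continue  # another species
        lat, lon = rec.get("lat"), rec.get("lon")
        if lat is None or lon is None:
            continue  # not georeferenced: excluded from the analyses
        cell = (math.floor(lon / CELL_DEG), math.floor(lat / CELL_DEG))
        if cell in seen:
            continue  # hypothetical duplicate rule: same 2.5-arc-minute cell
        seen.add(cell)
        kept.append(rec)
    return kept


# Invented records for illustration only.
raw = [
    {"name": "Pithecopus nordestinus", "lat": -7.1150, "lon": -34.8631, "source": "field"},
    {"name": "Phyllomedusa nordestina", "lat": -7.1151, "lon": -34.8630, "source": "GBIF"},
    {"name": "Pithecopus nordestinus", "lat": None, "lon": -36.5000, "source": "museum"},
    {"name": "Pithecopus nordestinus", "lat": -9.6658, "lon": -35.7353, "source": "literature"},
    {"name": "Pithecopus hypochondrialis", "lat": -5.0, "lon": -42.0, "source": "GBIF"},
]
print([r["source"] for r in clean_records(raw)])  # ['field', 'literature']
```

The GBIF record under the old name is dropped only because it falls in the same cell as the field record, not because of its name.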

Selection of environmental variables
In the modeling procedures, we initially considered 36 bioclimatic variables obtained from the WorldClim (http://www.worldclim.org/) and CliMond databases (Hijmans et al. 2005; Hutchinson et al. 2009; Kriticos et al. 2012) and five vegetation structure variables from EarthEnv (Tuanmu and Jetz 2014). All variables were converted to a resolution of 2.5 arc-minutes using the raster package in R (R Core Team 2018). To avoid model overfitting, we ran Pearson's correlation tests between variables and eliminated the most correlated ones (r > 0.80; p < 0.05) (Callegari-Jacques 2003; Mateo et al. 2013). The 12 least correlated variables were retained to build the models: temperature seasonality (Bio4), temperature annual range (Bio7), mean temperature of warmest quarter (Bio10), annual precipitation (Bio12), annual mean radiation (Bio20), radiation of driest quarter (Bio25), radiation of coldest quarter (Bio27), mean moisture index of driest quarter (Bio33), evergreen broadleaf trees (V02), mixed/other trees (V04), shrubs (V05), and altitude (alt).
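The text does not say how "the most correlated ones" were chosen. One common procedure, sketched here under that assumption, is greedy: repeatedly drop the variable involved in the most pairs with |r| above 0.80 until no such pair remains. The p-value criterion is omitted, the function names are hypothetical, and the variable values are toy data.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r; assumes neither variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(data: Dict[str, Sequence[float]], r_max: float = 0.80) -> List[str]:
    """Greedily remove the variable in the most |r| > r_max pairs (ties: input order)."""
    kept = list(data)
    while len(kept) >= 2:
        counts = {v: 0 for v in kept}
        for a, b in combinations(kept, 2):
            if abs(pearson(data[a], data[b])) > r_max:
                counts[a] += 1
                counts[b] += 1
        worst = max(kept, key=lambda v: counts[v])
        if counts[worst] == 0:
            break  # no remaining pair exceeds the threshold
        kept.remove(worst)
    return kept


# Toy values at five sampling points; bio5 is a rescaled copy of bio1 (r = 1).
toy = {
    "bio1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "bio5": [2.0, 4.0, 6.0, 8.0, 10.0],
    "bio12": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(drop_correlated(toy))  # ['bio5', 'bio12']
```

With a tie, as here, the sketch simply drops the first variable in input order; in practice one would rather keep the variable that is easier to interpret ecologically.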

Modeling procedures
The distribution models were produced as ensembles of several algorithms implemented in the biomod2 package (Thuiller et al. 2016) in R (R Core Team 2018). The occurrence records were combined with five sets of 10,000 background points, each randomly generated throughout the study area. We used four distribution modeling algorithms: artificial neural networks (ANN) (Ripley 1996), generalized boosted models, also known as boosted regression trees (GBM) (Friedman 2001), random forest (RF) (Breiman 2001), and maximum entropy (Maxent) (Phillips et al. 2006). In total, we performed 200 runs (4 algorithms × 10 cross-validation runs × 5 sets of random background points), with 1,000 iterations in each run. To evaluate model fit in each run, the dataset was partitioned into two sets: 70% for training and 30% for testing. To assess model accuracy, we used two statistics: (1) the True Skill Statistic (TSS; Allouche et al. 2006) and (2) the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve, which ranges from 0 to 1, is based on the trade-off between sensitivity and specificity in the response between occurrence data and variables, and incorporates a binomial probability as a null model (Phillips et al. 2006). We then analyzed the relative contribution of each environmental variable to the models using the jackknife test (Phillips et al. 2006). To be conservative, only models with AUC above 0.7 and TSS above 0.4 were used to build the ensemble model (Buisson et al. 2010). The ensemble produced a final climatic suitability map from the mean suitability value that the selected algorithms assigned to each grid cell. The final maps were validated and converted into presence/absence maps using the minimum omission threshold, defined as the lowest suitability value at an effectively sampled occurrence point (Silva et al. 2017), yielding the distribution map for P. nordestinus.
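The evaluation and ensemble steps described above can be sketched in plain Python. This is not biomod2's code: it assumes each run yields test-set suitability scores for presences and background points, plus a suitability map flattened to a list of grid cells (the `auc`/`tss`/`map` dictionary keys are hypothetical). AUC is computed in its Mann-Whitney form, TSS is taken at the threshold that maximizes it, runs are filtered with the AUC > 0.7 and TSS > 0.4 rule, the kept maps are averaged cell by cell, and the minimum omission threshold is read as the lowest ensemble suitability at an occupied cell. All numbers are invented.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def auc(pres: Sequence[float], back: Sequence[float]) -> float:
    """ROC AUC as P(random presence scores above random background); ties count 1/2."""
    wins = 0.0
    for p in pres:
        for b in back:
            wins += 1.0 if p > b else 0.5 if p == b else 0.0
    return wins / (len(pres) * len(back))


def max_tss(pres: Sequence[float], back: Sequence[float]) -> float:
    """Highest TSS = sensitivity + specificity - 1 over all observed thresholds."""
    best = -1.0
    for t in sorted(set(pres) | set(back)):
        sens = sum(p >= t for p in pres) / len(pres)
        spec = sum(b < t for b in back) / len(back)
        best = max(best, sens + spec - 1.0)
    return best


def ensemble_mean(runs: List[Dict], auc_min: float = 0.7, tss_min: float = 0.4) -> List[float]:
    """Average, cell by cell, the maps of runs with AUC > auc_min and TSS > tss_min."""
    kept = [r["map"] for r in runs if r["auc"] > auc_min and r["tss"] > tss_min]
    if not kept:
        raise ValueError("no run passed the AUC/TSS filters")
    return [mean(cells) for cells in zip(*kept)]


def min_omission(suit: Sequence[float], occupied: Sequence[int]) -> Tuple[float, List[int]]:
    """Threshold = lowest suitability at an occupied cell; returns it and the binary map."""
    thr = min(suit[i] for i in occupied)
    return thr, [1 if s >= thr else 0 for s in suit]


# Hypothetical test-set scores for one run.
pres, back = [0.9, 0.8, 0.7], [0.2, 0.4, 0.75, 0.1]
print(round(auc(pres, back), 3), max_tss(pres, back))  # 0.917 0.75

runs = [
    {"auc": 0.95, "tss": 0.80, "map": [0.9, 0.1, 0.6]},
    {"auc": 0.85, "tss": 0.60, "map": [0.7, 0.3, 0.4]},
    {"auc": 0.65, "tss": 0.30, "map": [0.0, 1.0, 0.0]},  # fails both filters
]
suitability = ensemble_mean(runs)
print(min_omission(suitability, occupied=[0, 2]))  # (0.5, [1, 0, 1])
```

Taking the lowest presence value as the cut-off guarantees zero omission of the training records, at the cost of possibly overpredicting where a single record sits in marginal habitat.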

Distribution assessment
The final polygon was built by overlaying the new occurrence points, the IUCN polygon, and the model predictions, and then drawing the minimum convex polygon around them.
The EOO and AOO were calculated with the R package "red" (IUCN red listing tools; Cardoso 2017, 2018), following the IUCN guidelines. We analyzed the AOH following the methods proposed by Brooks et al. (2019). First, the polygon was overlaid with land cover classes from Globcover 2015 (https://www.esa-landcover-cci.org) and with elevation data (Jarvis et al. 2008), at a pixel resolution of 30 m, in order to extract the habitats and altitudes known to be preferred by the species. Finally, to validate the AOH, we used our own database of location records and the binary model map to calculate how much of the AOH is predicted to be occupied by the species. All procedures were performed in ArcGIS 10.1 (ESRI 2010).
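The AOH overlay reduces to a cell-by-cell mask on aligned rasters. The sketch below assumes three grids of equal shape (range membership, land cover class code, elevation in meters) and a binary model map; the suitable class codes, elevation limits, and cell area are hypothetical placeholders, not the values used in the study, and the function names are illustrative.

```python
from typing import List, Set, Tuple

Grid = List[List[int]]


def area_of_habitat(
    in_range: Grid,
    landcover: Grid,
    elevation: Grid,
    suitable_classes: Set[int],
    elev_min: int,
    elev_max: int,
    cell_km2: float,
) -> Tuple[Grid, float]:
    """Mask cells inside the range with a suitable class and elevation; return mask and area."""
    mask = [
        [
            1 if inr and lc in suitable_classes and elev_min <= el <= elev_max else 0
            for inr, lc, el in zip(r_row, lc_row, el_row)
        ]
        for r_row, lc_row, el_row in zip(in_range, landcover, elevation)
    ]
    return mask, sum(map(sum, mask)) * cell_km2


def predicted_within_aoh(aoh: Grid, binary_model: Grid, cell_km2: float) -> float:
    """Area of AOH cells that the binary distribution model also predicts as present."""
    hits = sum(
        1 for a_row, m_row in zip(aoh, binary_model) for a, m in zip(a_row, m_row) if a and m
    )
    return hits * cell_km2


# Toy 3 x 3 rasters; class codes 50 and 60 are hypothetical "suitable" classes.
in_range = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
landcover = [[50, 10, 50], [50, 50, 190], [50, 60, 50]]
elevation = [[200, 300, 100], [1500, 400, 250], [300, 350, 800]]
model = [[1, 1, 0], [0, 0, 0], [0, 1, 0]]

mask, area = area_of_habitat(in_range, landcover, elevation, {50, 60}, 0, 1000, 1.0)
print(mask, area)  # [[1, 0, 0], [0, 1, 0], [0, 1, 1]] 4.0
print(predicted_within_aoh(mask, model, 1.0))  # 2.0
```

The toy example uses 1 km² cells for readability; with 30 m pixels each cell would cover 0.0009 km².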

Results
Approximately 55% of the species' occurrences were located in the Caatinga biome, 40% in the moist Atlantic Forest, 3% in the dry Atlantic Forest, and 2% in the Cerrado (Figure 1A). The assignment of records to biomes was based on Olson et al. (2001).
The distribution model generated for P. nordestinus showed satisfactory predictive power, with a mean AUC of 0.973 (± 0.038) and a mean TSS of 0.834 (± 0.064). The model shows higher suitability for P. nordestinus especially in areas of moist Atlantic Forest along the coast of northeastern Brazil and in forest patches within the Caatinga and Cerrado biomes (Figure 1B). Temperature annual range (Bio7) made the greatest contribution to model construction, followed by annual mean radiation (Bio20), radiation of driest quarter (Bio25), and temperature seasonality (Bio4). The first two variables together account for 49.9% of the contribution to model construction (Table 1). The jackknife analysis also shows Bio11 as the variable that provides the greatest gain in model training. The current IUCN polygon (Angulo 2016) covers a total area of 1,061,442 km², smaller than our proposed polygon of 1,531,907 km². Our proposed distribution thus increases the range of P. nordestinus by 44.32% (Figure 1C). The EOO calculated from the modeled area was 1,618,169 km², with an AOO of 115,484 km². The AOH (Figure 1D) within the newly proposed distribution polygon was 1,126,679 km², and the area predicted by the model within the AOH was 434,379 km².
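The 44.32% range increase follows directly from the two polygon areas reported for the IUCN polygon and the proposed polygon:

```latex
\Delta = \frac{A_{\text{proposed}} - A_{\text{IUCN}}}{A_{\text{IUCN}}} \times 100
       = \frac{1{,}531{,}907 - 1{,}061{,}442}{1{,}061{,}442} \times 100
       = \frac{470{,}465}{1{,}061{,}442} \times 100 \approx 44.32\%
```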

Discussion
The high AUC and TSS values obtained (above 0.80) indicate that all models were able to approximate the geographic distribution of the species, according to the ranking of Elith and Leathwick (2009). Despite their high predictive value, the models may carry some sources of uncertainty, possibly derived from information not captured by the algorithms or from important information on the species' distribution that was not considered during model construction. A future step to improve the models would be to incorporate information on the availability of reproductive resources, such as the proximity of water bodies, as well as biotic variables such as the presence and abundance of predators and competitors (Caldas et al. 2019). The high suitability found in areas of moist Atlantic Forest agrees with Moura et al. (2015), who suggested a strong association between the species' presence and forest patches due to its arboreal habits. Additionally, our results incorporate changes relative to the species distribution initially described by Caramaschi (2006) and to the currently valid polygon from the last IUCN Red List assessment (Angulo 2016), as can be seen in the differences between the polygons shown in Figure 1B, filling important gaps in the knowledge of the distribution of P. nordestinus.
The species is still considered Data Deficient at the global level. Comparing the two polygons, we highlight an expansion of the potential distribution of the species to the southern coast of Bahia State and to the western part of northeastern Brazil. By combining the species distribution model with the AOH, we identified realistic suitable areas within the known occurrence limits of the species (Figure 1D). This approach can be an important tool for conservation planning at global and local levels and can be applied to other species with similar endemism patterns, especially in the Caatinga and Atlantic Forest domains.
Previous ecological niche modeling studies of reptiles (Sales et al. 2015) and amphibians (Cassemiro et al. 2012) from northeastern Brazil suggest that temperature is the main environmental variable related to the distribution of these groups in the region. Our results agree with those obtained by Cassemiro et al. (2012). However, variables related to radiation and moisture also contributed significantly (Table 1). Taken together, these results point to a direct influence of the forested landscapes preferred by P. nordestinus. These environmental predictors could be directly related to thermoregulation, especially in semi-desert ecoregions such as the Caatinga (Olson et al. 2001). In this biome, individuals tend to seek forest patches to avoid losing body water to the environment through evaporation (Sanabria and Quiroga 2019).
We propose an expansion of the suitable distribution area of P. nordestinus relative to that described in recent studies. The environmental variables that best explain the distribution of this species are related mainly to temperature, and also to radiation and moisture. The distribution may also be influenced by landscape configuration and habitat availability (Van Buskirk 2005), which favor the arboreal habits of this tree frog and its associated thermoregulation. Our findings can be applied to identify additional locations where P. nordestinus may occur but has not yet been documented; to recognize localities where the species has been lost and could be reestablished in the future; to select hotspots (priority areas) for conservation; and to improve IUCN Red List assessments, thus outlining new conservation strategies for Data Deficient species and those potentially at risk.