iNaturalist - data manipulation and plotting

I started this project to make a Christmas present for my mother. She takes a lot of pictures of organisms and catalogs them on iNaturalist, a website and app that “crowd-sources” observations of nature and then makes the data available to analyze. It’s pretty cool: https://www.inaturalist.org/home.

An observation counts as “research-grade” once the community confirms the identification, which generally means at least two people have to agree on it. There are a lot of issues with uneven sample sizes and presence-only data if you want to use it for actual species distribution modeling, but it’s still pretty neat.

The package rinat allows you to pull data from iNaturalist.

For example, if I want to see where all the periodical cicadas were this summer, I can look up the taxon_id on the website for Magicicada septendecim. You can specify observation quality, geographic bounds, time bounds, etc.

You can get up to 10,000 observations with each query, so large queries need to be broken up.

library(rinat)

?get_inat_obs

#I'm going to look for all the research-quality observations
#from 2021 that have geographic location info.
phar = get_inat_obs(taxon_id = 105098, quality = "research",
                    geo = TRUE, maxresults = 1000, year = 2021)

str(phar)
## 'data.frame':    1000 obs. of  36 variables:
##  $ scientific_name                 : chr  "Magicicada septendecim" "Magicicada septendecim" "Magicicada septendecim" "Magicicada septendecim" ...
##  $ datetime                        : chr  "2021-05-17 09:06:00 -0400" "2021-06-04 08:34:00 -0400" "2021-06-02 07:09:12 -0600" "2021-11-06 12:46:00 -0400" ...
##  $ description                     : chr  "" "Brood X. Mated pair & masses on tree trunk" "" "This specimen seems to have been preserved under a log since it's emergence." ...
##  $ place_guess                     : chr  "Howard County, MD, USA" "Princeton, NJ, USA" "Superior Township" "Frederick County, MD, USA" ...
##  $ latitude                        : num  39.2 40.4 42.3 39.6 39.9 ...
##  $ longitude                       : num  -77 -74.7 -83.6 -77.4 -76.4 ...
##  $ tag_list                        : chr  "" "" "" "" ...
##  $ common_name                     : chr  "Pharaoh Cicada" "Pharaoh Cicada" "Pharaoh Cicada" "Pharaoh Cicada" ...
##  $ url                             : chr  "https://www.inaturalist.org/observations/101054735" "https://www.inaturalist.org/observations/100808164" "https://www.inaturalist.org/observations/100520076" "https://www.inaturalist.org/observations/100449425" ...
##  $ image_url                       : chr  "https://inaturalist-open-data.s3.amazonaws.com/photos/168719974/medium.jpg" "https://inaturalist-open-data.s3.amazonaws.com/photos/168288385/medium.jpeg" "https://inaturalist-open-data.s3.amazonaws.com/photos/167781691/medium.jpeg" "https://inaturalist-open-data.s3.amazonaws.com/photos/167651363/medium.jpeg" ...
##  $ user_login                      : chr  "smuller" "jimdugan" "eprince2" "emilio_c" ...
##  $ id                              : int  101054735 100808164 100520076 100449425 100270113 100245595 100113260 100104465 100104441 100104432 ...
##  $ species_guess                   : chr  "Pharaoh Cicada" "Pharaoh Cicada" "Pharaoh Cicada" "Pharaoh Cicada" ...
##  $ iconic_taxon_name               : chr  "Insecta" "Insecta" "Insecta" "Insecta" ...
##  $ taxon_id                        : int  105098 105098 105098 105098 105098 105098 105098 105098 105098 105098 ...
##  $ num_identification_agreements   : int  2 2 2 1 1 1 2 2 2 1 ...
##  $ num_identification_disagreements: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ observed_on_string              : chr  "2021/05/17 9:06 AM EDT" "2021/06/04 8:34 AM EDT" "Wed Jun 02 2021 07:09:12 GMT -0700 (MST)" "2021/11/06 12:46 PM EDT" ...
##  $ observed_on                     : chr  "2021-05-17" "2021-06-04" "2021-06-02" "2021-11-06" ...
##  $ time_observed_at                : chr  "2021-05-17 13:06:00 UTC" "2021-06-04 12:34:00 UTC" "2021-06-02 13:09:12 UTC" "2021-11-06 16:46:00 UTC" ...
##  $ time_zone                       : chr  "Eastern Time (US & Canada)" "Eastern Time (US & Canada)" "Mountain Time (US & Canada)" "Eastern Time (US & Canada)" ...
##  $ positional_accuracy             : int  771 5693 NA 814 NA 290 NA 10 5 5 ...
##  $ public_positional_accuracy      : int  771 5693 NA 814 NA 290 NA 10 5 5 ...
##  $ geoprivacy                      : chr  "" "" "" "" ...
##  $ taxon_geoprivacy                : chr  "open" "open" "open" "open" ...
##  $ coordinates_obscured            : chr  "false" "false" "false" "false" ...
##  $ positioning_method              : chr  "" "" "" "" ...
##  $ positioning_device              : chr  "" "" "" "" ...
##  $ user_id                         : int  267190 824339 4944119 1557503 35628 632759 2895175 330921 330921 330921 ...
##  $ created_at                      : chr  "2021-11-14 00:47:23 UTC" "2021-11-10 23:52:48 UTC" "2021-11-07 19:04:10 UTC" "2021-11-07 00:42:44 UTC" ...
##  $ updated_at                      : chr  "2021-11-14 17:17:58 UTC" "2021-11-12 15:11:49 UTC" "2021-11-08 03:56:33 UTC" "2021-11-07 03:27:35 UTC" ...
##  $ quality_grade                   : chr  "research" "research" "research" "research" ...
##  $ license                         : chr  "CC-BY-NC" "CC-BY-NC" "CC-BY-NC" "CC-BY-NC" ...
##  $ sound_url                       : chr  "" "" "" "" ...
##  $ oauth_application_id            : int  NA NA 333 NA 333 NA 2 NA NA NA ...
##  $ captive_cultivated              : chr  "false" "false" "false" "false" ...
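Because each query is capped, one way to pull a bigger data set is to split the request into smaller pieces and stack the results. Here is a minimal sketch that splits a query by month using the `month` argument of `get_inat_obs`. The helper `get_inat_by_month` and its `fetch` argument are names I made up for illustration; they are not part of rinat. The `fetch` argument is only there so the loop can be tried offline with a stand-in function. I'm also assuming that a month with no matching observations raises an error instead of returning an empty data frame, which is why each call is wrapped in `tryCatch`.

```r
# Hypothetical helper (not part of rinat): query one month at a time
# and stack the monthly results into a single data frame.
get_inat_by_month <- function(..., months = 1:12, fetch = rinat::get_inat_obs) {
  pieces <- lapply(months, function(m) {
    # Assumption: an empty month raises an error, so treat errors as "no rows".
    # Note that this also hides other errors, such as network failures.
    tryCatch(fetch(..., month = m), error = function(e) NULL)
  })
  # Drop the empty months, then bind the rest together.
  pieces <- Filter(Negate(is.null), pieces)
  if (length(pieces) == 0) return(data.frame())
  do.call(rbind, pieces)
}

# Stand-in for the real API call so the loop can be checked offline.
# It pretends that March has no observations.
fake_fetch <- function(..., month) {
  if (month == 3) stop("no results")
  data.frame(id = month * 10 + 1:2, month = month)
}

demo <- get_inat_by_month(months = 1:4, fetch = fake_fetch)
nrow(demo)

# With the real API, the call would look something like this (not run here):
# phar_all <- get_inat_by_month(taxon_id = 105098, quality = "research",
#                               geo = TRUE, maxresults = 10000, year = 2021)
```

If a single month still hits the cap, the same idea should work with smaller pieces, for example by also looping over the `day` argument.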

Visualizing the data

rinat doesn’t have a lot of built-in visualization methods, but it does have the ability to map the data.

#It defaults to the US
inat_map(phar)
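Since `inat_map` returns a ggplot object, the default map can be adjusted with ordinary ggplot2 layers. Here's a minimal sketch. To keep it self-contained, it uses a small stand-in data frame (`phar_demo`, a made-up name) built from the first five coordinates in the `str` output above instead of the full download. The zoom limits are rough values I picked by eye, and the code assumes the ggplot2 and maps packages are installed.

```r
library(rinat)
library(ggplot2)

# Stand-in for `phar`: the first five coordinates printed by str(phar).
phar_demo <- data.frame(
  latitude  = c(39.2, 40.4, 42.3, 39.6, 39.9),
  longitude = c(-77.0, -74.7, -83.6, -77.4, -76.4)
)

# plot = FALSE returns the map without printing it right away.
cicada_map <- inat_map(phar_demo, plot = FALSE)

# Add state borders, zoom in on the eastern US, and simplify the theme.
cicada_map +
  borders("state") +
  coord_quickmap(xlim = c(-92, -70), ylim = c(32, 46)) +
  theme_bw()
```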

Instead of plotting every point on the map (which is hard to see), I’d rather group the observations by county and use a color ramp to show how many cicada observations each county has.

I can get maps of the counties in each US state with the tigris package. I’ll start with Maryland, because that was the center of the Brood X cicadas, and it’s where my mother lives.

library(tigris)
library(sf)
library(dplyr)

#I use the `counties` function from the tigris package and specify the state.
#Then I use `st_transform` from the sf package to change the coordinate reference
#system to match the cicada data (EPSG:4326, i.e. WGS 84 longitude/latitude).
MD = counties(state = "MD") %>%
  st_transform(crs = 4326)
#now I turn my cicada data frame into a spatial data frame.
phar_sf = st_as_sf(phar, coords = c("longitude","latitude"), crs = 4326)

#join by county in Maryland
phar2 = st_join(phar_sf, MD) %>%
  #remove all observations outside of Maryland
  filter(!is.na(STATEFP)) 

#Remove the 'geometry column' and calculate total number of observations
#in each county using 'group_by' and 'summarize'
pharSum = phar2 %>%
  st_drop_geometry() %>%
  group_by(NAME) %>%
  summarize(Cicadas = n())

#join the total number of cicadas to the map of Maryland
MD2 = left_join(MD, pharSum)
## Joining, by = "NAME"
#plot the map
ggplot() +
  geom_sf(data = MD2, aes(fill = Cicadas))
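
One thing to watch with this kind of join: counties with no observations get NA rather than zero, and ggplot draws NA in grey. Here is a minimal sketch of filling those in, using made-up counts for three Maryland counties (the `counties`, `counts`, and `filled` objects are just stand-ins, not the real map data). Whether "no observations" should really count as zero is a judgment call.

```r
library(dplyr)

# Toy stand-ins for the county map and the per-county totals (made-up numbers)
counties = tibble(NAME = c("Howard", "Frederick", "Garrett"))
counts   = tibble(NAME = c("Howard", "Frederick"), Cicadas = c(40L, 25L))

# Counties missing from 'counts' come back as NA after the join;
# coalesce() swaps those NAs for 0
filled = left_join(counties, counts, by = "NAME") %>%
  mutate(Cicadas = coalesce(Cicadas, 0L))

print(filled)
```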

Observations per individual

So, all of that looked at every observation of a particular taxon. If we want to look at all the observations made by a particular person, we can do that too. The user query in rinat doesn’t have as many options for filtering, but we can definitely query all observations for a person.

My mother’s user id is jane41. Let’s see what she’s been seeing lately.

Note: sometimes it’s a little buggy. If it runs for a long time without stopping, interrupt it and try again.

mom = get_inat_obs_user("jane41", maxresults = 1000)
str(mom)
## 'data.frame':    1000 obs. of  36 variables:
##  $ scientific_name                 : chr  "Callirhytis quercuspunctata" "Harpalus" "Boisea trivittata" "Carabidae" ...
##  $ datetime                        : chr  "2021-11-14 13:16:50 -0500" "2021-11-14 13:06:51 -0500" "2021-11-08 14:38:23 -0500" "2021-10-26 14:35:53 -0400" ...
##  $ description                     : chr  "" "" "" "" ...
##  $ place_guess                     : chr  "Rockville, MD 20851, USA" "Baltimore Rd & Gladstone Dr, Rockville, MD 20851, USA" "Veirs Mill Rd & Aspen Hill Rd, Aspen Hill, MD 20853, USA" "Rockville, MD 20851, USA" ...
##  $ latitude                        : num  39.1 39.1 39.1 39.1 38.9 ...
##  $ longitude                       : num  -77.1 -77.1 -77.1 -77.1 -77 ...
##  $ tag_list                        : chr  NA NA NA NA ...
##  $ common_name                     : chr  "Gouty Oak Gall Wasp" "" "Eastern Boxelder Bug" "Ground Beetles" ...
##  $ url                             : chr  "https://www.inaturalist.org/observations/101121079" "https://www.inaturalist.org/observations/101120827" "https://www.inaturalist.org/observations/100623377" "https://www.inaturalist.org/observations/100176023" ...
##  $ image_url                       : chr  "https://static.inaturalist.org/photos/168835023/medium.jpeg" "https://static.inaturalist.org/photos/168834568/medium.jpeg" "https://static.inaturalist.org/photos/167968438/medium.jpeg" "https://static.inaturalist.org/photos/167158937/medium.jpeg" ...
##  $ user_login                      : chr  "jane41" "jane41" "jane41" "jane41" ...
##  $ id                              : int  101121079 101120827 100623377 100176023 100175878 99920508 99803159 99376426 99376399 99376309 ...
##  $ species_guess                   : chr  "Gouty Oak Gall Wasp" "Harpalus" "Eastern Boxelder Bug" "Ground Beetles" ...
##  $ iconic_taxon_name               : chr  "Insecta" "Insecta" "Insecta" "Insecta" ...
##  $ taxon_id                        : int  179371 131093 53227 49567 126532 126532 47380 155106 55719 335597 ...
##  $ num_identification_agreements   : int  0 1 1 1 1 1 0 1 2 0 ...
##  $ num_identification_disagreements: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ observed_on_string              : chr  "2021-11-14 13:16:50" "2021-11-14 13:06:51" "2021-11-08 14:38:23" "2021-10-26 14:35:53" ...
##  $ observed_on                     : chr  "2021-11-14" "2021-11-14" "2021-11-08" "2021-10-26" ...
##  $ time_observed_at                : chr  "2021-11-14 18:16:50 UTC" "2021-11-14 18:06:51 UTC" "2021-11-08 19:38:23 UTC" "2021-10-26 18:35:53 UTC" ...
##  $ time_zone                       : chr  "Eastern Time (US & Canada)" "Eastern Time (US & Canada)" "Eastern Time (US & Canada)" "Eastern Time (US & Canada)" ...
##  $ positional_accuracy             : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ public_positional_accuracy      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ geoprivacy                      : logi  NA NA NA NA NA NA ...
##  $ taxon_geoprivacy                : logi  NA NA NA NA NA NA ...
##  $ coordinates_obscured            : chr  "false" "false" "false" "false" ...
##  $ positioning_method              : chr  "gps" "gps" "gps" "gps" ...
##  $ positioning_device              : chr  "gps" "gps" "gps" "gps" ...
##  $ user_id                         : int  502784 502784 502784 502784 502784 502784 502784 502784 502784 502784 ...
##  $ created_at                      : chr  "2021-11-14 18:32:35 UTC" "2021-11-14 18:29:42 UTC" "2021-11-08 20:36:24 UTC" "2021-11-03 20:32:29 UTC" ...
##  $ updated_at                      : chr  "2021-11-14 18:32:44 UTC" "2021-11-14 18:44:41 UTC" "2021-11-08 20:39:43 UTC" "2021-11-04 05:20:56 UTC" ...
##  $ quality_grade                   : chr  "needs_id" "needs_id" "research" "needs_id" ...
##  $ license                         : logi  NA NA NA NA NA NA ...
##  $ sound_url                       : logi  NA NA NA NA NA NA ...
##  $ oauth_application_id            : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ captive_cultivated              : chr  "false" "false" "false" "false" ...

Alright. I’d like to do some sort of plot of diversity of observations over time, but it doesn’t really make sense to do it at the species level. I’d like to plot number of Orders or Families or Phyla or something, but that information isn’t provided.

There is the iconic_taxon_name which has larger groups, but those might be too big.

ggplot(mom, aes(x = iconic_taxon_name)) + geom_bar()

To figure out what genus, family, order, class, phylum, etc., goes with each species, we can use the taxize package. This package is really, really cool. I am going to use it a lot in the future.

taxize can query various taxonomic databases on the interwebs and pull out further taxonomic levels. It’s got tools to look up synonyms, query all data upstream and downstream, etc. It’s really powerful.

#example

#it helps a lot to get the unique identifier for each taxon before querying
#the database. 
?get_uid()
Maple = get_uid("Acer rubrum")
## No ENTREZ API key provided
##  Get one via taxize::use_entrez()
## See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
## ══  1 queries  ═══════════════
## 
## Retrieving data for taxon 'Acer rubrum'
## ✔  Found:  Acer+rubrum
## ══  Results  ═════════════════
## 
## • Total: 1 
## • Found: 1 
## • Not Found: 0
#Once you get the unique identifier, you can look up the classification
Mapleclass = classification(Maple, db = 'ncbi')
## No ENTREZ API key provided
##  Get one via taxize::use_entrez()
## See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
#it automatically comes out as a list; it will be easier to see as a data frame
Mapleclass = rbind(Mapleclass)
print(Mapleclass)
##                  name         rank      id query
## 1  cellular organisms      no rank  131567 45314
## 2           Eukaryota superkingdom    2759 45314
## 3       Viridiplantae      kingdom   33090 45314
## 4        Streptophyta       phylum   35493 45314
## 5      Streptophytina    subphylum  131221 45314
## 6         Embryophyta        clade    3193 45314
## 7        Tracheophyta        clade   58023 45314
## 8       Euphyllophyta        clade   78536 45314
## 9       Spermatophyta        clade   58024 45314
## 10      Magnoliopsida        class    3398 45314
## 11    Mesangiospermae        clade 1437183 45314
## 12     eudicotyledons        clade   71240 45314
## 13         Gunneridae        clade   91827 45314
## 14       Pentapetalae        clade 1437201 45314
## 15             rosids        clade   71275 45314
## 16            malvids        clade   91836 45314
## 17         Sapindales        order   41937 45314
## 18        Sapindaceae       family   23672 45314
## 19  Hippocastanoideae    subfamily 1977916 45314
## 20            Acereae        tribe 1977919 45314
## 21               Acer        genus    4022 45314
## 22        Acer rubrum      species   45314 45314

Now you have a list of all the different levels of classification for red maples! Pretty neat.

But that was just one organism. What about all the different critters that my mom saw?

The really powerful thing about this querying function is you can feed it a vector of IDs at any taxonomic level. You can mix genus, species, order, phylum, etc., and it will find them and give you the rest of the classification.

#first get all the unique taxa

allspecies = unique(mom$scientific_name)

#if you ask the API for too many things at once, it freezes.
#I found this workaround online. Basically, it makes the computer wait
#one second between queries. This will take a while.

res = lapply(allspecies, function(w) {
  Sys.sleep(1) # sleep for a second, possibly less to avoid rate limit
  get_uid(w, rows = 1, messages = FALSE) #get unique identifier
})

#reformat numbers as unique identifiers
res <- as.uid(res, check = FALSE) # don't check that ids are valid, much faster

#put it in a data frame
sigh = as.data.frame(res) 

#bind unique identifiers to species names (for use later)
names = data.frame(query = sigh$ids, scientific_name = allspecies)

#now query all the unique identifiers
classes = classification(res, db = 'ncbi', batch_size=10)

#put them into a data frame
classesR = rbind(classes)
str(classesR)
## 'data.frame':    9357 obs. of  4 variables:
##  $ name : chr  "cellular organisms" "Eukaryota" "Opisthokonta" "Metazoa" ...
##  $ rank : chr  "no rank" "superkingdom" "clade" "kingdom" ...
##  $ id   : chr  "131567" "2759" "33154" "33208" ...
##  $ query: chr  "1277657" "1277657" "1277657" "1277657" ...
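
The pause-between-queries trick can be tried out without touching the network. This is only a sketch: `fake_lookup()` and `slow_lookup()` are made-up helpers standing in for a real call like get_uid, and the "identifier" is just the length of the name. It also wraps each lookup in `tryCatch()` so one failed name returns NA instead of stopping the whole loop.

```r
# Hypothetical stand-in for a remote lookup such as get_uid(); it fails for
# one name so the error handling can be seen. Not part of taxize.
fake_lookup = function(name) {
  if (name == "Not a taxon") stop("lookup failed")
  nchar(name)   # pretend the identifier is just the length of the name
}

# Look names up one at a time, pausing between calls, and return NA
# instead of stopping when a single lookup fails.
slow_lookup = function(taxa, pause = 0.1) {
  sapply(taxa, function(w) {
    Sys.sleep(pause)
    tryCatch(fake_lookup(w), error = function(e) NA_integer_)
  }, USE.NAMES = FALSE)
}

ids = slow_lookup(c("Acer rubrum", "Not a taxon", "Harpalus"))
print(ids)
```

With a real API you would probably keep the pause at a second or so, like in the code above, and check which names came back as NA afterwards.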

Now, this is a TON of information. Some of these taxonomic levels are ones I’ve never heard of. There are a lot of ‘clades’ that aren’t used very much unless you are a really specialized taxonomist. Also, the data frame is in ‘long’ format. I think it will be easier to deal with in ‘wide’ format, with the levels of classification as columns instead of rows.

momclass = classesR %>%
  #Just grab the levels of classification I am most interested in
  filter(rank %in% c("order", "species", "genus", "phylum", "class", "family"))%>%
  
  #shift from long to wide using `pivot_wider`
  pivot_wider(id_cols = query, names_from = rank, values_from = name,
              values_fn = first) %>%
  
  #add the scientific names back on
  left_join(names) %>%
  
  #remove any duplicate rows
  distinct()
## Joining, by = "query"
str(momclass)
## tibble [383 × 8] (S3: tbl_df/tbl/data.frame)
##  $ query          : chr [1:383] "1277657" "41076" "1255142" "41073" ...
##  $ phylum         : chr [1:383] "Arthropoda" "Arthropoda" "Arthropoda" "Arthropoda" ...
##  $ class          : chr [1:383] "Insecta" "Insecta" "Insecta" "Insecta" ...
##  $ order          : chr [1:383] "Hymenoptera" "Coleoptera" "Hemiptera" "Coleoptera" ...
##  $ family         : chr [1:383] "Cynipidae" "Carabidae" "Rhopalidae" "Carabidae" ...
##  $ genus          : chr [1:383] "Callirhytis" "Harpalus" "Boisea" NA ...
##  $ species        : chr [1:383] "Callirhytis quercuspunctata" NA "Boisea trivittata" NA ...
##  $ scientific_name: chr [1:383] "Callirhytis quercuspunctata" "Harpalus" "Boisea trivittata" "Carabidae" ...
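
To see what the long-to-wide step is doing, here is a tiny sketch on a made-up classification table. The `query` values "1" and "2" are placeholders, not real NCBI identifiers. A rank that is missing for a taxon (here, the genus for a family-level identification) simply comes out as NA in the wide table.

```r
library(dplyr)
library(tidyr)

# Toy 'long' classification table: one row per taxon per rank
long = tibble(
  query = c("1", "1", "1", "2", "2"),
  rank  = c("order", "family", "genus", "order", "family"),
  name  = c("Diptera", "Syrphidae", "Toxomerus", "Coleoptera", "Carabidae")
)

# One row per taxon, one column per rank
wide = long %>%
  filter(rank %in% c("order", "family", "genus")) %>%
  pivot_wider(id_cols = query, names_from = rank, values_from = name)

print(wide)
```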

Now that we have the full taxonomic ranks for all the observations, we can merge them onto the original data frame using ‘left_join’.

momwclasses = left_join(mom, momclass)
## Joining, by = "scientific_name"
str(momwclasses)
## 'data.frame':    1000 obs. of  43 variables:
##  $ scientific_name                 : chr  "Callirhytis quercuspunctata" "Harpalus" "Boisea trivittata" "Carabidae" ...
##  $ datetime                        : chr  "2021-11-14 13:16:50 -0500" "2021-11-14 13:06:51 -0500" "2021-11-08 14:38:23 -0500" "2021-10-26 14:35:53 -0400" ...
##  $ description                     : chr  "" "" "" "" ...
##  $ place_guess                     : chr  "Rockville, MD 20851, USA" "Baltimore Rd & Gladstone Dr, Rockville, MD 20851, USA" "Veirs Mill Rd & Aspen Hill Rd, Aspen Hill, MD 20853, USA" "Rockville, MD 20851, USA" ...
##  $ latitude                        : num  39.1 39.1 39.1 39.1 38.9 ...
##  $ longitude                       : num  -77.1 -77.1 -77.1 -77.1 -77 ...
##  $ tag_list                        : chr  NA NA NA NA ...
##  $ common_name                     : chr  "Gouty Oak Gall Wasp" "" "Eastern Boxelder Bug" "Ground Beetles" ...
##  $ url                             : chr  "https://www.inaturalist.org/observations/101121079" "https://www.inaturalist.org/observations/101120827" "https://www.inaturalist.org/observations/100623377" "https://www.inaturalist.org/observations/100176023" ...
##  $ image_url                       : chr  "https://static.inaturalist.org/photos/168835023/medium.jpeg" "https://static.inaturalist.org/photos/168834568/medium.jpeg" "https://static.inaturalist.org/photos/167968438/medium.jpeg" "https://static.inaturalist.org/photos/167158937/medium.jpeg" ...
##  $ user_login                      : chr  "jane41" "jane41" "jane41" "jane41" ...
##  $ id                              : int  101121079 101120827 100623377 100176023 100175878 99920508 99803159 99376426 99376399 99376309 ...
##  $ species_guess                   : chr  "Gouty Oak Gall Wasp" "Harpalus" "Eastern Boxelder Bug" "Ground Beetles" ...
##  $ iconic_taxon_name               : chr  "Insecta" "Insecta" "Insecta" "Insecta" ...
##  $ taxon_id                        : int  179371 131093 53227 49567 126532 126532 47380 155106 55719 335597 ...
##  $ num_identification_agreements   : int  0 1 1 1 1 1 0 1 2 0 ...
##  $ num_identification_disagreements: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ observed_on_string              : chr  "2021-11-14 13:16:50" "2021-11-14 13:06:51" "2021-11-08 14:38:23" "2021-10-26 14:35:53" ...
##  $ observed_on                     : chr  "2021-11-14" "2021-11-14" "2021-11-08" "2021-10-26" ...
##  $ time_observed_at                : chr  "2021-11-14 18:16:50 UTC" "2021-11-14 18:06:51 UTC" "2021-11-08 19:38:23 UTC" "2021-10-26 18:35:53 UTC" ...
##  $ time_zone                       : chr  "Eastern Time (US & Canada)" "Eastern Time (US & Canada)" "Eastern Time (US & Canada)" "Eastern Time (US & Canada)" ...
##  $ positional_accuracy             : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ public_positional_accuracy      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ geoprivacy                      : logi  NA NA NA NA NA NA ...
##  $ taxon_geoprivacy                : logi  NA NA NA NA NA NA ...
##  $ coordinates_obscured            : chr  "false" "false" "false" "false" ...
##  $ positioning_method              : chr  "gps" "gps" "gps" "gps" ...
##  $ positioning_device              : chr  "gps" "gps" "gps" "gps" ...
##  $ user_id                         : int  502784 502784 502784 502784 502784 502784 502784 502784 502784 502784 ...
##  $ created_at                      : chr  "2021-11-14 18:32:35 UTC" "2021-11-14 18:29:42 UTC" "2021-11-08 20:36:24 UTC" "2021-11-03 20:32:29 UTC" ...
##  $ updated_at                      : chr  "2021-11-14 18:32:44 UTC" "2021-11-14 18:44:41 UTC" "2021-11-08 20:39:43 UTC" "2021-11-04 05:20:56 UTC" ...
##  $ quality_grade                   : chr  "needs_id" "needs_id" "research" "needs_id" ...
##  $ license                         : logi  NA NA NA NA NA NA ...
##  $ sound_url                       : logi  NA NA NA NA NA NA ...
##  $ oauth_application_id            : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ captive_cultivated              : chr  "false" "false" "false" "false" ...
##  $ query                           : chr  "1277657" "41076" "1255142" "41073" ...
##  $ phylum                          : chr  "Arthropoda" "Arthropoda" "Arthropoda" "Arthropoda" ...
##  $ class                           : chr  "Insecta" "Insecta" "Insecta" "Insecta" ...
##  $ order                           : chr  "Hymenoptera" "Coleoptera" "Hemiptera" "Coleoptera" ...
##  $ family                          : chr  "Cynipidae" "Carabidae" "Rhopalidae" "Carabidae" ...
##  $ genus                           : chr  "Callirhytis" "Harpalus" "Boisea" NA ...
##  $ species                         : chr  "Callirhytis quercuspunctata" NA "Boisea trivittata" NA ...

Now we can filter the data set for a particular type of critter and make graphs!

For example, Mom takes a lot of pictures of insects, particularly the hover flies that land on flowers. I can now filter the dataset for all the insects in the order ‘Diptera’.

#use 'filter' from dplyr
momflies = filter(momwclasses, order == "Diptera")

#how many individuals of each family of flies has mom seen?
ggplot(momflies, aes(x = family, fill = family)) + geom_bar()+
  theme(axis.text.x = element_text(angle = 90))

I can also create summary data sets by using group_by and summarize to calculate the number of individuals per family (or per genus).

momsum = group_by(momflies, family) %>%
  summarize(N = n()) %>%
  
  #we had so many families it was hard to see. Let's group some rare families into 'other'
  mutate(family2 = case_when(
    N < 3 ~ "other",
    is.na(family) ~ "Unknown",
    TRUE ~ family
  ))


momflies = left_join(momflies, momsum)
## Joining, by = "family"
momgenus = group_by(momflies, family, genus) %>%
  summarize(N = n())
## `summarise()` has grouped output by 'family'. You can override using the `.groups` argument.
#the 'treemap' package allows you to make these cool plots with boxes of proportional sizes based on groups
?treemap
#use the per-genus counts in 'momgenus' so each box is sized by its own N
treemap(momgenus,
        index=c("family","genus"),
        vSize="N",
        type="index",
        lowerbound.cex.labels = 0
)
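
One detail about the case_when used to build family2: it checks the conditions in order and takes the first one that matches. That means a family that is NA and also rare gets labelled by whichever rule comes first. A small sketch on made-up counts, with the NA check placed first so unknown families stay ‘Unknown’ (the numbers are invented, and this ordering is just one option):

```r
library(dplyr)

# Toy per-family counts, including one unidentified (NA) family
counts = tibble(
  family = c("Syrphidae", "Tachinidae", NA),
  N      = c(12L, 2L, 1L)
)

lumped = counts %>%
  mutate(family2 = case_when(
    is.na(family) ~ "Unknown",  # checked first, so NA never falls into 'other'
    N < 3         ~ "other",    # rare families are lumped together
    TRUE          ~ family      # everything else keeps its own name
  ))

print(lumped)
```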

I can look at how number of observations changes over time if I reformat the observation date and time with the ‘lubridate’ package

#reformat the date from a character to a date, and extract the year, month, and day of the year
momflies = momflies %>%
  mutate(Date = ymd(observed_on), Year = year(Date), Month = month(Date), Julian = yday(Date))

#how many individuals of each genus did she see each month?
ggplot(momflies, aes(x = Month, fill = genus)) + geom_bar()
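
If the lubridate helpers are new to you, here is a quick sketch of what each one returns for a couple of dates. The dates are just examples in the same "YYYY-MM-DD" format as observed_on.

```r
library(lubridate)

# Parse character dates, then pull out the pieces used in the plots
d = ymd(c("2021-11-14", "2021-01-01"))

parts = data.frame(
  Date   = d,
  Year   = year(d),   # calendar year
  Month  = month(d),  # month as a number, 1 to 12
  Julian = yday(d)    # day of the year, 1 to 365 (366 in leap years)
)

print(parts)
```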

Another neat data visualization we can play with is the ridgeline plot from the ggridges package.

?geom_density_ridges
ggplot(momflies, aes(x = Julian, y = family2, fill = family2)) + 
  geom_density_ridges2(jittered_points = TRUE)
## Picking joint bandwidth of 45.7

#or we can summarize the data and plot number of bugs over time

#fill in zeros for days with no observations and total up 
#number of observations per day.
flysum= group_by(momflies, Julian, family2) %>%
  summarise(N = n()) %>%
  pivot_wider(id_cols = Julian, names_from = family2, values_fill = 0, values_from = N) %>%
  pivot_longer(cols = -Julian, names_to = "family", values_to = "N")
## `summarise()` has grouped output by 'Julian'. You can override using the `.groups` argument.
ggplot(flysum, aes(x = Julian, y = family, fill = family, height = N)) + 
  geom_ridgeline()+
  scale_y_discrete(expand = c(0.01, 0)) +
  scale_fill_discrete(guide = "none")
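
The wide-then-long round trip is what fills in the zeros: every day and family combination that was missing gets a row with N = 0. Here is a minimal sketch of the same idea using `complete()` from tidyr on made-up counts (the days, family labels, and numbers are placeholders). Like the round trip, it only adds rows for days that show up somewhere in the data, not for every day of the year.

```r
library(dplyr)
library(tidyr)

# Toy daily counts: family "B" was not seen on day 101
obs = tibble(
  Julian  = c(100, 100, 101),
  family2 = c("A", "B", "A"),
  N       = c(2L, 1L, 3L)
)

# Add a row for every day x family combination, with N = 0 where missing
filled = obs %>%
  complete(Julian, family2, fill = list(N = 0L))

print(filled)
```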

There are lots of other fun things to do with this data; I’ve just shown you a few visualizations. Have fun!