Learning and Applying Basic API Tasks in Chronicling America
This notebook is meant to teach you some of the basic API tasks you’ll encounter using the loc.gov API.
Feel free to download this notebook and run with your own search query.
The first thing you want to do is import modules. Modules contain Python definitions and statements that you will need in order for the scripts to run.
Importing Modules
The following modules are among those you will need most often.
To import the modules for this notebook, run the code below.
import requests
import pandas as pd
import os
import pprint
import re
Define and perform a query
After importing modules, you will typically define your query by pasting your API Query URL.
In more complicated queries, you would also run functions which can refine your search.
Your results will need to be read in JSON format so make sure “&fo=json” is at the end of your API Query URL.
Paste your Search Query URL into locgov_url_search = '{URL}'.
Make sure the search query URL has &fo=json at the end.
When ready, run the code.
# Define your query URL
locgov_url_search = 'https://www.loc.gov/collections/chronicling-america/?dl=page&end_date=1924-12-31&ops=PHRASE&qs=clara+bow&searchType=advanced&start_date=1924-10-01&location_state=california&fo=json'
# Run the query using the API
api_query = requests.get(locgov_url_search)
# Tell Python to read the results as json
search_result = api_query.json()
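The requirement that the query URL carry fo=json can also be enforced in code rather than checked by eye. The following is a minimal sketch using only Python's standard urllib.parse module; the helper name ensure_json_format and the shortened example URL are illustrative assumptions, not part of the loc.gov API.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def ensure_json_format(url):
    """Return url with fo=json present exactly once in its query string.

    Hypothetical helper written for this notebook.
    """
    parts = urlsplit(url)
    # Keep every existing parameter except any earlier 'fo' value
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != 'fo'
    ]
    # Append fo=json so the API returns JSON instead of an HTML page
    params.append(('fo', 'json'))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Shortened, illustrative version of the search URL used above
example_url = 'https://www.loc.gov/collections/chronicling-america/?qs=clara+bow&ops=PHRASE'
json_url = ensure_json_format(example_url)
print(json_url)
```

The returned string could then be assigned to locgov_url_search before calling requests.get. Calling the helper on a URL that already has fo=json leaves a single copy of the parameter.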
Print out Metadata from the first result
After performing the query, typically you want to perform some sort of task.
For the task here, you will print out the first result of the search query in JSON format. Notice the metadata elements on the left of each row.
Simply press run. If you would rather see the second result, change the 0 to 1 before you press run.
# Look at the first record in the results section
first_result = search_result['results'][0]
pprint.pprint(first_result)
{'access_restricted': False,
'aka': ['http://www.loc.gov/resource/sn92070146/1924-11-20/ed-1/?sp=6'],
'batch': ['curiv_ocotillo_ver01'],
'campaigns': [],
'composite_location': ['0/united states/',
'1/united states/california/',
'2/united states/california/imperial/',
'3/united states/california/imperial/el centro/'],
'contributor': ['university of california, riverside; riverside, ca'],
'date': '1924-11-20',
'dates': ['1924-11-20'],
'description': ['6 PURPORTED LOVE AFFAIRS OF WOODROW WILSON LABELED AS BASE '
'LIES BY NOTED EDITOR EMpORIA Kan Nov 20The flew of Woodrow '
'Wilson Amer nN wn r f I1P president to Mrs fare Peck which '
'were mentioned 1 Wilsons so vailed love fairs contain the '
'casual friend gossipy self expression of a de ni einrin to '
'an honest woman I illiam Allen White noted editor d auilior '
'declared in an inter w with the United Press here duv r r '
'White in discussing the use of le WilsonPeck letters '
'incident in is recent book The Life of Wood on Wilson '
'declared he included I because it is revealing and be I ause '
'honesty and candor seems to require it gc All the miserable '
'buzzing and Hie Whole line of myths about Wilsons love '
'affairs were ut terly false the Emporia editor de elafed I '
'Mj Wilson was the victim of a foul slander arising out of '
'the Bruriotlk mob psychology of salacious minds I Mrs Peck '
'was on friendly and affectionate terms with the Wilson '
'family I knew her before her di BvoreJrqm Air Peek She is a '
'wo linan of the most dainty and exqui site spirit joyous '
'innocent beau'],
'digitized': True,
'extract_timestamp': '2023-09-04T23:04:51.792Z',
'group': ['ndnp/curiv',
'university-of-california-riverside-riverside-ca-awardee'],
'hassegments': False,
'id': 'http://www.loc.gov/resource/sn92070146/1924-11-20/ed-1/?sp=6',
'image_url': ['https://tile.loc.gov/image-services/iiif/service:ndnp:curiv:batch_curiv_ocotillo_ver01:data:sn92070146:00414189696:1924112001:1270/full/pct:6.25/0/default.jpg#h=462&w=365',
'https://tile.loc.gov/image-services/iiif/service:ndnp:curiv:batch_curiv_ocotillo_ver01:data:sn92070146:00414189696:1924112001:1270/full/pct:12.5/0/default.jpg#h=925&w=730',
'https://tile.loc.gov/text-services/word-coordinates-service?segment=/service/ndnp/curiv/batch_curiv_ocotillo_ver01/data/sn92070146/00414189696/1924112001/1270.xml&format=alto_xml'],
'index': 1,
'language': ['english'],
'location': ['imperial county',
'united states',
'imperial',
'san diego county',
'california',
'el centro'],
'location_city': ['el centro'],
'location_country': ['united states'],
'location_county': ['imperial'],
'location_state': ['california'],
'mime_type': ['image/jp2',
'application/pdf',
'text/xml',
'image/jpeg',
'image/jpeg',
'application/json',
'text/plain'],
'number': ['sn92070146', '0000000006', '1', '0', '4', '6', '9', '8'],
'number_edition': ['1'],
'number_lccn': ['sn92070146'],
'number_page': ['0000000006'],
'number_reel': ['0', '0', '4', '1', '4', '1', '8', '9', '6', '9', '6'],
'online_format': ['image', 'pdf', 'online text'],
'original_format': ['newspaper'],
'other_title': ['Imperial Valley press and El Centro progress',
'Post-press',
'Imperial Valley news-press',
'Imperial Valley press/Brawley news',
'Morning post-Imperial Valley press',
'At head of title: Extra Imperial Valley press'],
'page_coordinate_data': {'coords_list': [[18161.0, 8627.0, 378.0, 104.0],
[18638.0, 8622.0, 335.0, 128.0],
[19748.0, 4091.0, 326.0, 128.0]],
'height': '29600',
'relevant_snippet': '... “doubling” for an '
"'induction coil, "
'[[tag]]Clara[[/tag]]. '
'[[tag]]Bow[[/tag]], ingenue '
'film actress, and considered '
"'one of Important Notice BARNES "
'GREATS.RING 1 MB OwiNGr TO THE '
'UNUSUAL AMOUNT OF PREPARATION '
'NECESSARY TOO THE— PRODUCTION '
'OF" TI—IE— AAArriVE '
"JFPECTAC.LE- IZ^^HECOURT'OF "
'QUEEN ANNE, AN C OUR. DCS iCiEZ '
'TO START TWC PE7RFOR.M AN/CC '
'PDONAOTLN\\ A/O .STREET PARADE '
'i WIUV BE GIVtW THM* 111111 1 '
"L- ——T HI UIIB.W T'BMm El "
'Centro 9J.f 11 Monday, Nov. LU. '
'GRAND STAND CHAIR SEATS ON SALE '
'AT VALLEY DRUG CO. Thursday, '
'Novrnhvr 0. H e fed*. burned on '
'one the fllniin" f tonight at '
"The n -i.| r freak electric g' "
'platinum ring -.1 buckle termed '
"I' a complete t the girl's Im "
'It. ...',
'searchTerms': {'bow': [[18638.0,
8622.0,
335.0,
128.0],
[19748.0,
4091.0,
326.0,
128.0]],
'clara': [[18161.0,
8627.0,
378.0,
104.0]]},
'width': '23372'},
'page_id': 'sn92070146-1924-11-20-ed-1-1270',
'partof': ['imperial valley press (el centro, calif.) 1907-current',
'serial and government publications division',
'chronicling america'],
'partof_collection': ['chronicling america'],
'partof_division': ['serial and government publications division'],
'partof_title': ['imperial valley press (el centro, calif.) 1907-current'],
'publication_frequency': ['daily'],
'resources': [{'files': 1,
'url': 'https://www.loc.gov/resource/sn92070146/1924-11-20/ed-1/?sp=6'}],
'segmentof': ['http://www.loc.gov/resource/sn92070146/1924-11-20/ed-1/'],
'shelf_id': '6',
'site': ['chroniclingamerica'],
'subject': ['imperial county',
'united states',
'el centro',
'san diego county (calif.)',
'newspapers',
'imperial county (calif.)',
'san diego county',
'california',
'imperial'],
'timestamp': '2023-09-05T23:44:39.604Z',
'title': 'Image 6 of Imperial Valley press (El Centro, Calif.), November 20, '
'1924',
'type': ['segment'],
'url': 'https://www.loc.gov/resource/sn92070146/1924-11-20/ed-1/?sp=6&q=clara+bow',
'word_coordinates_url': 'https://tile.loc.gov/text-services/word-coordinates-service?q=clara+bow&relevant_snippet=1&segment=%2Fservice%2Fndnp%2Fcuriv%2Fbatch_curiv_ocotillo_ver01%2Fdata%2Fsn92070146%2F00414189696%2F1924112001%2F1270.xml&format=alto_xml'}
In the previous step, you created a JSON output of the metadata.
If you want to print out specific metadata from the JSON output, here is an example with some of the most common metadata fields for newspapers:
Press run to print. You can also go through the JSON to test other metadata elements.
print('Page/Title Description:')
pprint.pprint(first_result["title"])
print('\nDate:')
pprint.pprint(first_result["date"])
print('\nEdition Number:')
pprint.pprint(first_result["number_edition"])
print('\nNewspaper Title:')
pprint.pprint(first_result["partof_title"])
print('\nLCCN:')
pprint.pprint(first_result["number_lccn"])
print('\nFrequency:')
pprint.pprint(first_result["publication_frequency"])
print('\nState:')
pprint.pprint(first_result["location_state"])
print('\nCounty')
pprint.pprint(first_result["location_county"])
print('\nCity:')
pprint.pprint(first_result["location_city"])
print('\nLanguage:')
pprint.pprint(first_result["language"])
print('\nLink to Newspaper Page:')
pprint.pprint(first_result["id"])
print('\nLink to Newspaper Issue:')
pprint.pprint(first_result["segmentof"])
print('\nCollection:')
pprint.pprint(first_result["partof_collection"])
print('\nBatch Name:')
pprint.pprint(first_result["batch"])
print('\nContributor:')
pprint.pprint(first_result["group"])
Page/Title Description:
'Image 6 of Imperial Valley press (El Centro, Calif.), November 20, 1924'
Date:
'1924-11-20'
Edition Number:
['1']
Newspaper Title:
['imperial valley press (el centro, calif.) 1907-current']
LCCN:
['sn92070146']
Frequency:
['daily']
State:
['california']
County
['imperial']
City:
['el centro']
Language:
['english']
Link to Newspaper Page:
'http://www.loc.gov/resource/sn92070146/1924-11-20/ed-1/?sp=6'
Link to Newspaper Issue:
['http://www.loc.gov/resource/sn92070146/1924-11-20/ed-1/']
Collection:
['chronicling america']
Batch Name:
['curiv_ocotillo_ver01']
Contributor:
['ndnp/curiv', 'university-of-california-riverside-riverside-ca-awardee']
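Indexing a record with square brackets, as in the cell above, raises a KeyError if a field is absent. The following is a minimal sketch of a more forgiving approach, assuming that some records may lack some fields; the helper name summarize, the placeholder text, and the abbreviated sample record are illustrative assumptions.

```python
# Abbreviated copy of the first record shown above; a real record has many more fields
sample_result = {
    'title': 'Image 6 of Imperial Valley press (El Centro, Calif.), November 20, 1924',
    'date': '1924-11-20',
    'number_lccn': ['sn92070146'],
    'location_state': ['california'],
}

# Display labels paired with the metadata keys to look up
fields = [
    ('Page/Title Description', 'title'),
    ('Date', 'date'),
    ('LCCN', 'number_lccn'),
    ('State', 'location_state'),
    ('Frequency', 'publication_frequency'),  # deliberately absent from sample_result
]


def summarize(record, fields, missing='(not available)'):
    """Return 'Label: value' lines, using a placeholder when a key is missing."""
    lines = []
    for label, key in fields:
        value = record.get(key, missing)
        # Many fields in the record above are lists; join them for display
        if isinstance(value, list):
            value = ', '.join(str(item) for item in value)
        lines.append(f'{label}: {value}')
    return lines


summary_lines = summarize(sample_result, fields)
print('\n'.join(summary_lines))
```

In the notebook itself, first_result could be passed in place of sample_result.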
Pagination via API Query
Another useful type of metadata is pagination. Pagination can be used to display the number of pages of a Search Query or the current page of a newspaper image.
The pagination results here should be the same as those shown after you performed an Advanced Search on Chronicling America.
This is useful for validating the search query and checking whether there are too many results for your search. If there are too many results for your search, you may encounter issues processing your data.
Note: We recommend keeping search results under 100,000 hits. Please use facets to limit the size of your results and search query or you may be automatically blocked. See Limitations and Rate Limits for more information.
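Once a query has been run, the recommendation in the note can be checked against the 'of' field of the pagination metadata, which holds the total number of results. The following is a minimal sketch; the function name check_result_size and the constant name are illustrative assumptions, and the sample dictionary is a shortened copy of this notebook's pagination output.

```python
MAX_RECOMMENDED_HITS = 100_000  # threshold taken from the note above


def check_result_size(pagination, limit=MAX_RECOMMENDED_HITS):
    """Return (total_hits, within_limit) for a pagination dictionary."""
    total_hits = pagination['of']
    # The note recommends staying under the limit, so use a strict comparison
    return total_hits, total_hits < limit


# Shortened copy of the pagination metadata for the query in this notebook
sample_pagination = {'of': 2, 'perpage': 40, 'total': 1}
hits, within_limit = check_result_size(sample_pagination)
print(f'{hits} results; within recommended limit: {within_limit}')
```

With live data, search_result['pagination'] could be passed in place of sample_pagination.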
Simply run the code cells below.
The first will give you the pagination metadata of the search result.
search_result['pagination']
{'current': 1,
'first': None,
'from': 1,
'last': None,
'next': None,
'of': 2,
'page_list': [{'number': 1, 'url': None}],
'perpage': 40,
'perpage_options': [20, 40, 80, 160],
'previous': None,
'results': '1 - 2',
'to': 2,
'total': 1}
The second will print out the pagination metadata with more context.
print('Current page:')
print(search_result['pagination']['current'])
print('\nPath to request the next page:')
print(search_result['pagination']['next'])
print('\nTotal number of results:')
print(search_result['pagination']['of'])
print('\nTotal number of results per page:')
print(search_result['pagination']['perpage'] )
print('\nTotal number of pages:')
print(search_result['pagination']['total'])
Current page:
1
Path to request the next page:
None
Total number of results:
2
Total number of results per page:
40
Total number of pages:
1
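When a search spans more than one page, the 'next' value in the pagination metadata can be followed until it is None, as it is in the output above. The following is a minimal sketch of that loop. It assumes that 'next' holds a URL that can be requested as-is; the function names, the pause parameter, and the two made-up pages used for the offline demonstration are all illustrative assumptions.

```python
import time


def collect_all_results(first_url, fetch, pause_seconds=0):
    """Follow pagination['next'] links and gather the records from every page.

    fetch is any callable that takes a URL and returns the parsed JSON dict.
    """
    all_results = []
    url = first_url
    while url:
        page = fetch(url)
        all_results.extend(page.get('results', []))
        # 'next' is None on the last page, which ends the loop
        url = page.get('pagination', {}).get('next')
        if url and pause_seconds:
            time.sleep(pause_seconds)  # be polite between requests
    return all_results


def fetch_json(url):
    """Hypothetical live fetcher: request one page and parse it as JSON."""
    import requests  # imported here so the offline demo below runs without it
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


# Offline demonstration with two made-up pages instead of live requests
fake_pages = {
    'page1': {'results': ['a', 'b'], 'pagination': {'next': 'page2'}},
    'page2': {'results': ['c'], 'pagination': {'next': None}},
}
collected = collect_all_results('page1', fake_pages.__getitem__)
print(collected)
```

For a live query, the call might look like collect_all_results(locgov_url_search, fetch_json, pause_seconds=2), where the pause length is an arbitrary placeholder rather than a documented requirement.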