Archivers API Reference
API for Archivers records
API Endpoint
https://api.archivers.space/
Terms of Service: https://archivers.space/terms/api
Request Content-Types: application/json
Response Content-Types: application/json
Schemes: https
Version: 0.0.1
Paths
GET /users
Get Users
Enveloped array of Users
Response Content-Types: application/json
Response Example (200 OK)
{
"meta": "object",
"data": [
{
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"username": "wonder_woman",
"type": "user",
"name": "Diana Prince",
"description": "The Spirit of Truth",
"homeUrl": "https://en.wikipedia.org/wiki/Wonder_Woman",
"currentKey": "358a2a6b8e857836a9410c3ae5285eb5fec6fda7dcb7c78f75b4bada99bceea3"
}
],
"pagination": "object"
}
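List responses wrap their results in an envelope with "meta", "data" and "pagination" keys, as in the example above. A minimal Python sketch of unwrapping such a body follows. The function name unwrap_list is hypothetical, and because the contents of "meta" and "pagination" are not documented here, the sketch passes "pagination" through untouched.

```python
import json
from typing import Any


def unwrap_list(body: str) -> tuple[list[dict[str, Any]], Any]:
    """Return (records, pagination) from an enveloped list response body."""
    envelope = json.loads(body)
    data = envelope.get("data")
    # List endpoints are described as returning an array under "data".
    if not isinstance(data, list):
        raise ValueError("expected 'data' to be an array")
    return data, envelope.get("pagination")


sample = '{"meta": {}, "data": [{"username": "wonder_woman", "type": "user"}], "pagination": {}}'
users, pagination = unwrap_list(sample)
print([u["username"] for u in users])
```

Raising on a non-array "data" keeps single-record responses, such as GET /users/{id}, from being silently treated as lists.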
GET /users/{id}
Get info on a single user
(no description)
Single User
Response Content-Types: application/json
Response Example (200 OK)
{
"meta": "object",
"data": {
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"username": "wonder_woman",
"type": "user",
"name": "Diana Prince",
"description": "The Spirit of Truth",
"homeUrl": "https://en.wikipedia.org/wiki/Wonder_Woman",
"currentKey": "358a2a6b8e857836a9410c3ae5285eb5fec6fda7dcb7c78f75b4bada99bceea3"
}
}
GET /primers
List Primers
Enveloped array of Primers
Response Content-Types: application/json
Response Example (200 OK)
{
"meta": "object",
"data": [
{
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"shortTitle": "EPA",
"title": "Environmental Protection Agency",
"description": "The mission of the Environmental Protection Agency is to protect human health and the environment through the development and enforcement of regulations. The EPA is responsible for administering a number of laws that span various sectors, such as agriculture, transportation, utilities, construction, and oil and gas. In the budget for FY 2017, the agency lays out goals to better support communities and address climate change following the President’s Climate Action Plan. Additionally, the agency aims to improve community water infrastructure, chemical plant safety, and collaborative partnerships among federal, state, and tribal levels.\n",
"parent": null,
"subPrimers": "array",
"meta": {
"county": "US",
"primaryLanguage": "en"
},
"stats": "object",
"sources": [
{
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"title": "epa.gov",
"description": "entire epa site",
"url": "epa.gov",
"primerId": "5b1031f4-38a8-40b3-be91-c324bf686a87",
"crawl": true,
"staleDuration": "integer",
"lastAlertSent": "string",
"meta": "object",
"stats": "object"
}
]
}
],
"pagination": "object"
}
GET /primers/{id}
Get a Primer
(no description)
Enveloped Primer
Response Content-Types: application/json
Response Example (200 OK)
{
"meta": "object",
"data": {
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"shortTitle": "EPA",
"title": "Environmental Protection Agency",
"description": "The mission of the Environmental Protection Agency is to protect human health and the environment through the development and enforcement of regulations. The EPA is responsible for administering a number of laws that span various sectors, such as agriculture, transportation, utilities, construction, and oil and gas. In the budget for FY 2017, the agency lays out goals to better support communities and address climate change following the President’s Climate Action Plan. Additionally, the agency aims to improve community water infrastructure, chemical plant safety, and collaborative partnerships among federal, state, and tribal levels.\n",
"parent": null,
"subPrimers": "array",
"meta": {
"county": "US",
"primaryLanguage": "en"
},
"stats": "object",
"sources": [
{
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"title": "epa.gov",
"description": "entire epa site",
"url": "epa.gov",
"primerId": "5b1031f4-38a8-40b3-be91-c324bf686a87",
"crawl": true,
"staleDuration": "integer",
"lastAlertSent": "string",
"meta": "object",
"stats": "object"
}
]
}
}
GET /sources
List Sources
Enveloped Sources
Response Content-Types: application/json
Response Example (200 OK)
{
"meta": "object",
"data": [
{
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"title": "epa.gov",
"description": "entire epa site",
"url": "epa.gov",
"primerId": "5b1031f4-38a8-40b3-be91-c324bf686a87",
"crawl": true,
"staleDuration": "integer",
"lastAlertSent": "string",
"meta": "object",
"stats": "object"
}
],
"pagination": "object"
}
GET /sources/{id}
Get a Source
(no description)
Enveloped Source
Response Content-Types: application/json
Response Example (200 OK)
{
"meta": "object",
"data": {
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"title": "epa.gov",
"description": "entire epa site",
"url": "epa.gov",
"primerId": "5b1031f4-38a8-40b3-be91-c324bf686a87",
"crawl": true,
"staleDuration": "integer",
"lastAlertSent": "string",
"meta": "object",
"stats": "object"
}
}
GET /urls
List Urls
Enveloped List of Urls
Response Content-Types: application/json
Response Example (200 OK)
{
"meta": "object",
"data": [
{
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"url": "http://www.epa.gov",
"lastGet": "string (date-time)",
"lastHead": "string (date-time)",
"status": "integer",
"contentType": "text/html; charset=utf-8",
"contentSniff": "text/html; charset=utf-8",
"contentLength": -1,
"title": "United States Environmental Protection Agency, US EPA",
"downloadTook": 0,
"headersTook": 0,
"headers": [
"X-Content-Type-Options",
"nosniff",
"Expires",
"Fri, 24 Feb 2017 21:53:45 GMT",
"Date",
"Fri, 24 Feb 2017 21:53:45 GMT",
"Etag",
"W/\"7f53-549471782bb42\"",
"X-Ua-Compatible",
"IE=Edge,chrome=1",
"X-Cached-By",
"Boost",
"Content-Type",
"text/html; charset=utf-8",
"Vary",
"Accept-Encoding",
"Accept-Ranges",
"bytes",
"Cache-Control",
"no-cache, no-store, must-revalidate, post-check=0, pre-check=0",
"Server",
"Apache",
"Connection",
"keep-alive",
"Strict-Transport-Security",
"max-age=31536000; preload;"
],
"meta": "array",
"hash": "1220459219b10032cc86dcdbc0f83aea15a9d3e1119e7b5170beaee233008ea2c2de",
"contentUrl": "https://content.archivers.space/1220459219b10032cc86dcdbc0f83aea15a9d3e1119e7b5170beaee233008ea2c2de"
}
],
"pagination": "object"
}
GET /urls/{id}
Get a Url
(no description)
Enveloped Url
Response Content-Types: application/json
Response Example (200 OK)
{
"meta": "object",
"data": {
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"url": "http://www.epa.gov",
"lastGet": "string (date-time)",
"lastHead": "string (date-time)",
"status": "integer",
"contentType": "text/html; charset=utf-8",
"contentSniff": "text/html; charset=utf-8",
"contentLength": -1,
"title": "United States Environmental Protection Agency, US EPA",
"downloadTook": 0,
"headersTook": 0,
"headers": [
"X-Content-Type-Options",
"nosniff",
"Expires",
"Fri, 24 Feb 2017 21:53:45 GMT",
"Date",
"Fri, 24 Feb 2017 21:53:45 GMT",
"Etag",
"W/\"7f53-549471782bb42\"",
"X-Ua-Compatible",
"IE=Edge,chrome=1",
"X-Cached-By",
"Boost",
"Content-Type",
"text/html; charset=utf-8",
"Vary",
"Accept-Encoding",
"Accept-Ranges",
"bytes",
"Cache-Control",
"no-cache, no-store, must-revalidate, post-check=0, pre-check=0",
"Server",
"Apache",
"Connection",
"keep-alive",
"Strict-Transport-Security",
"max-age=31536000; preload;"
],
"meta": "array",
"hash": "1220459219b10032cc86dcdbc0f83aea15a9d3e1119e7b5170beaee233008ea2c2de",
"contentUrl": "https://content.archivers.space/1220459219b10032cc86dcdbc0f83aea15a9d3e1119e7b5170beaee233008ea2c2de"
}
}
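The "hash" value in the example begins with 1220, which in the multihash format is the code for sha2-256 (0x12) followed by the digest length of 32 bytes (0x20). The Python sketch below builds such a hex string with hashlib and derives a content URL from it. The URL pattern is inferred only from the "contentUrl" shown in the example and is an assumption, as are the function names.

```python
import hashlib

CONTENT_BASE = "https://content.archivers.space/"  # taken from the example above


def sha256_multihash_hex(body: bytes) -> str:
    """Hex-encode a sha2-256 multihash: 0x12 (code), 0x20 (length), digest."""
    return "1220" + hashlib.sha256(body).hexdigest()


def content_url(multihash_hex: str) -> str:
    # Assumed pattern: base URL followed directly by the hex multihash.
    return CONTENT_BASE + multihash_hex


digest = sha256_multihash_hex(b"example response body")
print(content_url(digest))
```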
GET /repositories
List Data Repositories
list of repositories
Response Content-Types: application/json
Response Example (200 OK)
{
"meta": "object",
"data": [
{
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"title": "The Internet Archive",
"description": "the archive of the worlds internet",
"url": "https://archive.org"
}
],
"pagination": "object"
}
GET /repository/{id}
get details for a single data repository
(no description)
repository details
Response Content-Types: application/json
Response Example (200 OK)
{
"meta": "object",
"data": {
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"title": "The Internet Archive",
"description": "the archive of the worlds internet",
"url": "https://archive.org"
}
}
GET /coverage
Coverage tree of url-based resources
id of the primer to get coverage for
maximum number of children to return
comma-separated list of repository ids to limit coverage results for
url of node to use as the root of the coverage tree, useful for tree traversal in combination with the depth param
coverage tree
Response Content-Types: application/json
Response Example (200 OK)
{
"meta": "object",
"data": {
"numLeaves": 5000,
"numLeavesArchived": 3101,
"children": [
{
"name": "ftp://ftp.epa.gov",
"numLeaves": 3,
"numLeavesArchived": 1,
"numChildren": 2,
"children": [
{
"name": "castnet",
"numLeaves": 2,
"numLeavesArchived": 1,
"numChildren": 1,
"children": [
"..."
]
}
]
}
],
"coverage": [
{
"repositoryId": "4c0122g5-38a8-40b3-be91-c324bf686a87",
"archived": true,
"priority": 10
},
{
"repositoryId": "4c0122g5-38a8-40b3-be91-c324bf686a87",
"archived": false,
"priority": 0
}
]
},
"pagination": "object"
}
GET /collections
List Collections
Enveloped array of collections
Response Content-Types: application/json
Response Example (200 OK)
{
"meta": "object",
"data": [
{
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"creator": "358a2a6b8e857836a9410c3ae5285eb5fec6fda7dcb7c78f75b4bada99bceea3",
"title": "EPA Volatile Organic Compound Measurements",
"schema": [
"string"
],
"contents": [
"string"
]
}
],
"pagination": "object"
}
GET /collections/{id}
Get a Collection
(no description)
Enveloped Collection Object
Response Content-Types: application/json
Response Example (200 OK)
{
"meta": "object",
"data": {
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"creator": "358a2a6b8e857836a9410c3ae5285eb5fec6fda7dcb7c78f75b4bada99bceea3",
"title": "EPA Volatile Organic Compound Measurements",
"schema": [
"string"
],
"contents": [
"string"
]
}
}
Schema Definitions
Collection: object
- id: UUID
- created: string (date-time)
-
Created timestamp rounded to seconds in UTC
- updated: string (date-time)
-
Updated timestamp rounded to seconds in UTC
- creator: string
-
sha256 multihash of the public key that created this collection
- title: string
-
human-readable title of the collection
- schema: string[]
-
csv column headers, first value must always be "hash"
- contents: string[]
-
actual collection contents
Example
{
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"creator": "358a2a6b8e857836a9410c3ae5285eb5fec6fda7dcb7c78f75b4bada99bceea3",
"title": "EPA Volatile Organic Compound Measurements",
"schema": [
"string"
],
"contents": [
"string"
]
}
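The rule that "schema" must start with "hash" can be checked before a collection is used. A small Python sketch follows. The function name is hypothetical, and the column names other than "hash" are invented for illustration.

```python
def has_valid_schema(collection: dict) -> bool:
    """True when "schema" is a non-empty list whose first header is "hash"."""
    schema = collection.get("schema")
    return isinstance(schema, list) and len(schema) > 0 and schema[0] == "hash"


print(has_valid_schema({"schema": ["hash", "url", "title"]}))
print(has_valid_schema({"schema": ["url", "hash"]}))
```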
CoverageTree: object
Tree of url components showing coverage information
- numLeaves: integer
-
number of nodes in the tree that have no children. full url endpoints are leaves.
- numLeavesArchived: integer
-
number of leaves that have been archived by at least one data repository
- children: object[]
-
array of nodes that are a child of this node
- coverage: object[]
-
array of services that have coverage information for this node
Example
{
"numLeaves": 5000,
"numLeavesArchived": 3101,
"children": [
{
"name": "ftp://ftp.epa.gov",
"numLeaves": 3,
"numLeavesArchived": 1,
"numChildren": 2,
"children": [
{
"name": "castnet",
"numLeaves": 2,
"numLeavesArchived": 1,
"numChildren": 1,
"children": [
"..."
]
}
]
}
],
"coverage": [
{
"repositoryId": "4c0122g5-38a8-40b3-be91-c324bf686a87",
"archived": true,
"priority": 10
},
{
"repositoryId": "4c0122g5-38a8-40b3-be91-c324bf686a87",
"archived": false,
"priority": 0
}
]
}
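Since every node carries numLeaves and numLeavesArchived, the share of archived leaves can be computed per node while walking the tree. The Python sketch below is one way to do that. The function names are hypothetical, joining node names with "/" to form a path is an assumption, and the inline tree is a trimmed copy of the example with "children" written as an array, as the field list describes.

```python
from typing import Any, Iterator


def archived_fraction(node: dict[str, Any]) -> float:
    """Share of this node's leaves archived by at least one repository."""
    leaves = node.get("numLeaves", 0)
    return node.get("numLeavesArchived", 0) / leaves if leaves else 0.0


def walk(node: dict[str, Any], path: str = "") -> Iterator[tuple[str, float]]:
    """Yield (path, archived fraction) for every named node, depth first."""
    name = node.get("name", "")
    here = f"{path}/{name}" if path and name else (name or path)
    if name:
        yield here, archived_fraction(node)
    children = node.get("children", [])
    if isinstance(children, dict):  # tolerate a single child object
        children = [children]
    for child in children:
        if isinstance(child, dict):  # skip placeholders such as "..."
            yield from walk(child, here)


tree = {
    "numLeaves": 5000,
    "numLeavesArchived": 3101,
    "children": [
        {"name": "ftp://ftp.epa.gov", "numLeaves": 3, "numLeavesArchived": 1,
         "children": [{"name": "castnet", "numLeaves": 2, "numLeavesArchived": 1}]},
    ],
}
print(archived_fraction(tree))
for node_path, fraction in walk(tree):
    print(node_path, round(fraction, 2))
```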
DataRepository: object
representation of a place that keeps an archive of data, e.g. the Internet Archive
- id: UUID
- created: string (date-time)
-
timestamp of first addition to the database rounded to seconds in UTC.
- updated: string (date-time)
-
timestamp of most recent change rounded to seconds in UTC
- title: string
-
title of the data repository
- description: string
-
description of the repository
- url: string
-
url to home page of the repository
Example
{
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"title": "The Internet Archive",
"description": "the archive of the worlds internet",
"url": "https://archive.org"
}
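The created and updated fields are date-time strings rounded to seconds in UTC. Assuming they follow the RFC 3339 form that the date-time format normally denotes (the examples here show only the type, not a real value), they can be parsed in Python as sketched below. The function name is hypothetical and the sample timestamp is invented for illustration.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 timestamp into an aware UTC datetime."""
    # fromisoformat() accepts a trailing "Z" only from Python 3.11 onward.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: a value without an offset is already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


print(parse_timestamp("2017-02-24T21:53:45Z").isoformat())
```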
Link: object
- created: object (string)
-
created timestamp rounded to seconds in UTC
- updated: object (string)
-
updated timestamp rounded to seconds in UTC
- src: object (string)
-
origin url of the linking document
- dst: object (string)
-
absolute url of the href property
Example
{}
Metadata: object
- id: UUID
- hash: string
-
sha256 multihash of all other fields in metadata as expressed by Metadata.HashableBytes()
- timestamp: string (date-time)
-
Creation timestamp
- keyId: string
-
Sha256 multihash of the public key that signed this metadata
- subject: string
-
Sha256 multihash of the content this metadata is describing
- prev: string
-
Hash value of the metadata that came before this, if any
- meta: object
-
Actual stored metadata about the subject, can be any valid json Object
Example
{
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"hash": "string",
"timestamp": "string (date-time)",
"keyId": "string",
"subject": "string",
"prev": "string",
"meta": "object"
}
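Because each Metadata record points at its predecessor through "prev", the history of a subject can be reconstructed by following those links. The Python sketch below assumes the records have already been fetched and indexed by "hash", and that an empty or missing "prev" marks the first record (this reference does not say how the absence is encoded). The function name is hypothetical and the hash values are shortened placeholders.

```python
from typing import Any


def history(records: dict[str, dict[str, Any]], latest: str) -> list[dict[str, Any]]:
    """Follow "prev" links from the latest record back to the first."""
    chain: list[dict[str, Any]] = []
    seen: set[str] = set()
    current = latest
    while current and current in records and current not in seen:
        seen.add(current)  # guard against accidental cycles
        chain.append(records[current])
        current = records[current].get("prev") or ""
    return chain


records = {
    "h3": {"hash": "h3", "prev": "h2", "meta": {"title": "third"}},
    "h2": {"hash": "h2", "prev": "h1", "meta": {"title": "second"}},
    "h1": {"hash": "h1", "prev": "", "meta": {"title": "first"}},
}
print([r["hash"] for r in history(records, "h3")])
```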
Primer: object
- id: UUID
- created: string (date-time)
-
timestamp of first addition to the database rounded to seconds in UTC.
- updated: string (date-time)
-
timestamp of most recent change rounded to seconds in UTC
- shortTitle: string
-
shortest possible expression of this primer's name, usually an acronym. called shortTitle because acronyms often collide and users should feel free to expand on them
- title: string
-
human-readable title of this primer.
- description: string
-
long-form description of this primer.
- parent: object
-
parent primer (if any)
- subPrimers: array
-
child-primers list
- meta: object
-
metadata to associate with this primer
- stats: object
-
statistics about this primer
- sources: Source[]
-
collection of child sources
Example
{
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"shortTitle": "EPA",
"title": "Environmental Protection Agency",
"description": "The mission of the Environmental Protection Agency is to protect human health and the environment through the development and enforcement of regulations. The EPA is responsible for administering a number of laws that span various sectors, such as agriculture, transportation, utilities, construction, and oil and gas. In the budget for FY 2017, the agency lays out goals to better support communities and address climate change following the President’s Climate Action Plan. Additionally, the agency aims to improve community water infrastructure, chemical plant safety, and collaborative partnerships among federal, state, and tribal levels.\n",
"parent": null,
"subPrimers": "array",
"meta": {
"county": "US",
"primaryLanguage": "en"
},
"stats": "object",
"sources": [
{
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"title": "epa.gov",
"description": "entire epa site",
"url": "epa.gov",
"primerId": "5b1031f4-38a8-40b3-be91-c324bf686a87",
"crawl": true,
"staleDuration": "integer",
"lastAlertSent": "string",
"meta": "object",
"stats": "object"
}
]
}
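A Primer embeds its child sources, so a client can pick out the ones flagged for crawling without a second request. A short Python sketch follows. The function name is hypothetical, and the second source in the sample is invented for illustration.

```python
from typing import Any


def crawlable_urls(primer: dict[str, Any]) -> list[str]:
    """Return the url of every source in the primer whose crawl flag is true."""
    return [s["url"] for s in primer.get("sources") or [] if s.get("crawl") is True]


primer = {
    "shortTitle": "EPA",
    "sources": [
        {"title": "epa.gov", "url": "epa.gov", "crawl": True},
        {"title": "not crawled", "url": "example.org", "crawl": False},
    ],
}
print(crawlable_urls(primer))
```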
Source: object
Source is a concrete handle for archiving. Crawlers use a source's url as the base of a link tree. Sources are connected to a parent Primer to provide context & organization.
- id: UUID
- created: string (date-time)
-
timestamp of first addition to the database rounded to seconds in UTC.
- updated: string (date-time)
-
timestamp of most recent change rounded to seconds in UTC
- title: string
-
human-readable title for this source
- description: string
-
description of the source, ideally one paragraph
- url: string
-
url to serve as boundaries for archiving
- primerId: string
-
primer this source is connected to
- crawl: boolean
-
whether or not this url should be crawled by a web crawler
- staleDuration: integer
-
amount of time before a link within this tree is considered in need of re-checking for changes. currently not in use, but planned.
- lastAlertSent: string
-
this will probably be deprecated. part of a half-baked alerts feature idea.
- meta: object
-
Metadata associated with this source that should be added to all child urls, currently not in use, but planned
- stats: object
-
Stats about this source
Example
{
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"title": "epa.gov",
"description": "entire epa site",
"url": "epa.gov",
"primerId": "5b1031f4-38a8-40b3-be91-c324bf686a87",
"crawl": true,
"staleDuration": "integer",
"lastAlertSent": "string",
"meta": "object",
"stats": "object"
}
Url: object
- id: UUID
- created: string (date-time)
-
timestamp of first addition to the database rounded to seconds in UTC.
- updated: string (date-time)
-
timestamp of most recent change rounded to seconds in UTC
- url: string
-
URI string without any normalization. Url strings must always be absolute. unique to each entry
- lastGet: string (date-time)
-
timestamp for most recent GET request
- lastHead: string (date-time)
-
timestamp for most recent HEAD request
- status: integer
-
latest returned HTTP response status code
- contentType: string
-
latest returned 'Content-Type' HTTP header
- contentSniff: string
-
Result of mime sniffing to GET response body, as detailed at https://mimesniff.spec.whatwg.org
- contentLength: integer
-
server-specified 'Content-Length' HTTP header
- title: string
-
contents of the HTML title tag
- downloadTook: integer
-
Time remote server took to transfer content in milliseconds.
- headersTook: integer
-
Time taken to receive response headers in milliseconds. currently not implemented
- headers: string[]
-
key-value array of returned headers from most recent HEAD or GET request stored in the form [key,value,key,value...]
- meta: array
-
any associative metadata for this url
- hash: string
-
Hash is a multihash sha-256 of the response body of a GET request
- contentUrl: string
-
Url to saved content
Example
{
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"url": "http://www.epa.gov",
"lastGet": "string (date-time)",
"lastHead": "string (date-time)",
"status": "integer",
"contentType": "text/html; charset=utf-8",
"contentSniff": "text/html; charset=utf-8",
"contentLength": -1,
"title": "United States Environmental Protection Agency, US EPA",
"downloadTook": 0,
"headersTook": 0,
"headers": [
"X-Content-Type-Options",
"nosniff",
"Expires",
"Fri, 24 Feb 2017 21:53:45 GMT",
"Date",
"Fri, 24 Feb 2017 21:53:45 GMT",
"Etag",
"W/\"7f53-549471782bb42\"",
"X-Ua-Compatible",
"IE=Edge,chrome=1",
"X-Cached-By",
"Boost",
"Content-Type",
"text/html; charset=utf-8",
"Vary",
"Accept-Encoding",
"Accept-Ranges",
"bytes",
"Cache-Control",
"no-cache, no-store, must-revalidate, post-check=0, pre-check=0",
"Server",
"Apache",
"Connection",
"keep-alive",
"Strict-Transport-Security",
"max-age=31536000; preload;"
],
"meta": "array",
"hash": "1220459219b10032cc86dcdbc0f83aea15a9d3e1119e7b5170beaee233008ea2c2de",
"contentUrl": "https://content.archivers.space/1220459219b10032cc86dcdbc0f83aea15a9d3e1119e7b5170beaee233008ea2c2de"
}
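The "headers" field stores header names and values in one flat array of the form [key,value,key,value...]. A Python sketch of turning that array back into pairs, and of looking up a single header, follows. Both function names are hypothetical. Pairs are kept rather than a dict because HTTP allows a header name to repeat.

```python
from typing import Optional


def headers_to_pairs(flat: list[str]) -> list[tuple[str, str]]:
    """Turn [key, value, key, value, ...] into [(key, value), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("header array must hold an even number of items")
    return list(zip(flat[0::2], flat[1::2]))


def header(flat: list[str], name: str) -> Optional[str]:
    """Return the first value for a header, matching names case-insensitively."""
    for key, value in headers_to_pairs(flat):
        if key.lower() == name.lower():
            return value
    return None


flat = ["Content-Type", "text/html; charset=utf-8", "Server", "Apache"]
print(header(flat, "content-type"))
```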
User: object
- id: UUID
- created: string (date-time)
-
timestamp of first addition to the database rounded to seconds in UTC.
- updated: string (date-time)
-
timestamp of most recent change rounded to seconds in UTC
- username: string
-
handle for the user. min 1 character, max 80. composed of [_,-,a-z,A-Z,1-9]
- type: string
-
specifies whether this is a user or an organization
- name: string
-
user name field. could be first[space]last, but not strictly enforced
- description: string
-
user-filled description of self
- homeUrl: string
-
url this user wants the world to click
- currentKey: string
-
sha256 multihash of public key that this user is currently using for signatures
Example
{
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"created": "string (date-time)",
"updated": "string (date-time)",
"username": "wonder_woman",
"type": "user",
"name": "Diana Prince",
"description": "The Spirit of Truth",
"homeUrl": "https://en.wikipedia.org/wiki/Wonder_Woman",
"currentKey": "358a2a6b8e857836a9410c3ae5285eb5fec6fda7dcb7c78f75b4bada99bceea3"
}
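The username constraints (1 to 80 characters drawn from [_,-,a-z,A-Z,1-9]) translate directly into a regular expression. The Python sketch below takes the character set literally as written, which excludes "0". Whether that exclusion is intended is not stated here, so treat it as an assumption. The names USERNAME_RE and is_valid_username are hypothetical.

```python
import re

# Character set as written in the description: "_", "-", letters, digits 1-9.
# Assumption: if "0" is also permitted in practice, widen the class to 0-9.
USERNAME_RE = re.compile(r"[_\-a-zA-Z1-9]{1,80}")


def is_valid_username(username: str) -> bool:
    """True when the whole string satisfies the documented constraints."""
    return USERNAME_RE.fullmatch(username) is not None


print(is_valid_username("wonder_woman"))
```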
Snapshot: object
- url: string
-
The url that was requested
- created: string (date-time)
-
Time this request was issued
- status: integer
-
Returned Status
- duration: integer
-
Time to complete response in milliseconds
- headers: string
-
Record of all returned headers in [key,value,key,value...]
- hash: string
-
Multihash of response body (if any)
Example
{
"url": "string",
"created": "string (date-time)",
"status": "integer",
"duration": "integer",
"headers": "string",
"hash": "string"
}
Uncrawlable: object
- id: UUID
- url: string
-
url from urls table, must be unique
- created: string
-
Created timestamp rounded to seconds in UTC
- updated: string
-
Updated timestamp rounded to seconds in UTC
- creator: string
-
sha256 multihash of the public key that created this uncrawlable
- name: string
-
name of person making submission
- email: string
-
email address of person making submission
- eventName: string
-
name of data rescue event where uncrawlable was added
- agency: string
-
agency name
- agencyId: string
-
EDGI agency Id
- subagencyId: string
-
EDGI subagency Id
- orgId: string
-
EDGI organization Id
- suborgId: string
-
EDGI Suborganization Id
- subprimerId: string
-
EDGI subprimer Id
- ftp: boolean
-
flag for ftp content
- database: boolean
-
flag for 'database'
- interactive: boolean
-
flag for visualization / interactive content obfuscating data
- manyFiles: boolean
-
flag for a page that links to many files
- comments: string
-
uncrawlable comments
Example
{
"id": "c98255ce-30a2-4fe5-94a6-7e6ec08a46ec",
"url": "string",
"created": "string",
"updated": "string",
"creator": "string",
"name": "string",
"email": "string",
"eventName": "string",
"agency": "string",
"agencyId": "string",
"subagencyId": "string",
"orgId": "string",
"suborgId": "string",
"subprimerId": "string",
"ftp": "boolean",
"database": "boolean",
"interactive": "boolean",
"manyFiles": "boolean",
"comments": "string"
}