Building a High Performance Geoprocessing API Solution Using ArcGIS Enterprise, Node, and Turf.js
Esri geoprocessing services can fail to perform under heavy load and require costly architecture scaling to resolve. We propose a new model for writing geoprocessing as a service with Node and Turf.js that still interacts with the original data provided as Esri feature services. This model was shown to produce significantly greater performance under heavy traffic than the traditional Esri Python geoprocessing service paradigm.
Background
When building web mapping applications in the Esri domain, we accomplish geospatial analysis by writing geoprocessing services. Geoprocessing services are Python scripts served through ArcGIS Server that leverage the powerful analytic capabilities provided in Esri's desktop environments (ArcGIS Desktop or ArcGIS Pro) and expose them to the web. Common examples of geoprocessing services are:
Generating drive time polygons
Finding nearby locations in another dataset
Clipping data to an input geometry
Interpolating points
Creating buffers
Problem
For a recent project, we needed to expose various geospatial analysis capabilities to the web to support client-side workflows. For example:
given a point, determine the boundary it is located in
given a point, determine the nearest road
As you can see, the questions being asked in these examples are not complex. Writing Python solutions in ArcGIS Pro is trivial, as is sharing the result as a web tool. We were successful with our initial testing of these geoprocessing services and we easily implemented the solutions into a client-side demo application.
We initially observed response times for the services using the Network tab in Chrome DevTools. When requests to the services were spaced out at regular intervals (>10 seconds) using the client web app, responses were returned within an acceptable time (2-4 seconds). However, sending requests at shorter intervals (<3 seconds) caused the responses to back up, producing wait times in the unacceptable range (>30 seconds).
We built the geoprocessing services to support ~3 requests per second during peak times. We wrote a load testing script using Locust, an open source load testing solution for Python, to better understand response times. As you can see in the screenshot below of Locust's dashboard, response times started to exceed 30 seconds after reaching the desired 3 requests per second threshold.
Our first approach to resolve these unsatisfactory response times was to apply Esri's performance tips for geoprocessing services. As suggested in the article, we made geoprocessing data accessible locally to ArcGIS Server, wrote intermediate data to memory, and reduced data size by simplifying geometries and removing unnecessary fields. Lastly, we attempted to adjust the geoprocessing service settings to maximize access to resources. These updates did not significantly reduce response times.
To determine a baseline of how fast the geoprocessing services could be, we wrote and deployed a Hello World! script as a service. The service did not do any processing or require access to data; it simply returned the message Hello World! back to the client. Load testing that hit only the new service showed similar response times in the unacceptable range. These results indicated that performance was not being affected by inefficient analysis methods or data issues, but rather by the overhead of geoprocessing services themselves.
At this point, the only logical solution to lower response times for geoprocessing services would be to scale the architecture horizontally (adding more machines) or vertically (adding more CPU/RAM). Both scenarios would have significant cost implications, while the existing architecture was already built to be robust and highly available.
Our next step in attempting to lower response times was to rethink how we approach geospatial analysis.
Solution
Our primary requirement for developing a replacement for Esri's geoprocessing services was that it would need to work with the existing Esri ArcGIS Enterprise architecture. Access to data for web maps and applications was provided via Hosted Feature Layers, and we did not want to duplicate the data in another environment or modify the existing data delivery method.
The solution must provide a complete abstraction to the analysis. In other words, we did not want the client workflows to include any logic regarding how the data is accessed or how geospatial analysis is performed.
Ultimately, we needed to replicate a similar experience to working with geoprocessing services but have faster response times.
Test Case A: Query Feature Services directly
Esri feature services allow some interaction with the data through the query endpoint. The query endpoint is part of the ArcGIS REST API, which exposes a set of endpoints, allowable query parameters, and output formats for developers.
The first test is to determine if the query endpoint would provide a complete solution for replacing geoprocessing services (for the questions specific to this project).
The query endpoint allows us to get a response in either json or geojson based on our specific query. There are several useful query parameters that we can submit with our request to limit the data returned to our specific needs:
where - a where clause, such as where=population > 1500
geometry and geometryType - geometry to use as a spatial filter, such as geometryType=esriGeometryEnvelope&geometry=-104,35.6,-94.32,41
With these parameters, we can interact directly with the layer to ask questions like:
given a point, return the feature (if any) it intersects with
given a bounding box polygon, return the features (if any) within it
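As a rough sketch, the parameters above can be combined into a single query URL. The layer URL and the population field are placeholders rather than values from this project, and the inSR value assumes the coordinates are longitude/latitude; the parameter names themselves come from the ArcGIS REST API query operation.

```javascript
// Hypothetical layer URL; substitute the query endpoint of your own hosted feature layer.
const LAYER_QUERY_URL =
  'https://example.com/arcgis/rest/services/Boundaries/FeatureServer/0/query';

// Build a query URL that filters by attribute (where) and by a bounding box (envelope).
function buildEnvelopeQuery(baseUrl, where, [xmin, ymin, xmax, ymax]) {
  const params = new URLSearchParams({
    where,                                   // attribute filter
    geometry: `${xmin},${ymin},${xmax},${ymax}`,
    geometryType: 'esriGeometryEnvelope',
    inSR: '4326',                            // assumption: coordinates are WGS84 lon/lat
    spatialRel: 'esriSpatialRelIntersects',
    outFields: '*',
    f: 'geojson',                            // or 'json' for Esri JSON
  });
  return `${baseUrl}?${params.toString()}`;
}

const exampleUrl = buildEnvelopeQuery(
  LAYER_QUERY_URL,
  'population > 1500',
  [-104, 35.6, -94.32, 41]
);
console.log(exampleUrl);
```

URLSearchParams takes care of encoding characters such as spaces and the > sign, so the where clause can be written as plain SQL.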
Benefits
Interacting directly with the query endpoint is a powerful functionality of ArcGIS feature services.
Developers can perform basic spatial and SQL queries to limit the data contained in the response to a desired set of parameters.
Limitations
Query endpoints for feature services do not perform any advanced geospatial analysis, such as buffering the results or calculating distances from the input point to each of the returned features. Any geospatial analysis would need to be handled by the client.
The query endpoint does not provide the ability to modify the response before delivering it to the client.
Test Case B: Middleman API with NodeJS
In the previous example, the hosted feature service query operation returns the actual data (geojson or Esri json) as the response. Although this may be desired for some applications, we needed to develop a middleman to intercept the request and format the response before sending it back to the client. An intermediary API would let us parse request parameters, query the hosted feature service, and format the response.
We decided to use Node to develop our middleman API. Node is non-blocking and suited for handling a lot of requests at once. Other languages can be used, but we found Node to be a good fit for this project.
For example, let's say the client needed to know if a location was inside or outside of a defined boundary. The client would send a request to our API with the x coordinate and y coordinate of their location. Our API could then handle the communication to the hosted feature service query endpoint and generate a custom response back to the user based on the result of the query.
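The request-handling side of such an API might look roughly like the sketch below, which uses only Node's built-in http module. The x and y query parameter names, the JSON response shape, and the lookup callback are all assumptions made for illustration; the project's actual API is not shown in this article.

```javascript
const http = require('http');

// Extract and validate the x/y query parameters from a request URL.
// Returns { x, y } as numbers, or null when either value is missing or not numeric.
function parseLocation(requestUrl) {
  const { searchParams } = new URL(requestUrl, 'http://localhost');
  const rawX = searchParams.get('x');
  const rawY = searchParams.get('y');
  if (rawX === null || rawY === null) return null;
  if (rawX.trim() === '' || rawY.trim() === '') return null; // Number('') would be 0
  const x = Number(rawX);
  const y = Number(rawY);
  return Number.isFinite(x) && Number.isFinite(y) ? { x, y } : null;
}

// Create (but do not start) a server. `lookup` is a hypothetical async function
// that takes { x, y } and resolves to the value the client should receive.
function createApi(lookup) {
  return http.createServer(async (req, res) => {
    const location = parseLocation(req.url);
    if (!location) {
      res.writeHead(400, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'x and y must be numeric' }));
      return;
    }
    try {
      const result = await lookup(location);
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ result }));
    } catch (err) {
      // The upstream feature service failed or returned something unexpected.
      res.writeHead(502, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'upstream query failed' }));
    }
  });
}
```

Calling createApi(lookup).listen(3000) would start the server, after which a request such as GET /boundary?x=-104.99&y=39.74 is parsed and passed to lookup. The client only ever sees the small JSON response, never the feature service itself.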
For the boundary check, we would use Node to send the coordinates from the client's request to the hosted feature service query endpoint, specifying that we want the features that intersect (esriSpatialRelIntersects) the input point. If any features intersect the point (json.features.length is greater than zero), we return the string "inside". This lets the client know that the location they sent is inside the client boundary.
This example is not a complete solution. It does not represent the API logic, but rather a simplified snippet of how to talk to the Esri hosted feature service.
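A minimal sketch of that logic follows. It is a reconstruction under stated assumptions, not the project's code: the endpoint URL is a placeholder, the coordinates are assumed to be longitude/latitude (inSR 4326), the "outside" return value is an assumed counterpart to "inside", and the global fetch function requires Node 18 or later.

```javascript
// Hypothetical query endpoint of the hosted feature layer that holds the boundary.
const BOUNDARY_QUERY_URL =
  'https://example.com/arcgis/rest/services/Boundary/FeatureServer/0/query';

// Ask the feature service whether any feature intersects the point (x, y).
// `fetchImpl` defaults to the global fetch and can be swapped for a stand-in.
async function locatePoint(x, y, fetchImpl = fetch) {
  const params = new URLSearchParams({
    geometry: `${x},${y}`,
    geometryType: 'esriGeometryPoint',
    inSR: '4326',                           // assumption: x/y are WGS84 lon/lat
    spatialRel: 'esriSpatialRelIntersects',
    returnGeometry: 'false',                // only the number of matches matters here
    f: 'json',
  });

  const response = await fetchImpl(`${BOUNDARY_QUERY_URL}?${params}`);
  if (!response.ok) {
    throw new Error(`Feature service responded with HTTP ${response.status}`);
  }

  const json = await response.json();
  // ArcGIS REST endpoints can report failures in the body of an HTTP 200 response.
  if (json.error) {
    throw new Error(`Feature service error: ${json.error.message}`);
  }

  // Any intersecting feature means the point falls inside the boundary.
  return json.features.length > 0 ? 'inside' : 'outside';
}
```

Because the function returns a plain string, the caller decides how to wrap it in a response, which keeps the feature service details out of the client entirely.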
Benefits
The client does not need to know how to work directly with the hosted feature services.
The API logic can create a custom response.
The API logic can contain additional logic, such as talking to a database, interacting with multiple hosted feature services, etc.
Limitations
The solution is still missing the ability to perform complex spatial analysis.
Test Case C: Write a custom API that uses Turf.js for advanced spatial analysis
The final limitation that we need to overcome is our solution's inability to perform spatial analysis. Since the response from our query to the hosted feature service can be requested in geojson format, we can use Turf.js to perform this analysis in memory within our Node logic. Turf.js is a JavaScript library for advanced geospatial analysis that 'speaks' geojson.
By adding Turf.js to our Node middleman API, we can perform geospatial analysis directly on the geojson returned by the hosted feature service query operation.
For example, we can determine the distance from an input point location to each of the line features returned from our query operation using turf.pointToLineDistance. This would be the equivalent of the Generate Near Table tool, previously only accessible in our desktop environment or through geoprocessing services.
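The near-table idea could be sketched as follows. This assumes the @turf/turf package is installed, that the query returned LineString features, and that each feature has a name property; the name field and the choice of miles are illustrative, not taken from the project.

```javascript
// Build a "near table": the distance from `point` to every line in `lines`,
// sorted nearest first. `lines` is a GeoJSON FeatureCollection of LineStrings,
// such as the geojson returned by the feature service query operation.
// `turf` defaults to the @turf/turf package (npm install @turf/turf); it is a
// parameter only so the function can be exercised with a stand-in.
function buildNearTable(point, lines, turf = require('@turf/turf')) {
  return lines.features
    .map((line) => ({
      name: line.properties.name,           // hypothetical attribute on the layer
      miles: turf.pointToLineDistance(point, line, { units: 'miles' }),
    }))
    .sort((a, b) => a.miles - b.miles);
}
```

Given a GeoJSON point feature for the client's location and the roads returned by the query, buildNearTable(location, roads)[0] would be the nearest road, which answers the "given a point, determine the nearest road" question from the Problem section.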
Benefits
Can perform advanced spatial analysis.
Limitations
Although Turf.js includes many operations for geospatial analysis, you may find that some of the operations available in ArcPy are difficult to implement or absent from the library.
Performance
Now that we had a new model to perform geospatial analysis as a service using Node and Turf.js, we needed to perform load testing. Again, we used Locust, this time to test an API that performed the same analysis as our Python geoprocessing service.
Our goal was to determine if we could achieve acceptable response times during peak traffic (3 requests per second). The definition of an acceptable response time is arbitrary, but the goal was average response times under 3 seconds.
The graph of response times is below. As you can see, even when sending significantly more requests per second (86.9 RPS) than anticipated, our performance stayed at ~200 milliseconds per response.
This greatly exceeded our expectations!
Conclusion
We successfully developed a model for building a scalable geoprocessing service API that served as a replacement for Esri's standard Python geoprocessing services, while still working with data served through ArcGIS Enterprise as hosted feature services. The performance of the new model was greatly improved and will meet the needs of our project during any foreseeable traffic demands. All this was done without any horizontal or vertical scaling of our original infrastructure.
This same model should serve as a guide for future projects that have significant traffic and performance demands, but still need to operate within the Esri structure of hosted feature services.
Are you in need of advanced geoprocessing implementations for a web application? If so, contact us today to see how CartoLab can help you.