Service for maintaining and validating against a known set of malware URLs
The malware URL validation service provides a lookup service to determine whether a given URL domain has been flagged as a source of malware. The supported web API endpoints are the following:
- GET query to determine whether a URL is flagged as malware
- POST to add a URL to the list of domains flagged as malware
Validation Service Overview
The validation service can run with a standalone, colocated, in-memory URL cache, or with a distributed multi-node cache. In the distributed configuration, one or more URL cache service instances share responsibility for the URL set, with each URL assigned to one of the running cache service nodes. More than one URL cache service can also run on a single physical server, which reduces the size of the cache structures in each instance and may speed up lookups.
It is also possible to run multiple validation servers on different physical servers when running in distributed mode.
In the distributed cache configuration, each URL is addressed by taking a CRC hash of the URL modulo the number of nodes, which gives the index of its "home" node.
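The addressing scheme can be sketched in Go as follows. This is a minimal illustration, not the service's actual code: the text only says "a CRC hash", so the use of CRC-32 (IEEE) from the standard library is an assumption, and `homeNode` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// homeNode returns the index of the cache node responsible for target.
// Assumption: CRC-32 (IEEE) stands in for the unspecified CRC variant.
// It returns -1 when there are no nodes to choose from.
func homeNode(target string, nodeCount int) int {
	if nodeCount <= 0 {
		return -1
	}
	sum := crc32.ChecksumIEEE([]byte(target))
	return int(sum % uint32(nodeCount))
}

func main() {
	// The node order must match on every client, since the index depends on it.
	nodes := []string{"localhost:7787", "localhost:7788", "localhost:7789"}
	for _, u := range []string{"www.yahoo.com", "www.example.com"} {
		fmt.Printf("%s -> %s\n", u, nodes[homeNode(u, len(nodes))])
	}
}
```

Because the index depends on the node count, adding or removing a node under this scheme changes the home node of most URLs.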
The current implementation does not provide cache redundancy, but it could be added to allow service redundancy. Cache redundancy with a round-robin access scheme could also decrease lookup times if cache access becomes a bottleneck as the URL data set grows.
The validation functionality can also be run as Go client code from within an application, using direct gRPC calls to the URL cache service instances. As noted above, multiple validation servers can also be run in distributed mode.
To run the validation service and any associated cache service instances, the Go runtime environment must be installed on the target system, with the GOBIN environment variable set and that directory on the PATH so the installed binaries can be run by name. To run the service from the command line, perform the following steps:
Running URL Cache Service(s)
Only required for distributed URL cache operation.
When running in distributed fashion, start one or more URL cache service instances, ideally one per physical server, to make use of the memory available across those machines. A comma-separated list of the cache service nodes in hostname:port syntax must be provided to the validation service on startup.
cd url_cache_service
go install .
url_cache_service <listening port>
ex. url_cache_service 7789
Running Validation Service
For colocated cache operation, the validation service takes no arguments. For distributed operation, a comma-separated list of the cache service nodes in hostname:port syntax must be provided to the validation service on startup so that it can find them. The order of this list is the order used for node access, so it must be the same for all clients and validation servers.
cd validation_service
go install .
validation_server <hostname1:port>,<hostname2:port>...<hostnameN:port>
ex. validation_server 192.168.1.1:7789,192.168.1.2:7789
or
validation_server localhost:7787,localhost:7788,localhost:7789
By default, the validation service binds to port 8080 on localhost, so it currently cannot be colocated with another web server running on port 8080. Network configuration on the host machines must be set up to allow gRPC connections to the specified ports using hostnames or IP addresses. Connections can also be made using localhost, so multiple URL cache servers can be tested by starting multiple instances with different listening ports on the same machine as the validation service and using "localhost:" syntax in the list of URL cache servers on the command line.
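As an illustration of handling the comma-separated node list passed on the command line, the sketch below splits it into an ordered slice and rejects entries that are not in hostname:port form. `parseNodeList` is a hypothetical helper written for this example; it is not taken from the validation service source.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseNodeList splits "host1:port,host2:port" into an ordered slice.
// Order is preserved because it determines which node each URL maps to.
func parseNodeList(arg string) ([]string, error) {
	var nodes []string
	for _, entry := range strings.Split(arg, ",") {
		entry = strings.TrimSpace(entry)
		host, port, err := net.SplitHostPort(entry)
		if err != nil {
			return nil, fmt.Errorf("invalid node %q: %w", entry, err)
		}
		if host == "" || port == "" {
			return nil, fmt.Errorf("invalid node %q: host and port are required", entry)
		}
		nodes = append(nodes, entry)
	}
	return nodes, nil
}

func main() {
	nodes, err := parseNodeList("localhost:7787,localhost:7788,localhost:7789")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(nodes)
}
```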
Once the service is running, the curl tool can be used to send requests to the server. If running in zsh, put quotation marks around the entire URL in the Get URL info request, since zsh otherwise treats the ? as a glob character.
Get URL info curl example
curl "http://localhost:8080/v1/urlinfo?url=www.yahoo.com"
Add URL curl example
curl http://localhost:8080/v1/urladd -d url=www.yahoo.com
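The same two requests can be sent from Go with the standard net/http package. This is a minimal sketch that assumes the service is listening on localhost:8080 as described above. The helper names (`infoURL`, `queryURL`, `addURL`) are hypothetical, and since the response format is not documented here, the body is printed as-is.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// infoURL builds the lookup request URL, query-escaping the target.
func infoURL(base, target string) string {
	return base + "/v1/urlinfo?" + url.Values{"url": {target}}.Encode()
}

// readBody drains and closes the response, returning the body as a string.
func readBody(resp *http.Response) (string, error) {
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// queryURL sends GET /v1/urlinfo?url=<target>.
func queryURL(base, target string) (string, error) {
	resp, err := http.Get(infoURL(base, target))
	if err != nil {
		return "", err
	}
	return readBody(resp)
}

// addURL sends POST /v1/urladd with a form-encoded url field, like curl -d.
func addURL(base, target string) (string, error) {
	resp, err := http.PostForm(base+"/v1/urladd", url.Values{"url": {target}})
	if err != nil {
		return "", err
	}
	return readBody(resp)
}

func main() {
	const base = "http://localhost:8080" // default bind address from this README
	if body, err := addURL(base, "www.yahoo.com"); err != nil {
		fmt.Println("add failed:", err)
	} else {
		fmt.Println("add response:", body)
	}
	if body, err := queryURL(base, "www.yahoo.com"); err != nil {
		fmt.Println("query failed:", err)
	} else {
		fmt.Println("query response:", body)
	}
}
```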