Marill -- Automated Site Testing Utility

Marill is an automated site testing utility, meant to make administrators' lives easier by taking much of the legwork out of testing. It’s intended to be lightweight, flexible, and easy to use, while still being very powerful.

Goal

Often, during server administration, migrations, and large server changes, things can and will go wrong. Servers are complex systems with many working parts, and with that comes a lot of breakage.

Creating an automated site testing utility, like Marill, allows:

Features

Disclaimer: Marill is still in early development, and this list (along with the code, libraries, tools, and CLI arguments) is subject to drastic change.

Limitations

There are a few limitations with Marill, due to how the utility was developed. Marill was meant to be lightweight and portable, which means it cannot work exactly like a normal browser. Below are a few examples:

How does it work?

The general idea is that you place Marill on the server you would like to test. By default, Marill figures out the list of domains that the server is hosting. It then acts much like a browser, crawling each site (and, if --assets is used, all of its resources such as images, CSS, and JavaScript). Each resource it fetches is passed through the list of built-in and external tests. Each domain is given a starting score of 10, and each test has a predefined weight. If a test matches, its weight is applied to the domain's score. If the score falls below the configured minimum score, the domain is considered failed.
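The scoring model described above can be sketched in a few lines of Python. This is only an illustration and not Marill's actual implementation (Marill itself is written in Go). The check names and weights are hypothetical, and the minimum score of 8 is the default listed for --min-score in the usage output.

```python
from dataclasses import dataclass

START_SCORE = 10.0   # every domain starts at 10 (from the description above)
MIN_SCORE = 8.0      # default of --min-score per the --help output


@dataclass
class Check:
    name: str        # hypothetical test name, for illustration only
    weight: float    # signed amount applied to the score when the test matches


def score_domain(matched, start=START_SCORE):
    """Apply the weight of every matched test to the starting score."""
    return start + sum(check.weight for check in matched)


def has_failed(score, min_score=MIN_SCORE):
    """A domain is considered failed once its score is below the minimum."""
    return score < min_score


# Hypothetical matches; the real tests and weights ship with Marill.
matched = [Check("php-error-in-body", -3.0), Check("default-server-page", -1.0)]
score = score_domain(matched)
print(score, has_failed(score))
```

Weights are treated as signed here, which is consistent with --pass-text adding to the score and --fail-text subtracting from it.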

Examples

Here are a few examples of tests that are useful:

Example run from my workstation (though Marill is best suited to running from the server itself): [asciicast recording]

Installation

Check out the releases page for prebuilt versions. Example install commands for some of the more popular OSes/distros are provided below; builds for other platforms are available on the releases page.

Ubuntu/Debian

$ wget https://liam.sh/ghr/marill_0.1.1_linux_amd64.deb
$ dpkg -i marill_0.1.1_linux_amd64.deb

CentOS/Redhat

$ yum localinstall https://liam.sh/ghr/marill_0.1.1_linux_amd64.rpm

Manual Install

$ wget https://liam.sh/ghr/marill_0.1.1_linux_amd64.tar.gz
$ tar -C /usr/bin/ -xzvf marill_0.1.1_linux_amd64.tar.gz marill
$ chmod +x /usr/bin/marill

Build From Source

Dependencies (to build from source only):

# you can "git clone" the repo too, just make sure it's following this directory
# structure.
$ go get -d -u github.com/lrstanley/marill
$ cd $GOPATH/src/github.com/lrstanley/marill
# this will show you all of the available options (to fetch dependencies,
# run in debug mode, etc.)
$ make help
$ make
$ ./marill --help

Usage

This is very likely to change quite a bit until we’re out of beta. Please use wisely.

$ marill --help
NAME:
   marill - Automated website testing utility

USAGE:
   marill [global options] command [command options] [arguments...]

VERSION:
   git revision XXXXXX

AUTHOR(S):
   Liam Stanley <[email protected]>

COMMANDS:
     scan           [DEFAULT] Start scan for all domains on server
     urls, domains  Print the list of urls as if they were going to be scanned
     tests          Print the list of tests that are loaded and would be used
     help, h        Shows a list of commands or help for one command

GLOBAL OPTIONS:
   -d, --debug              Print debugging information to stdout
   -q, --quiet              Do not print regular stdout messages
   --no-color               Do not print with color
   --no-banner              Do not print the colorful banner
   --show-warnings          Show a warning if one or more test failed, even if it didn't drop below min-score
   --exit-on-fail           Send exit code 1 if any domains fail tests
   --log FILE               Log information to FILE
   --debug-log FILE         Log debugging information to FILE
   --result-file FILE       Dump result template into FILE (will overwrite!)
   --no-updates             Don't check to see if there are updates
   --threads n              Use n threads to fetch data (0 defaults to server cores/2) (default: 0)
   --delay DURATION         Delay DURATION before each resource is crawled (e.g. 5s, 1m, 100ms) (default: 0s)
   --http-timeout DURATION  DURATION before an http request is timed out (e.g. 5s, 10s, 1m) (default: 10s)
   --domains DOMAIN:IP ...  Manually specify list of domains to scan in form: DOMAIN:IP ..., or DOMAIN:IP:PORT
   --min-score value        Minimum score for domain (default: 8)
   -a, --assets             Crawl assets (css/js/images) for each page
   --ignore-success         Only print results if they are considered failed
   --allow-insecure         Don't check to see if an SSL certificate is valid
   --tmpl value             Golang text/template string template for use with formatting scan output
   --json PATH              Optional PATH to output json results to
   --json-pretty            Used with [--json], pretty-prints the output json
   --ignore-http            Ignore http-based URLs during domain search
   --ignore-https           Ignore https-based URLs during domain search
   --ignore-remote          Ignore all resources that resolve to a remote IP (use with --assets)
   --ignore-domains GLOB    Ignore URLS during domain search that match GLOB, pipe separated list
   --match-domains GLOB     Allow URLS during domain search that match GLOB, pipe separated list
   --ignore-test GLOB       Ignore tests that match GLOB, pipe separated list
   --match-test GLOB        Allow tests that match GLOB, pipe separated list
   --tests-url URL          Import tests from a specified URL
   --tests-path PATH        Import tests from a specified file-system PATH
   --ignore-std-tests       Ignores all built-in tests (useful with --tests-url)
   --pass-text GLOB         Give sites a +10 score if body matches GLOB
   --fail-text GLOB         Give sites a -10 score if body matches GLOB
   --help, -h               show help
   --version, -v            print the version

COPYRIGHT:
   (c) 2016 Liam Stanley
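Several of the flags above (--ignore-domains, --match-domains, --ignore-test, --match-test) take a pipe-separated list of globs. The Python sketch below shows one plausible reading of that filtering. It assumes shell-style globs as implemented by Python's fnmatch module, which may differ in detail from Marill's own matcher, and it assumes an ignore pattern wins over a match pattern. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def split_globs(value):
    """Split a pipe-separated GLOB list, dropping empty entries."""
    return [g.strip() for g in value.split("|") if g.strip()]


def matches_any(name, globs):
    """True if the name matches at least one glob in the list."""
    return any(fnmatchcase(name, g) for g in globs)


def filter_names(names, ignore="", match=""):
    """Keep names allowed by `match` (if given) and not hit by `ignore`."""
    ignore_globs = split_globs(ignore)
    match_globs = split_globs(match)
    kept = []
    for name in names:
        if match_globs and not matches_any(name, match_globs):
            continue  # a match list was given and this name is not on it
        if matches_any(name, ignore_globs):
            continue  # explicitly ignored
        kept.append(name)
    return kept


domains = ["example.com", "dev.example.com", "shop.example.org"]
print(filter_names(domains, ignore="dev.*|*.org"))
```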

Getting Started

Getting started with Marill should be fairly easy. Marill is a single binary, so it needs no additional dependencies to run.

Head to this page and download the top item in the list. For example, using the latest version:

$ wget -q -O- https://release.liam.sh/marill/latest.tar.gz | tar -zx -C /root/tmp/

You should now see a file named marill in /root/tmp/. Feel free to look over the current flags and arguments:

$ /root/tmp/marill --help

The main arguments that may be useful are:

So, for example, to start off with:

$ /root/tmp/marill -a

cPanel/Apache based servers

Marill has out-of-the-box support for cPanel-based servers (through things like /var/cpanel/users/<user> and /var/cpanel/userdata/<domain>).

For Apache, Marill finds the currently running httpd instance and runs <binary> -S, which prints information about all virtual host entries. Note that this isn’t supported on all Apache versions (see here for more information).
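To give a feel for what that virtual host dump contains, here is a minimal Python sketch that extracts hostnames and ports from `-S` style output. It is not Marill's parser. The sample text is a representative, hand-written example of name-based virtual host lines, and the regular expressions only cover the "port N namevhost NAME" and "alias NAME" lines, so other layouts (for example a single IP-based virtual host) are ignored.

```python
import re

# Hand-written sample in the style of `httpd -S`; hosts and paths are made up.
SAMPLE = """\
VirtualHost configuration:
*:80                   is a NameVirtualHost
         default server example.com (/etc/httpd/conf/httpd.conf:40)
         port 80 namevhost example.com (/etc/httpd/conf/httpd.conf:40)
                 alias www.example.com
         port 80 namevhost shop.example.com (/etc/httpd/conf/httpd.conf:60)
*:443                  is a NameVirtualHost
         port 443 namevhost example.com (/etc/httpd/conf.d/ssl.conf:12)
"""

VHOST_RE = re.compile(r"^\s*port\s+(\d+)\s+namevhost\s+(\S+)")
ALIAS_RE = re.compile(r"^\s*alias\s+(\S+)")


def parse_vhosts(text):
    """Return sorted unique (hostname, port) pairs found in the dump."""
    found = set()
    port = None  # port of the most recent namevhost line
    for line in text.splitlines():
        m = VHOST_RE.match(line)
        if m:
            port = int(m.group(1))
            found.add((m.group(2), port))
            continue
        m = ALIAS_RE.match(line)
        if m and port is not None:
            # Aliases belong to the namevhost line that precedes them.
            found.add((m.group(1), port))
    return sorted(found)


for host, port in parse_vhosts(SAMPLE):
    print(f"{host}:{port}")
```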

Alternatives (Nginx, Caddy, etc)

If your web server does not match the above description, you can use Marill's manual domain list flag. The current syntax is as follows:

$ marill --domains "<items>"

Replace <items> with one of the following inputs:

DOMAIN can be any one of the following:

So, to put it all together, you can do something like:

$ marill -a --domains "somedomain.com:443 domain.com:1234 example.com:123.456.7.89:80 https://domain.com/"
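When the domain list is generated by a script, a small helper can assemble the --domains value. The Python sketch below is an assumption-laden illustration: it only covers the DOMAIN, DOMAIN:IP, DOMAIN:PORT, and DOMAIN:IP:PORT shapes seen in this document, passes full URLs through untouched, and does not attempt IPv6. The function names and the sample IP address are hypothetical.

```python
def format_target(domain, ip=None, port=None):
    """Build one entry: DOMAIN, DOMAIN:IP, DOMAIN:PORT or DOMAIN:IP:PORT."""
    parts = [domain]
    if ip is not None:
        parts.append(ip)
    if port is not None:
        parts.append(str(port))
    return ":".join(parts)


def domains_argument(targets):
    """Join entries with spaces, the separator used in the example above."""
    return " ".join(format_target(*target) for target in targets)


targets = [
    ("somedomain.com", None, 443),          # domain with a custom port
    ("example.com", "203.0.113.10", 80),    # documentation-range IP, made up
    ("https://domain.com/",),               # a full URL is passed as-is
]
print(domains_argument(targets))
```

The resulting string is meant to be passed to --domains as a single quoted argument.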

Things to note/Troubleshooting

FAQ

  1. Will it cause high load?
    • Marill was written primarily for servers under maintenance, or for new servers that are being migrated to. That being said, Marill does run scans in parallel. By default, the number of concurrent crawls is the number of cores divided by 2 (an 8-core server runs 4 concurrent crawls; a 2-core server runs 1 crawl at a time). If Marill still causes too much load, you can use --delay and --threads.
  2. How long does Marill take to crawl sites (e.g. 1,000 sites on a server)?
    • On a cPanel server, note that along with the input (default http) version of a domain, the https version of the site is scanned as well if cPanel has a certificate for it. Marill also attempts to crawl www.domain.com, not just domain.com. For other web servers, it all depends on the input. Note that with --assets (-a) Marill takes longer, because it fetches all resources for each site being crawled. If you would like Marill to crawl faster, don’t use -a.
    • Generally speaking, crawling without -a is fairly fast.
  3. Is it better to run Marill from inside of the server, or from a remote location?
    • Running remotely confirms that there are no IP or firewall-related issues. However, if you are crawling quite a few sites, many servers may assume, due to the high connection count, that your connections are malicious.
    • If run from inside the server, Marill can scan and determine what the server is hosting (by checking Apache, cPanel, etc).
  4. Can I give Marill a custom IP address at which to crawl a site (before it goes live and DNS is updated)?
    • Yes! For example, rather than --domains "domain.com domain2.com", you can do something like:
    $ marill --domains "domain.com:1.2.3.4 domain2.com:2.3.4.5"
    
    • Also note that you can run scans on alternate ports:
    $ marill --domains "domain.com:1.2.3.4:8080 domain.com:9000"
    
  5. Can I give Marill a custom port for which to crawl a site?
    • Yes! See FAQ #4.
  6. Can Marill crawl sub-domains and sub-folders?
    • Yes! You can pass any url into --domains as necessary. For example:
    $ marill --domains "https://domain.com/sub/folder/some-page"
    
    • Or with a custom IP as well:
    $ marill --domains "https://domain.com/sub/folder/some-page:1.2.3.4"
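The default concurrency rule from FAQ #1 (cores divided by 2) is easy to reproduce when planning a scan. The Python sketch below is only an illustration: the function name is hypothetical, and the floor of one worker on a single-core machine is an assumption, since the source only gives the 8-core and 2-core cases.

```python
import os


def default_crawl_workers(cores=None):
    """Cores divided by 2, but never fewer than one worker (assumed floor)."""
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, cores // 2)


for cores in (1, 2, 8):
    print(cores, "cores ->", default_crawl_workers(cores), "concurrent crawls")
```

If the default is still too aggressive for a busy server, --threads and --delay override it.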
    

Contributing

Please review the CONTRIBUTING doc for guidance on submitting issues, opening pull requests, and helping out.

License

LICENSE: The MIT License (MIT)
Copyright (c) 2016 Liam Stanley <[email protected]>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.