Version: v1.4.0

Creating parsers

Foreword

This documentation assumes you're creating a parser for CrowdSec with the intent of submitting it to the hub, and that you will therefore also create the associated functional tests. Writing those tests first will guide the process and make it easier.

We're going to create a parser for the imaginary service "myservice", which produces three types of logs via syslog :

Dec  8 06:28:43 mymachine myservice[2806]: bad password for user 'toto' from '1.2.3.4'
Dec 8 06:28:43 mymachine myservice[2806]: unknown user 'toto' from '1.2.3.4'
Dec 8 06:28:43 mymachine myservice[2806]: accepted connection for user 'toto' from '1.2.3.4'

As we are going to parse those logs to later detect bruteforce and user-enumeration attacks, we will simply "discard" the last type of log (accepted connections).

Pre-requisites

  1. Create a local test environment

  2. Clone the hub

git clone https://github.com/crowdsecurity/hub.git

Create our test

From the root of the hub repository :

▶ cscli hubtest create myservice-logs --type syslog

Test name : myservice-logs
Test path : /home/dev/github/hub/.tests/myservice-logs
Log file : /home/dev/github/hub/.tests/myservice-logs/myservice-logs.log (please fill it with logs)
Parser assertion file : /home/dev/github/hub/.tests/myservice-logs/parser.assert (please fill it with assertion)
Scenario assertion file : /home/dev/github/hub/.tests/myservice-logs/scenario.assert (please fill it with assertion)
Configuration File : /home/dev/github/hub/.tests/myservice-logs/config.yaml (please fill it with parsers, scenarios...)
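The generated log file is empty, so the sample lines from the foreword need to go into it before the test can run. A minimal sketch, assuming it is run from the root of the hub repository and that the test directory is the one printed above (the test_dir variable is only a convenience for this example):

```shell
# Test directory reported by `cscli hubtest create`, relative to the hub root.
test_dir=".tests/myservice-logs"
mkdir -p "$test_dir"   # already present after `hubtest create`; harmless otherwise

# Quoted heredoc delimiter: the shell leaves quotes and brackets untouched.
cat > "$test_dir/myservice-logs.log" <<'EOF'
Dec 8 06:28:43 mymachine myservice[2806]: bad password for user 'toto' from '1.2.3.4'
Dec 8 06:28:43 mymachine myservice[2806]: unknown user 'toto' from '1.2.3.4'
Dec 8 06:28:43 mymachine myservice[2806]: accepted connection for user 'toto' from '1.2.3.4'
EOF
```

One log line per type is enough here; each line becomes one entry in the results that the assertions refer to later.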

Configure our test

Let's add our parser to the test configuration (.tests/myservice-logs/config.yaml). It specifies that we need the syslog-logs parser (because myservice logs are shipped via syslog), followed by our custom parser.

parsers:
- crowdsecurity/syslog-logs
- ./parsers/s01-parse/crowdsecurity/myservice-logs.yaml
scenarios:
postoverflows:
log_file: myservice-logs.log
log_type: syslog
ignore_parsers: false

note: as our custom parser isn't yet part of the hub, we specify its path relative to the root of the hub directory

Parser creation : skeleton

For the sake of the tutorial, let's create a very simple parser :

filter: 1 == 1
debug: true
onsuccess: next_stage
name: crowdsecurity/myservice-logs
description: "Parse myservice logs"
grok:
  #our grok pattern : capture .*
  pattern: ^%{DATA:some_data}$
  #the field to which we apply the grok pattern : the log message itself
  apply_on: message
statics:
  - parsed: is_my_service
    value: yes

This skeleton contains :

  • a filter : if the expression is true, the event enters the parser; otherwise it is skipped
  • an onsuccess directive : defines what happens once the event has been successfully parsed (here, the event moves on to the next stage)
  • a name and a description
  • some statics that modify the event (here, they set evt.Parsed.is_my_service to "yes")
  • a debug flag that enables local debugging information
  • a grok pattern that captures some data from the log line

We can then "test" our parser like this :

▶ cscli hubtest run myservice-logs
INFO[01-10-2021 12:41:21 PM] Running test 'myservice-logs'
WARN[01-10-2021 12:41:24 PM] Assert file '/home/dev/github/hub/.tests/myservice-logs/parser.assert' is empty, generating assertion:

len(results) == 2
len(results["s00-raw"]["crowdsecurity/syslog-logs"]) == 3
results["s00-raw"]["crowdsecurity/syslog-logs"][0].Success == true
...
len(results["s01-parse"]["crowdsecurity/myservice-logs"]) == 3
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Success == true
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["program"] == "myservice"
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["timestamp"] == "Dec 8 06:28:43"
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["is_my_service"] == "yes"
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["logsource"] == "syslog"
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["message"] == "bad password for user 'toto' from '1.2.3.4'"
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["some_data"] == "bad password for user 'toto' from '1.2.3.4'"
...


Please fill your assert file(s) for test 'myservice-logs', exiting

What happened here ?

  • Our logs have been processed by the syslog-logs parser and by our custom parser
  • As we have no existing assertions, cscli hubtest kindly generated some for us

This mostly confirms that our logs went through our parser, even though the parser is useless in its current state. For a closer look, use cscli hubtest explain :

▶ cscli hubtest explain myservice-logs
line: Dec 8 06:28:43 mymachine myservice[2806]: bad password for user 'toto' from '1.2.3.4'
├ s00-raw
| └ 🟢 crowdsecurity/syslog-logs
└ s01-parse
  └ 🟢 crowdsecurity/myservice-logs

line: Dec 8 06:28:43 mymachine myservice[2806]: unknown user 'toto' from '1.2.3.4'
├ s00-raw
| └ 🟢 crowdsecurity/syslog-logs
└ s01-parse
  └ 🟢 crowdsecurity/myservice-logs

line: Dec 8 06:28:43 mymachine myservice[2806]: accepted connection for user 'toto' from '1.2.3.4'
├ s00-raw
| └ 🟢 crowdsecurity/syslog-logs
└ s01-parse
  └ 🟢 crowdsecurity/myservice-logs

We can see that our log lines were successfully parsed by both syslog-logs and myservice-logs parsers.

Parser creation : actual parser

Let's modify our parser, ./parsers/s01-parse/crowdsecurity/myservice-logs.yaml :

onsuccess: next_stage
filter: "evt.Parsed.program == 'myservice'"
name: crowdsecurity/myservice-logs
description: "Parse myservice logs"
#for clarity, we create our pattern syntax beforehand
pattern_syntax:
  MYSERVICE_BADPASSWORD: bad password for user '%{USERNAME:user}' from '%{IP:source_ip}' #[1]
  MYSERVICE_BADUSER: unknown user '%{USERNAME:user}' from '%{IP:source_ip}' #[1]
nodes:
  #and we use them to parse our two types of logs
  - grok:
      name: "MYSERVICE_BADPASSWORD" #[2]
      apply_on: message
      statics:
        - meta: log_type #[3]
          value: myservice_failed_auth
        - meta: log_subtype
          value: myservice_bad_password
  - grok:
      name: "MYSERVICE_BADUSER" #[2]
      apply_on: message
      statics:
        - meta: log_type #[3]
          value: myservice_failed_auth
        - meta: log_subtype
          value: myservice_bad_user
statics:
  - meta: service #[3]
    value: myservice
  - meta: username
    expression: evt.Parsed.user
  - meta: source_ip #[1]
    expression: "evt.Parsed.source_ip"

Various changes have been made here :

  • We created two patterns to capture the two relevant types of log lines (an online grok debugger or an online regex debugger helps when writing them) [2]
  • We keep track of the username and the source_ip (please note that setting the source IP in both evt.Meta.source_ip and evt.Parsed.source_ip is important) [1]
  • We set up various statics to classify the log type [3]
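As a local alternative to an online debugger, the two patterns can be sanity-checked with grep before running the full test. This is only a sketch: the regular expressions below are simplified stand-ins chosen for this example (USERNAME approximated as letters, digits, dots, underscores and dashes; IP as dotted IPv4 only), not the actual grok definitions, and the variable names are hypothetical.

```shell
# Simplified approximations of the grok patterns (assumptions, see above).
user_re="[a-zA-Z0-9._-]+"
ip_re="([0-9]{1,3}\.){3}[0-9]{1,3}"

# Both message shapes from pattern_syntax, folded into one extended regex.
pattern="(bad password for|unknown) user '${user_re}' from '${ip_re}'"

# Feed the three sample messages (syslog prefix already stripped) to grep
# and count how many of them match.
matched=$(printf '%s\n' \
  "bad password for user 'toto' from '1.2.3.4'" \
  "unknown user 'toto' from '1.2.3.4'" \
  "accepted connection for user 'toto' from '1.2.3.4'" \
  | grep -E -c "$pattern")

echo "$matched"   # prints 2: the "accepted connection" line has no pattern
```

A count of 2 is what we want: the two failure messages match, and the accepted connection is left alone.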

Let's run our tests again :

▶ cscli hubtest run myservice-logs
INFO[01-10-2021 12:49:56 PM] Running test 'myservice-logs'
WARN[01-10-2021 12:49:59 PM] Assert file '/home/dev/github/hub/.tests/myservice-logs/parser.assert' is empty, generating assertion:

len(results) == 2
len(results["s00-raw"]["crowdsecurity/syslog-logs"]) == 3
results["s00-raw"]["crowdsecurity/syslog-logs"][0].Success == true
...
len(results["s01-parse"]["crowdsecurity/myservice-logs"]) == 3
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Success == true
...
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["timestamp"] == "Dec 8 06:28:43"
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["program"] == "myservice"
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["source_ip"] == "1.2.3.4"
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["user"] == "toto"
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Meta["log_subtype"] == "myservice_bad_password"
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Meta["log_type"] == "myservice_failed_auth"
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Meta["service"] == "myservice"
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Meta["source_ip"] == "1.2.3.4"
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Meta["username"] == "toto"
...
results["s01-parse"]["crowdsecurity/myservice-logs"][1].Evt.Meta["log_subtype"] == "myservice_bad_user"
results["s01-parse"]["crowdsecurity/myservice-logs"][2].Success == false


Please fill your assert file(s) for test 'myservice-logs', exiting

We can see that our parser captured all the relevant information, and it should be enough to create scenarios further down the line.

Again, further inspection with cscli hubtest explain will show us more about what happened :

▶ cscli hubtest explain myservice-logs
line: Dec 8 06:28:43 mymachine myservice[2806]: bad password for user 'toto' from '1.2.3.4'
├ s00-raw
| └ 🟢 crowdsecurity/syslog-logs
└ s01-parse
  └ 🟢 crowdsecurity/myservice-logs

line: Dec 8 06:28:43 mymachine myservice[2806]: unknown user 'toto' from '1.2.3.4'
├ s00-raw
| └ 🟢 crowdsecurity/syslog-logs
└ s01-parse
  └ 🟢 crowdsecurity/myservice-logs

line: Dec 8 06:28:43 mymachine myservice[2806]: accepted connection for user 'toto' from '1.2.3.4'
├ s00-raw
| └ 🟢 crowdsecurity/syslog-logs
└ s01-parse
  └ 🔴 crowdsecurity/myservice-logs

note: the log line "accepted connection for user 'toto' from '1.2.3.4'" wasn't parsed by crowdsecurity/myservice-logs because we have no pattern for it, which is how this type of log ends up discarded
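Both runs ended by asking us to fill the assert file. A minimal sketch of doing so, keeping only a handful of the assertions that cscli hubtest generated above (which ones to keep is a judgment call, and the test_dir variable is only a convenience for this example):

```shell
# Run from the root of the hub repository.
test_dir=".tests/myservice-logs"
mkdir -p "$test_dir"   # already present after `hubtest create`; harmless otherwise

# Assertions copied from the generated output: two parsed failures, one unparsed line.
cat > "$test_dir/parser.assert" <<'EOF'
len(results["s01-parse"]["crowdsecurity/myservice-logs"]) == 3
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Success == true
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Meta["log_type"] == "myservice_failed_auth"
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Meta["log_subtype"] == "myservice_bad_password"
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Meta["source_ip"] == "1.2.3.4"
results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Meta["username"] == "toto"
results["s01-parse"]["crowdsecurity/myservice-logs"][1].Evt.Meta["log_subtype"] == "myservice_bad_user"
results["s01-parse"]["crowdsecurity/myservice-logs"][2].Success == false
EOF
```

With the file filled, running cscli hubtest run myservice-logs again should evaluate these assertions instead of generating new ones. Keeping the last line documents that accepted connections are deliberately left unparsed.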

Closing word

We now have a fully functional parser for myservice logs ! We can deploy it to our production systems, or even better, contribute it to the hub !

If you want to know more about directives and possibilities, take a look at the parser reference documentation !

See also this blog article on the topic.