Ship access log to ElasticSearch
This article introduces how to use a custom Python script to parse the Apache access log and ship it to ElasticSearch.
If you want to store a huge volume of logs in ElasticSearch, you should read Using Elasticsearch for logs and consider popular open-source software such as Graylog2, Logstash, or Apache Flume.
Basic system setup
# Raise the nofile limit on Linux
if ! grep "\* *soft *nofile *65535" /etc/security/limits.conf; then
cat >> /etc/security/limits.conf <<EOF
*    soft    nofile    65535
*    hard    nofile    65535
EOF
fi

# Put the JDK on the PATH (JAVA_HOME must already point at your JDK install)
echo "export PATH=$JAVA_HOME/bin:$PATH" >> /etc/profile
. /etc/profile
Installing and configuring ElasticSearch
Deploying ElasticSearch on a cluster (EC2)
cd /opt/
wget -O elasticsearch-0.19.8.tar.gz https://github.com/downloads/elasticsearch/elasticsearch/elasticsearch-0.19.8.tar.gz
tar zxf elasticsearch-0.19.8.tar.gz
mkdir -p /var/log/elasticsearch /var/data/elasticsearch
cd /opt/elasticsearch-0.19.8/
cat >> config/elasticsearch.yml <<EOF
path.data: /var/data/elasticsearch
path.logs: /var/log/elasticsearch
EOF

# Environment for the JVM; the heap sizes below are examples, adjust to your machine
cat > bin/elasticsearch.sh <<EOF
export ES_MIN_MEM=256m
export ES_MAX_MEM=1g
EOF
. /opt/elasticsearch-0.19.8/bin/elasticsearch.sh
# Install plugin elasticsearch-head
/opt/elasticsearch-0.19.8/bin/plugin -install mobz/elasticsearch-head
# Start ElasticSearch
/opt/elasticsearch-0.19.8/bin/elasticsearch
Cluster status
curl -XGET 'http://*.*.*.*:9200/_cluster/health?pretty=true'; echo
curl -XGET 'http://*.*.*.*:9200/_cluster/state?pretty=true'; echo
curl -XGET 'http://*.*.*.*:9200/_cluster/nodes?pretty=true'; echo
curl -XGET 'http://*.*.*.*:9200/_cluster/nodes/stats?pretty=true'; echo
Schema Mapping
curl -XPUT http://localhost:9200/_template/template_access/ -d '{
"template": "access-*",
"settings": { "number_of_replicas": 1, "number_of_shards": 5 },
"mappings": {
"access": {
"_all": { "enabled": false },
"_source": { "compress": true },
"properties": {
"bytes": { "index": "not_analyzed", "store": "yes", "type": "integer" },
"host": { "index": "analyzed", "store": "yes", "type": "ip" },
"method": { "index": "not_analyzed", "store": "yes", "type": "string" },
"protocol": { "index": "not_analyzed", "store": "yes", "type": "string" },
"referrer": { "index": "not_analyzed", "store": "yes", "type": "string" },
"status": { "index": "analyzed", "store": "yes", "type": "string" },
"timestamp": { "index": "analyzed", "store": "yes", "type": "date" },
"uri": { "index": "not_analyzed", "store": "yes", "type": "string" },
"user-agent": { "index": "not_analyzed", "store": "yes", "type": "string" }
}
}
}
}'
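The template only applies to indices whose names match access-*, so the shipper writes to date-stamped indices. As a sketch, a document shaped for this mapping might look like the following (the field values come from the sample log line used later in this article; the daily index-naming convention is an assumption):

```python
import datetime

# A parsed access-log entry shaped to fit the mapping above
doc = {
    "host": "66.249.73.69",
    "timestamp": datetime.datetime(2012, 8, 8, 12, 10, 10).isoformat(),
    "method": "GET",
    "uri": "/",
    "protocol": "HTTP/1.1",
    "status": "200",
    "bytes": 23920,
    "referrer": "-",
    "user-agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                  "+http://www.google.com/bot.html)",
}

# One index per day; the name matches the "access-*" template pattern
index_name = "access-" + datetime.date(2012, 8, 8).strftime("%Y.%m.%d")
print(index_name)
```

With one index per day, old log data can be dropped by deleting whole indices instead of running expensive delete-by-query operations.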
The log shipping script
I have published this code on GitHub.
Usage:
cat /var/log/httpd/access_log | python ship_log_into_elasticsearch.py
or
logtail /var/log/httpd/access_log | python ship_log_into_elasticsearch.py
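For continuous shipping, the logtail variant can be driven from cron, since logtail remembers its read offset between runs and only pipes lines appended since the previous run. A hypothetical crontab entry (the script path is an assumption):

```
*/5 * * * * logtail /var/log/httpd/access_log | python /path/to/ship_log_into_elasticsearch.py
```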
config file:
cat conf/main.cfg
[elasticsearch]
host = localhost:9200
bulk_size = 5000
doc_type = access
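The script loads these settings with ConfigParser. A minimal sketch of the lookup, shown here with Python 3's configparser and an inline string so it runs standalone (the article's own code is Python 2 and reads conf/main.cfg from disk):

```python
import configparser  # "ConfigParser" in the article's Python 2 code

config = configparser.ConfigParser()
config.read_string("""\
[elasticsearch]
host = localhost:9200
bulk_size = 5000
doc_type = access
""")

# The three settings the shipper needs
host = config.get("elasticsearch", "host")
bulk_size = config.getint("elasticsearch", "bulk_size")
doc_type = config.get("elasticsearch", "doc_type")
print(host, bulk_size, doc_type)
```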
code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
abspath = os.path.abspath(os.path.dirname(__file__))
os.chdir(abspath)
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
import pyes
import ConfigParser
import time
import datetime
import re
import inspect
import logging
import logging.config
import traceback
# init logging facility
logconf = "conf/logging.cfg"
logging.config.fileConfig(logconf)
# 66.249.73.69 - - [08/Aug/2012:12:10:10 +0400] "GET / HTTP/1.1" 200 23920 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
access_log_pattern = re.compile(
    r'(?P<host>[\d\.]+)\s'
    r'(?P<identity>\S*)\s'
    r'(?P<user>\S*)\s'
    r'\[(?P<timestamp>.*?)\]\s'
    r'"(?P<method>\S+)\s(?P<uri>\S+)\s(?P<protocol>\S+)"\s'
    r'(?P<status>\d+)\s'
    r'(?P<bytes>\S*)\s'
    r'"(?P<referrer>.*?)"\s'
    r'"(?P<user_agent>.*?)"')

# ElasticSearch settings from conf/main.cfg
config = ConfigParser.ConfigParser()
config.read("conf/main.cfg")
conn = pyes.ES(config.get("elasticsearch", "host"),
               bulk_size=config.getint("elasticsearch", "bulk_size"))
doc_type = config.get("elasticsearch", "doc_type")

for line in sys.stdin:
    match = access_log_pattern.match(line)
    if not match:
        logging.warning("unparsable line: %s", line.rstrip())
        continue
    doc = match.groupdict()
    # Apache timestamp -> datetime (the trailing time-zone offset is dropped)
    ts = datetime.datetime.strptime(doc["timestamp"][:-6], "%d/%b/%Y:%H:%M:%S")
    doc["timestamp"] = ts
    doc["user-agent"] = doc.pop("user_agent")  # group names cannot contain '-'
    # one index per day, matching the "access-*" template
    conn.index(doc, "access-" + ts.strftime("%Y.%m.%d"), doc_type, bulk=True)
conn.flush_bulk(forced=True)
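As a quick check, here is a self-contained version of the access-log pattern (the group names are my reconstruction, with user_agent standing in for the mapping field user-agent because regex group names cannot contain hyphens) run against the sample line from the script's comment:

```python
import re

# Combined-log pattern; group names mirror the mapping fields
access_log_pattern = re.compile(
    r'(?P<host>[\d\.]+)\s'
    r'(?P<identity>\S*)\s'
    r'(?P<user>\S*)\s'
    r'\[(?P<timestamp>.*?)\]\s'
    r'"(?P<method>\S+)\s(?P<uri>\S+)\s(?P<protocol>\S+)"\s'
    r'(?P<status>\d+)\s'
    r'(?P<bytes>\S*)\s'
    r'"(?P<referrer>.*?)"\s'
    r'"(?P<user_agent>.*?)"')

line = ('66.249.73.69 - - [08/Aug/2012:12:10:10 +0400] "GET / HTTP/1.1" '
        '200 23920 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
        '+http://www.google.com/bot.html)"')
doc = access_log_pattern.match(line).groupdict()
print(doc['host'], doc['method'], doc['status'], doc['bytes'])
```

Running the pattern over a few real lines of your own access_log before shipping is a cheap way to catch LogFormat differences between Apache setups.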
Related links:
- elasticsearch, ElasticSearch is an Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Apache Lucene.
- elasticsearch-head, elasticsearch-head is a web front end for browsing and interacting with an Elastic Search cluster.
- pyes, pyes is a connector to use elasticsearch from python.