Ship access log to ElasticSearch

Posted by 4Aiur on 08/13/2012 in Python


This article shows how to use a custom Python script to parse an Apache access log and ship it to ElasticSearch.
If you want to store very large volumes of logs in ElasticSearch, you should read "Using Elasticsearch for logs" and consider popular open-source tools such as Graylog2, Logstash or Apache Flume.

Basic system setup

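# Raise the nofile limit on Linux (the hard limit must be >= the soft limit)
if ! grep "\* *soft *nofile *65535" /etc/security/limits.conf; then
    cat >> /etc/security/limits.conf << EOF
*               soft    nofile          65535
*               hard    nofile          65535
EOF
fi
# Single quotes keep $JAVA_HOME and $PATH unexpanded until /etc/profile is sourced
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile
. /etc/profile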
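limits.conf is read by pam_limits when a session starts, so the new value only applies to shells opened after the change. As a quick check before starting ElasticSearch, a short Python 3 sketch can read the current process's open-file limits with the standard resource module; the helper names below are illustrative and not part of the original setup.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def nofile_ok(minimum=65535):
    """True if the soft nofile limit is unlimited or at least `minimum`."""
    soft, _hard = nofile_limits()
    return soft == resource.RLIM_INFINITY or soft >= minimum
```

Run it (or simply `ulimit -n`) from the shell that will start ElasticSearch; a soft limit below 65535 usually means the session was opened before limits.conf was changed.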

Installing and configuring ElasticSearch

Deploying ElasticSearch on a cluster (EC2)

cd /opt/
wget -O elasticsearch-0.19.8.tar.gz
tar zxf elasticsearch-0.19.8.tar.gz
mkdir -p /var/log/elasticsearch /var/data/elasticsearch
cd /opt/elasticsearch-0.19.8/
cat >> config/elasticsearch.yml
. /opt/elasticsearch-0.19.8/bin/
# Install plugin elasticsearch-head
/opt/elasticsearch-0.19.8/bin/plugin -install mobz/elasticsearch-head
# Start ElasticSearch

Cluster status

curl -XGET 'http://*.*.*.*:9200/_cluster/health?pretty=true'; echo
curl -XGET 'http://*.*.*.*:9200/_cluster/state?pretty=true'; echo
curl -XGET 'http://*.*.*.*:9200/_cluster/nodes?pretty=true'; echo
curl -XGET 'http://*.*.*.*:9200/_cluster/nodes/stats?pretty=true'; echo
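The same health check can be scripted, for example to make sure the cluster is usable before shipping logs. A minimal Python 3 sketch using only the standard library, assuming a node reachable at localhost:9200; the function names are hypothetical, and the `fetch` parameter exists only so the HTTP call can be replaced. The `status` field of the response is green, yellow or red.

```python
import json
import urllib.request


def _http_get(url, timeout=5.0):
    """Fetch a URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def cluster_health(base_url="http://localhost:9200", fetch=_http_get):
    """Return the decoded JSON from GET /_cluster/health."""
    url = base_url.rstrip("/") + "/_cluster/health"
    return json.loads(fetch(url).decode("utf-8"))


def is_usable(health, allow_yellow=True):
    """green = all shards allocated; yellow = primaries allocated, some replicas not."""
    accepted = {"green", "yellow"} if allow_yellow else {"green"}
    return health.get("status") in accepted
```

Accepting yellow is a deliberate choice here: on a single node with number_of_replicas set to 1 the replicas cannot be allocated, so the cluster never turns green even though indexing works.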

Schema Mapping

curl -XPUT http://localhost:9200/_template/template_access/ -d '{
  "template": "access-*",
  "settings": { "number_of_replicas": 1, "number_of_shards": 5 },
  "mappings": {
    "access": {
      "_all": { "enabled": false },
      "_source": { "compress": true },
      "properties": {
        "bytes": { "index": "not_analyzed", "store": "yes", "type": "integer" },
        "host": { "index": "analyzed", "store": "yes", "type": "ip" },
        "method": { "index": "not_analyzed", "store": "yes", "type": "string" },
        "protocol": { "index": "not_analyzed", "store": "yes", "type": "string" },
        "referrer": { "index": "not_analyzed", "store": "yes", "type": "string" },
        "status": { "index": "analyzed", "store": "yes", "type": "string" },
        "timestamp": { "index": "analyzed", "store": "yes", "type": "date" },
        "uri": { "index": "not_analyzed", "store": "yes", "type": "string" },
        "user-agent": { "index": "not_analyzed", "store": "yes", "type": "string" }
      }
    }
  }
}'

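A template is applied when a matching index is created, so this mapping only takes effect if the shipping script writes to indices whose names match "access-*", and its doc_type (access in conf/main.cfg) has to match the mapping type. A small Python 3 sketch of that naming check, assuming one index per day named access-YYYY.MM.DD; this scheme is hypothetical, since the post does not show how its indices are named. fnmatch only approximates ElasticSearch's own wildcard matching, which is enough for a plain `*` pattern.

```python
import fnmatch
from datetime import date

# Index pattern from the template_access template.
TEMPLATE_PATTERN = "access-*"


def daily_index(day=None, prefix="access-"):
    """Hypothetical naming scheme: one index per day, e.g. access-2012.08.13."""
    day = day or date.today()
    return prefix + day.strftime("%Y.%m.%d")


def template_applies(index_name, pattern=TEMPLATE_PATTERN):
    """Approximate ElasticSearch's simple '*' wildcard match on index names."""
    return fnmatch.fnmatchcase(index_name, pattern)
```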
Ship log script

I have put the full source code on GitHub.

Import the whole existing log:
cat /var/log/httpd/access_log | python

Ship only the lines added since the previous run (logtail remembers the offset it last read):
logtail /var/log/httpd/access_log | python
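Both commands pipe log lines into the script on standard input. A minimal Python 3 sketch of how a script could read them in groups of bulk_size lines (5000 in conf/main.cfg), so that each group becomes one bulk request; `batches` is a hypothetical helper, not taken from the original script.

```python
from itertools import islice


def batches(lines, size):
    """Yield lists of at most `size` items from any iterable (e.g. sys.stdin)."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

Used as `for chunk in batches(sys.stdin, 5000): ...`, the loop runs until the pipe is closed, and because islice pulls lazily from stdin, memory use stays bounded by one chunk even for a large access_log.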

Configuration file:

cat conf/main.cfg
host = localhost:9200
bulk_size = 5000
doc_type = access
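Python's ConfigParser (configparser in Python 3) refuses a file without a section header and raises MissingSectionHeaderError. A minimal Python 3 sketch for reading the settings shown above, using the same values as defaults; the [main] section name and the `load_main_cfg` helper are assumptions, not part of the original script.

```python
import configparser


def load_main_cfg(text, section="main"):
    """Parse main.cfg text into typed settings.

    If the text has no [section] header, a hypothetical [main] header
    is prepended so that ConfigParser accepts it.
    """
    if not text.lstrip().startswith("["):
        text = "[%s]\n%s" % (section, text)
    parser = configparser.ConfigParser()
    parser.read_string(text)
    cfg = parser[section]
    return {
        "host": cfg.get("host", fallback="localhost:9200"),
        "bulk_size": cfg.getint("bulk_size", fallback=5000),
        "doc_type": cfg.get("doc_type", fallback="access"),
    }
```

With a real file this would be called as `load_main_cfg(pathlib.Path("conf/main.cfg").read_text())`.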


#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
abspath = os.path.abspath(os.path.dirname(__file__))
import sys
import pyes
import ConfigParser
import time
import datetime
import re
import inspect
import logging
import logging.config
import traceback

# init logging facility
logconf = "conf/logging.cfg"

# - - [08/Aug/2012:12:10:10 +0400] "GET / HTTP/1.1" 200 23920 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +"
access_log_pattern = re.compile(

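The listing stops at the regular expression, so the original pattern is not available. As a stand-in, here is a minimal Python 3 sketch (not the author's code, and not using pyes) that parses one line in Apache's combined log format into the fields of the access mapping and builds a newline-delimited body for ElasticSearch's _bulk endpoint. The regex, the helper names and the index name used with it are assumptions.

```python
import json
import re
from datetime import datetime

# Apache "combined" format (assumed pattern, not the original script's regex):
# host ident user [time] "method uri protocol" status bytes "referrer" "user-agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict shaped like the 'access' mapping, or None if the line does not match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    doc = m.groupdict()
    # "user-agent" is not a valid regex group name, so rename it here.
    doc["user-agent"] = doc.pop("agent")
    # %b expects English month names (C locale); %z parses offsets like "+0400".
    ts = datetime.strptime(doc["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    doc["timestamp"] = ts.isoformat()
    # Apache logs "-" when no body was sent; the mapping declares an integer.
    doc["bytes"] = 0 if doc["bytes"] == "-" else int(doc["bytes"])
    return doc


def bulk_body(docs, index, doc_type="access"):
    """Build a _bulk request body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"
```

In a shipping script, each batch of parsed lines would become one body POSTed to http://localhost:9200/_bulk; lines for which parse_line returns None could be logged and skipped rather than aborting the run.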
Related links:

  • elasticsearch: ElasticSearch is an open-source (Apache 2), distributed, RESTful search engine built on top of Apache Lucene.
  • elasticsearch-head: a web front end for browsing and interacting with an ElasticSearch cluster.
  • pyes: a connector for using ElasticSearch from Python.


Copyright © 2010-2024 4Aiur All rights reserved.