This project is mirrored from https://github.com/prometheus/prometheus.git.
  1. 17 Apr, 2014 2 commits
    • Prometheus version 0.4.0. · d2421a69
      Change-Id: I752044a69f86aacc5a7e6da62a6f4a187cf67d27
      Julius Volz authored
    • Fix RWLock memory storage deadlock. · 1b299758
      This fixes https://github.com/prometheus/prometheus/issues/390
      
      The cause for the deadlock was a lock semantic in Go that wasn't
      obvious to me when introducing this bug:
      
      http://golang.org/pkg/sync/#RWMutex.Lock
      
      Key phrase: "To ensure that the lock eventually becomes available, a
      blocked Lock call excludes new readers from acquiring the lock."
      
      In the memory series storage, we have one function
      (GetFingerprintsForLabelMatchers) acquiring an RLock(), which calls
      another function also acquiring the same RLock()
      (GetLabelValuesForLabelName). That normally doesn't deadlock, unless a
      Lock() call from another goroutine happens right in between the two
      RLock() calls, blocking both the Lock() and the second RLock() call from
      ever completing.
      
        GoRoutine 1          GoRoutine 2
        ======================================
        RLock()
        ...                  Lock() [DEADLOCK]
        RLock() [DEADLOCK]   Unlock()
        RUnlock()
        RUnlock()
      
      Testing deadlocks is tricky, but the regression test I added does
      reliably detect the deadlock in the original code on my machine within a
      normal concurrent reader/writer run duration of 250ms.
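
      One common way to avoid this kind of nested RLock() is sketched below.
      It is a minimal illustration under assumptions, not the code from this
      commit: the memoryStore type and all method names are hypothetical.
      Exported methods take the lock once and delegate to unexported helpers
      that assume the lock is already held, so no goroutine ever calls
      RLock() while it still holds a read lock.

      ```go
      package main

      import (
          "fmt"
          "sync"
      )

      // memoryStore is a hypothetical stand-in for a lock-protected in-memory index.
      type memoryStore struct {
          mu     sync.RWMutex
          values map[string][]string // label name -> label values
      }

      // LabelValues takes the read lock and delegates to the helper.
      func (s *memoryStore) LabelValues(name string) []string {
          s.mu.RLock()
          defer s.mu.RUnlock()
          return s.labelValuesLocked(name)
      }

      // labelValuesLocked assumes the caller already holds s.mu.
      func (s *memoryStore) labelValuesLocked(name string) []string {
          return append([]string(nil), s.values[name]...)
      }

      // CountValues needs the same data while holding the read lock. It calls
      // the helper instead of LabelValues, so RLock() is never nested.
      func (s *memoryStore) CountValues(name string) int {
          s.mu.RLock()
          defer s.mu.RUnlock()
          return len(s.labelValuesLocked(name))
      }

      // Add takes the write lock.
      func (s *memoryStore) Add(name, value string) {
          s.mu.Lock()
          defer s.mu.Unlock()
          s.values[name] = append(s.values[name], value)
      }

      func main() {
          s := &memoryStore{values: map[string][]string{}}
          var wg sync.WaitGroup
          for i := 0; i < 100; i++ {
              wg.Add(2)
              go func(i int) { defer wg.Done(); s.Add("job", fmt.Sprint(i)) }(i)
              go func() { defer wg.Done(); s.CountValues("job") }()
          }
          wg.Wait()
          fmt.Println(s.CountValues("job"))
      }
      ```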
      
      Change-Id: Ib34c2bb8df1a80af44550cc2bf5007055cdef413
      Julius Volz authored
  2. 16 Apr, 2014 2 commits
    • Separate storage implementation from interfaces. · 01f652cb
      This was initially motivated by wanting to distribute the rule checker
      tool under `tools/rule_checker`. However, this was not possible without
      also distributing the LevelDB dynamic libraries because the tool
      transitively depended on Levigo:
      
      rule checker -> query layer -> tiered storage layer -> leveldb
      
      This change separates external storage interfaces from the
      implementation (tiered storage, leveldb storage, memory storage) by
      putting them into separate packages:
      
      - storage/metric: public, implementation-agnostic interfaces
      - storage/metric/tiered: tiered storage implementation, including memory
                               and LevelDB storage.
      
      I initially also considered splitting up the implementation into
      separate packages for tiered storage, memory storage, and LevelDB
      storage, but these are currently so intertwined that it would be another
      major project in itself.
      
      The query layers and most other parts of Prometheus now have no notion of
      the storage implementation anymore and just use whatever implementation
      they get passed in via interfaces.
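
      The shape of this separation can be sketched as follows. This is an
      illustration only: the Storage interface, the memStorage type, and
      countLabelValues are hypothetical names, not the actual contents of
      storage/metric or storage/metric/tiered. The point is that the caller
      depends on the interface alone, so linking it does not pull in any
      particular backend.

      ```go
      package main

      import "fmt"

      // Storage is a hypothetical, implementation-agnostic interface, standing
      // in for the kind of interface that would live in storage/metric.
      type Storage interface {
          LabelValues(name string) []string
      }

      // memStorage is a hypothetical implementation. In the layout described
      // above it would live in a separate package.
      type memStorage map[string][]string

      func (m memStorage) LabelValues(name string) []string { return m[name] }

      // countLabelValues plays the role of the query layer: it only sees Storage.
      func countLabelValues(s Storage, name string) int {
          return len(s.LabelValues(name))
      }

      func main() {
          var s Storage = memStorage{"job": {"api", "db"}}
          fmt.Println(countLabelValues(s, "job"))
      }
      ```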
      
      The rule_checker is now a static binary :)
      
      Change-Id: I793bbf631a8648ca31790e7e772ecf9c2b92f7a0
      Julius Volz authored
    • Parameterize the buffer for marshal/unmarshal. · 3e969a8c
      We are not reusing buffers yet.  This could introduce problems,
      so the behavior is disabled for now.
      
      Cursory benchmark data:
      - Marshal for 10,000 samples: -30% overhead.
      - Unmarshal for 10,000 samples: -15% overhead.
      
      Change-Id: Ib006bdc656af45dca2b92de08a8f905d8d728cac
      Matt T. Proud authored
  3. 15 Apr, 2014 4 commits
    • Clean up quitting behavior and add quit trigger. · 2064f326
      The closing of Prometheus now uses a sync.Once wrapper to prevent
      any accidental multiple invocations of it, which could trigger
      corruption or a race condition.  The shutdown process is made more
      verbose through logging.
      
      A web handler, not enabled by default, has been provided to trigger a
      remote shutdown if requested for debugging purposes.
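
      The sync.Once guard can be sketched as below. The server type and its
      fields are hypothetical and only illustrate the pattern; the actual
      shutdown code is not reproduced here.

      ```go
      package main

      import (
          "fmt"
          "sync"
      )

      // server is a hypothetical stand-in for a process with a shutdown routine.
      type server struct {
          closeOnce sync.Once
          closed    int // counts how often the shutdown body actually ran
      }

      // Close may be called any number of times, from any goroutine. The
      // function passed to Do runs exactly once.
      func (s *server) Close() {
          s.closeOnce.Do(func() {
              s.closed++
              fmt.Println("shutting down")
          })
      }

      func main() {
          s := &server{}
          var wg sync.WaitGroup
          for i := 0; i < 10; i++ {
              wg.Add(1)
              go func() { defer wg.Done(); s.Close() }()
          }
          wg.Wait()
          fmt.Println("shutdown ran", s.closed, "time(s)")
      }
      ```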
      
      Change-Id: If4fee75196bbff1fb1e4a4ef7e1cfa53fef88f2e
      Matt T. Proud authored
    • Correct size of unmarshalling destination buffer. · 6ec72393
      The format header size is not deducted from the size of the byte
      stream when calculating the output buffer size for samples.  I have
      yet to notice problems directly as a result of this, but it is good
      to fix.
      
      Change-Id: Icb07a0718366c04ddac975d738a6305687773af0
      Matt T. Proud authored
    • Use idiomatic one-to-many one-time signal pattern. · 81367893
      The idiomatic pattern for signalling a one-time message to multiple
      consumers from a single producer is as follows:
      
      ```
        c := make(chan struct{})
        w := new(sync.WaitGroup)  // Boilerplate to ensure synchronization.
      
        for i := 0; i < 1000; i++ {
          w.Add(1)
          go func() {
            defer w.Done()
      
            for {
              select {
              case _, ok := <- c:
                if !ok {
                  return
                }
              default:
                // Do something here.
              }
            }
          }()
        }
      
        close(c)  // Signal the one-to-many single-use message.
        w.Wait()
      
      ```
      
      Change-Id: I755f73ba4c70a923afd342a4dea63365bdf2144b
      Matt T. Proud authored
  4. 14 Apr, 2014 2 commits
    • Make curation semaphore behavior idiomatic. · 1d01435d
      Idiomatic semaphore usage in Go, unless it is wrapping a concrete type,
      should use anonymous empty structs (``struct{}``).  This has several
      features that are worthwhile:
      
        1. It conveys that the object in the channel is likely used for
           resource limiting / semaphore use.  This is by idiom.
      
        2. Due to magic under the hood, empty structs have a width of zero,
           meaning they consume little space.  It is presumed that slices,
           channels, and other values of them can be represented specially
           with alternative optimizations.  Dmitry Vyukov has done
           investigations into improvements that can be made to the channel
      design in Go and concludes that there are already nice short
           circuiting behaviors at work with this type.
      
      This is the first change of several that apply this type of change to
      suitable places.
      
      In this one change, we fix a bug in the previous revision, whereby a
      semaphore can be acquired for curation and never released back for
      subsequent work: http://goo.gl/70Y2qK.  Compare that versus the
      compaction definition above.
      
      On top of that, the use of the semaphore in this manner better supports
      system shutdown idioms through the closing of channels.
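
      A minimal sketch of a struct{}-based semaphore follows. The semaphore
      type and runLimited are hypothetical names for illustration and are not
      the curation code itself. The deferred release shows one way to make
      sure an acquired slot is always handed back.

      ```go
      package main

      import (
          "fmt"
          "sync"
      )

      // semaphore is a counting semaphore backed by a buffered channel of empty
      // structs. The buffer capacity is the number of available slots.
      type semaphore chan struct{}

      func (s semaphore) acquire() { s <- struct{}{} }
      func (s semaphore) release() { <-s }

      // runLimited runs n jobs with at most limit of them holding a slot at
      // once and returns the highest concurrency that was observed.
      func runLimited(n, limit int) int {
          sem := make(semaphore, limit)
          var (
              mu      sync.Mutex
              wg      sync.WaitGroup
              running int
              peak    int
          )
          for i := 0; i < n; i++ {
              wg.Add(1)
              go func() {
                  defer wg.Done()
                  sem.acquire()
                  defer sem.release() // hand the slot back on every return path

                  mu.Lock()
                  running++
                  if running > peak {
                      peak = running
                  }
                  mu.Unlock()

                  // The limited work would happen here.

                  mu.Lock()
                  running--
                  mu.Unlock()
              }()
          }
          wg.Wait()
          return peak
      }

      func main() {
          fmt.Println(runLimited(50, 3) <= 3)
      }
      ```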
      
      Change-Id: Idb4fca310f26b73c9ec690bbdd4136180d14c32d
      Matt T. Proud authored
    • Fix Mac OS X build since we upgraded to go1.2. · e9eda761
      Since go1.2, the release engineers have keyed their release
      artifacts to the major release family of Mac OS X.
      
      Change-Id: Ia4bf0c86af9884748e21be14ab6e09f01a830e19
      Matt T. Proud authored
  5. 08 Apr, 2014 2 commits
    • Allow reversing vector and scalar arguments in binops. · d411a7d8
      This allows putting a scalar as the first argument of a binary operator
      in which the second argument is a vector:
      
        <scalar> <binop> <vector>
      
      For example,
      
        1 / http_requests_total
      
      ...will output a vector in which every sample value is 1 divided by the
      respective input vector element.
      
      This even works for filter binary operators now:
      
        1 == http_requests_total
      
      Returns a vector with all values set to 1 for every element in
      http_requests_total whose initial value was 1.
      
      Note: For filter binary operators, the resulting values are always taken
      from the left-hand-side of the operation, no matter whether the scalar
      or the vector argument is the left-hand-side. That is,
      
        1 != http_requests_total
      
      ...will set all result vector sample values to 1, although these are
      exactly the sample elements that were != 1 in the input vector.
      
      If you want to just filter elements without changing their sample
      values, you still need to do:
      
        http_requests_total != 1
      
      The new filter form is a bit exotic, and so probably won't be used
      often. But it was easier to implement it than disallow it completely or
      change its behavior.
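
      The left-hand-side rule for filter operators can be sketched in Go as
      below. The sample type and filterNotEqual are hypothetical and greatly
      simplified; they only mirror the behavior described in this message,
      not the evaluator's real code.

      ```go
      package main

      import "fmt"

      // sample is a hypothetical, simplified vector element.
      type sample struct {
          metric string
          value  float64
      }

      // filterNotEqual sketches != between a scalar and a vector. scalarOnLeft
      // says which operand is the left-hand side, and so which value is kept.
      func filterNotEqual(scalar float64, vector []sample, scalarOnLeft bool) []sample {
          var out []sample
          for _, s := range vector {
              if s.value == scalar {
                  continue // element is filtered out
              }
              if scalarOnLeft {
                  s.value = scalar // result takes the left-hand-side value
              }
              out = append(out, s)
          }
          return out
      }

      func main() {
          v := []sample{{"a", 1}, {"b", 2}, {"c", 3}}
          fmt.Println(filterNotEqual(1, v, true))  // 1 != v
          fmt.Println(filterNotEqual(1, v, false)) // v != 1
      }
      ```

      In the first call the surviving elements all carry the scalar's value;
      in the second they keep their own values.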
      
      Change-Id: Idd083f2bd3a1219ba1560cf4ace42f5b82e797a5
      Julius Volz authored
  6. 04 Apr, 2014 1 commit
  7. 01 Apr, 2014 1 commit
    • Add regex-matching support for labels. · c7c0b33d
      There are four label-matching ops for selecting timeseries now:
      
      - Equal: =
      - NotEqual: !=
      - RegexMatch: =~
      - RegexNoMatch: !~
      
      Instead of looking up labels by a simple clientmodel.LabelSet (basically
      an equals op for every key/value pair in the set), timeseries
      fingerprint selection is now done via a list of metric.LabelMatchers.
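
      A simplified sketch of matcher-based selection is shown below. The
      labelMatcher type here is hypothetical and is not the real
      metric.LabelMatcher. Two details are assumptions of the sketch: regexes
      are applied unanchored, and a missing label is treated as the empty
      string.

      ```go
      package main

      import (
          "fmt"
          "regexp"
      )

      type matchType int

      const (
          equal matchType = iota
          notEqual
          regexMatch
          regexNoMatch
      )

      // labelMatcher is a hypothetical, simplified matcher for one label.
      type labelMatcher struct {
          typ   matchType
          name  string
          value string
          re    *regexp.Regexp // only set for the two regex types
      }

      func newMatcher(t matchType, name, value string) (*labelMatcher, error) {
          m := &labelMatcher{typ: t, name: name, value: value}
          if t == regexMatch || t == regexNoMatch {
              re, err := regexp.Compile(value)
              if err != nil {
                  return nil, err
              }
              m.re = re
          }
          return m, nil
      }

      func (m *labelMatcher) matches(labels map[string]string) bool {
          v := labels[m.name]
          switch m.typ {
          case equal:
              return v == m.value
          case notEqual:
              return v != m.value
          case regexMatch:
              return m.re.MatchString(v)
          case regexNoMatch:
              return !m.re.MatchString(v)
          }
          return false
      }

      // matchesAll reports whether every matcher accepts the label set.
      func matchesAll(ms []*labelMatcher, labels map[string]string) bool {
          for _, m := range ms {
              if !m.matches(labels) {
                  return false
              }
          }
          return true
      }

      func main() {
          eq, _ := newMatcher(equal, "job", "api")
          re, _ := newMatcher(regexMatch, "instance", "^web-[0-9]+$")
          ms := []*labelMatcher{eq, re}
          fmt.Println(matchesAll(ms, map[string]string{"job": "api", "instance": "web-1"}))
          fmt.Println(matchesAll(ms, map[string]string{"job": "api", "instance": "db-1"}))
      }
      ```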
      
      Change-Id: I510a83f761198e80946146770ebb64e4abc3bb96
      Julius Volz authored
  8. 28 Mar, 2014 1 commit
  9. 26 Mar, 2014 2 commits
    • Prometheus version 0.2.1. · 71d2ff40
      Change-Id: I288f88390d3eee45bf684647337e7bfff11def6a
      Julius Volz authored
    • Fix interval op special case. · 7a577b86
      In the case that a getValuesAtIntervalOp's ExtractSamples() is called
      with a current time after the last chunk time, we return without
      extracting any further values beyond the last one in the chunk
      (correct), but also without advancing the op's time (incorrect). This
      leads to an infinite loop in renderView(), since the op is called
      repeatedly without ever being advanced and consumed.
      
      This adds handling for this special case. When detecting this case, we
      immediately set the op to be consumed, since we would always get a value
      after the current time passed in if there was one.
      
      Change-Id: Id99149e07b5188d655331382b8b6a461b677005c
      Julius Volz authored
  10. 25 Mar, 2014 1 commit
  11. 21 Mar, 2014 2 commits
  12. 18 Mar, 2014 2 commits
    • Prometheus version 0.2.0. · 0e7596b6
      Change-Id: I4ecc8b909fc90378d855ea3620e1f6f75cc53b6d
      Julius Volz authored
    • Fix incorrect interval op advancement. · 9d5c3677
      This fixes a bug where an interval op might advance too far past the end
      of the currently extracted chunk, effectively skipping over relevant
      (to-be-extracted) values in the subsequent chunk. The result: missing
      samples at chunk boundaries in the resulting view.
      
      Change-Id: Iebf5d086293a277d330039c69f78e1eaf084b3c8
      Julius Volz authored
  13. 14 Mar, 2014 1 commit
    • Switch to new "__name__" metric name label. · cc04238a
      This also fixes the compaction test, which before worked only because
      the input sample sorting was accidentally equal to the resulting on-disk
      sample sorting.
      
      Change-Id: I2a21c4b46ba562424b27058fc02eba84fa6a6006
      Julius Volz authored
  14. 12 Mar, 2014 1 commit
    • Add regression tests for 'loop until op is consumed' bug. · c3b282bd
      - Most of this is the actual regression test in tiered_test.go.
      
      - Working on that regression test uncovered problems in
        tiered_test.go that are fixed in this commit.
      
      - The 'op.consumed = false' line added to freelist.go was actually not
        fixing a bug. Instead, there was no bug at all. So this commit
        removes that line again, but adds a regression test to make sure
        that the assumed bug is indeed not there (cf. freelist_test.go).
      
      - Removed more code duplication in operation.go (following the same
        approach as before, i.e. embedding op type A into op type B if
        everything in A is the same as in B with the exception of String()
        and ExtractSample()). (This change makes struct literals for ops more
        clunky, but that only affects tests. No code change whatsoever was
        necessary in the actual code after this refactoring.)
      
      - Fix another op leak in tiered.go.
      
      Change-Id: Ia165c52e33290ad4f6aba9c83d92318d4f583517
      Bjoern Rabenstein authored
  15. 11 Mar, 2014 6 commits
    • Merge "Introduce semantic versioning." · b7ba349c
      Björn Rabenstein authored
    • Convert metric.Values to slice of values. · 86fc13a5
      The initial impetus for this was that it made unmarshalling sample
      values much faster.
      
      Other relevant benchmark changes in ns/op:
      
      Benchmark                                 old        new   speedup
      ==================================================================
      BenchmarkMarshal                       179170     127996     1.4x
      BenchmarkUnmarshal                     404984     132186     3.1x
      
      BenchmarkMemoryGetValueAtTime           57801      50050     1.2x
      BenchmarkMemoryGetBoundaryValues        64496      53194     1.2x
      BenchmarkMemoryGetRangeValues           66585      54065     1.2x
      
      BenchmarkStreamAdd                       45.0       75.3     0.6x
      BenchmarkAppendSample1                   1157       1587     0.7x
      BenchmarkAppendSample10                  4090       4284     0.95x
      BenchmarkAppendSample100                45660      44066     1.0x
      BenchmarkAppendSample1000              579084     582380     1.0x
      BenchmarkMemoryAppendRepeatingValues 22796594   22005502     1.0x
      
      Overall, this gives us good speedups in the areas where they matter
      most: decoding values from disk and accessing the memory storage (which
      is also used for views).
      
      Some of the smaller append examples take minimally longer, but the cost
      seems to get amortized over larger appends, so I'm not worried about
      these. Also, we're currently not bottlenecked on the write path and have
      plenty of other optimizations available in that area if it becomes
      necessary.
      
      Memory allocations during appends don't change measurably at all.
      
      Change-Id: I7dc7394edea09506976765551f35b138518db9e8
      Julius Volz authored
    • Introduce semantic versioning. · 44390d83
      This introduces semantic versioning (http://semver.org/) in Prometheus:
      
      - A new VERSION file contains the semantic version string.
      
      - The "tarball" target now includes versioning and build information in
        the tarball name, like: "prometheus-0.1.0.linux-amd64.tar.gz".
      
      - A new "release" target allows scp-ing the versioned tarball to a
        remote machine (file server).
      
      - A new "tag" target allows git-tagging the current revision with the
        version specified in VERSION.
      
      Change-Id: I1f19f38b9b317bfa9eb513754750df5a9c602d94
      Julius Volz authored
    • Add version field to LevelDB sample format. · a7d0973f
      This doesn't add complex discriminator logic yet, but adds a single
      version byte to the beginning of each samples chunk. If we ever need to
      change the disk format again, this will make it easy to do so without
      having to wipe the entire database.
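
      The idea of a leading version byte can be sketched as follows. The
      constant, the function names, and the payload layout are hypothetical;
      the real sample encoding is not reproduced here. A decoder switches on
      the first byte, so a future format only needs a new case.

      ```go
      package main

      import (
          "errors"
          "fmt"
      )

      // formatVersion is a hypothetical current version of the chunk format.
      const formatVersion byte = 1

      var (
          errEmptyChunk     = errors.New("empty chunk")
          errUnknownVersion = errors.New("unknown chunk format version")
      )

      // encodeChunk prefixes the payload with a single version byte.
      func encodeChunk(payload []byte) []byte {
          out := make([]byte, 0, 1+len(payload))
          out = append(out, formatVersion)
          return append(out, payload...)
      }

      // decodeChunk checks the version byte and returns the payload after it.
      func decodeChunk(chunk []byte) ([]byte, error) {
          if len(chunk) == 0 {
              return nil, errEmptyChunk
          }
          switch chunk[0] {
          case formatVersion:
              return chunk[1:], nil
          default:
              return nil, errUnknownVersion
          }
      }

      func main() {
          chunk := encodeChunk([]byte("samples"))
          payload, err := decodeChunk(chunk)
          fmt.Println(string(payload), err)

          _, err = decodeChunk([]byte{99, 1, 2})
          fmt.Println(err)
      }
      ```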
      
      Change-Id: I60c39274256f790bc2da83167a1effaa174588fe
      Julius Volz authored
  16. 09 Mar, 2014 2 commits
  17. 07 Mar, 2014 2 commits
  18. 06 Mar, 2014 4 commits
  19. 05 Mar, 2014 1 commit
  20. 04 Mar, 2014 1 commit
    • Remove the multi-op-per-fingerprint capability. · 9ea9189d
      Currently, rendering a view is capable of handling multiple ops for
      the same fingerprint efficiently. However, this capability requires a
      lot of complexity in the code, which we are not using at all because
      the way we assemble a viewRequest will never have more than one
      operation per fingerprint.
      
      This commit weeds out the said capability, along with all the code
      needed for it. It is still possible to have more than one operation
      for the same fingerprint; it will just be handled in a less efficient
      way (as proven by the unit tests).
      
      As a result, scanjob.go could be removed entirely.
      
      This commit also contains a few related refactorings and removals of
      dead code in operation.go, view.go, and freelist.go. Also, the
      docstrings received some love.
      
      Change-Id: I032b976e0880151c3f3fdb3234fb65e484f0e2e5
      Bjoern Rabenstein authored