ganeti-github.git
Lisa Velden [Mon, 14 Dec 2015 14:13:10 +0000 (15:13 +0100)]
Fix error message in attachInstanceDiskChecks

Name the instance to which the disks are already attached, which is not
necessarily the instance to which we want to attach a disk.
This fixes issue 1151.

Signed-off-by: Lisa Velden <velden@google.com>
Reviewed-by: Oleg Ponomarev <oponomarev@google.com>

Klaus Aehlig [Thu, 3 Dec 2015 10:24:31 +0000 (11:24 +0100)]
Update documentation of harep

Be more explicit about which action is taken by harep under
which conditions. In particular, mention the limitation that
harep never carries out migration operations.

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

Klaus Aehlig [Thu, 3 Dec 2015 08:52:32 +0000 (09:52 +0100)]
Document harep --dry-run in the man page

Document the new --dry-run option in harep's man page.
Also mention the limitations, as harep records its state
in instance tags.

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

Klaus Aehlig [Wed, 2 Dec 2015 16:20:46 +0000 (17:20 +0100)]
Support --dry-run in harep

Add a --dry-run option to harep, so that users can verify
that the actions taken by harep are the ones they want.

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

Klaus Aehlig [Wed, 2 Dec 2015 13:51:56 +0000 (14:51 +0100)]
Add a --dry-run option to htools

Add a new flag, --dry-run, to the available flags in htools.
It will be used for harep to allow diagnose-only runs.

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

Lisa Velden [Fri, 27 Nov 2015 10:25:55 +0000 (11:25 +0100)]
Fix lines with more than 80 characters

Previous refactoring has introduced lines with too many characters.
This patch fixes this.

Signed-off-by: Lisa Velden <velden@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

Lisa Velden [Wed, 25 Nov 2015 16:57:18 +0000 (17:57 +0100)]
Add more detach/attach sequence tests

Test detach/attach sequences with an instance that becomes diskless
after detaching its disk and also test detach/attach with drbd disks.

Signed-off-by: Lisa Velden <velden@google.com>
Reviewed-by: Oleg Ponomarev <oponomarev@google.com>

Lisa Velden [Wed, 25 Nov 2015 15:00:45 +0000 (16:00 +0100)]
Allow disk attachment to diskless instances

As only DRBD disks can be associated to more nodes than the instance
where we want to attach the disk to, we have to change the check for
associated nodes, too.

Signed-off-by: Lisa Velden <velden@google.com>
Reviewed-by: Oleg Ponomarev <oponomarev@google.com>

Lisa Velden [Wed, 25 Nov 2015 13:53:39 +0000 (14:53 +0100)]
Improve tests for attaching disks

by associating disks and instances to a specific node.
Also refactor mock uuids and mock disk names into variables.

Signed-off-by: Lisa Velden <velden@google.com>
Reviewed-by: Oleg Ponomarev <oponomarev@google.com>

Lisa Velden [Mon, 23 Nov 2015 14:42:09 +0000 (15:42 +0100)]
Use only string value in error message

Signed-off-by: Lisa Velden <velden@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

Helga Velroyen [Fri, 20 Nov 2015 10:34:44 +0000 (11:34 +0100)]
Merge branch 'stable-2.15' into stable-2.16

* stable-2.15
  Document the decission why optimisation is turned off
  Don't keep input for error messages
  Use dict.copy instead of deepcopy
  Use bulk-adding of keys in renew-crypto
  Make NodeSshKeyAdd use its *Bulk companion
  Unit test bulk-adding normal nodes
  Unit test for bulk-adding pot. master candidates
  Introduce bulk-adding of SSH keys
  Pause watcher during performance QA
  Send answers strictly
  Store keys as ByteStrings
  Encode UUIDs as ByteStrings
  Prefer the UuidObject type class over specific functions
  Assign the variables before use (bugfix for dee6adb9)
  Extend QA to detect autopromotion errors
  Handle SSH key distribution on auto promotion
  Do not remove authorized key of node itself
  Fix indentation
  Support force option for deactivate disks on RAPI

* stable-2.14
  Fix faulty iallocator type check
  Improve cfgupgrade output in case of errors

* stable-2.13
  Extend timeout for gnt-cluster renew-crypto
  Reduce flakyness of GetCmdline test on slow machines
  Remove duplicated words

* stable-2.12
  Revert "Also consider connection time out a network error"
  Clone lists before modifying
  Make lockConfig call retryable
  Return the correct error code in the post-upgrade script
  Make openssl refrain from DH altogether
  Fix upgrades of instances with missing creation time

* stable-2.11
  (none)

* stable-2.10
  Remove -X from hspace man page
  Make htools tolerate missing "dtotal" and "dfree" on luxi

Conflicts:
  lib/backend.py
  lib/cmdlib/node.py
  src/Ganeti/WConfd/ConfigModifications.hs

Resolutions:
  lib/backend.py
    use bulk-adding keys with renamed public key file variable
  lib/cmdlib/node.py
    use self.cfg.RemoveNode rather than self.context.RemoveNode
  src/Ganeti/WConfd/ConfigModifications.hs
    fix imports
    add UTF8.{to,from}String at appropriate places

Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>

Hrvoje Ribicic [Mon, 9 Nov 2015 17:49:53 +0000 (18:49 +0100)]
Add entries describing new gnt-cluster params to manpage

And also sprinkle reminders of when to update them across the codebase.

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Hrvoje Ribicic [Mon, 9 Nov 2015 17:18:33 +0000 (18:18 +0100)]
QA: Add ssh-key-type and -bits tests

This patch expands the testing of SSH key renewal by changing the key
type existing on a cluster during the QA.

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Hrvoje Ribicic [Fri, 6 Nov 2015 16:01:42 +0000 (16:01 +0000)]
QA: Extend AssertCommand to allow not forwarding the agent

When testing SSH-related behavior in Ganeti, having the SSH agent
forwarded in all the command-running utilities can produce spurious
errors, or worse yet, allow real ones to sneak by. In this patch, the
AssertCommand function is extended to allow disabling of agent
forwarding. This also switches off connection multiplexing, as the
multiplexed connection forwards agents implicitly.

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Hrvoje Ribicic [Fri, 6 Nov 2015 12:48:01 +0000 (12:48 +0000)]
Remove default limit on diffs in cfgupgrade tests

These tests deal with large configuration files, and without the
changes present in this patch, instead of a pretty git-style diff of
two configurations, we get nothing.

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Hrvoje Ribicic [Fri, 6 Nov 2015 01:53:50 +0000 (02:53 +0100)]
QA: Downgrade the cluster key type in 2.16

The downgrade/upgrade QA test starts from a freshly-built cluster which
would have RSA keys in 2.16. Downgrading such a cluster is prevented by
one of the preceding patches, for good reason, so this patch makes sure
to switch to DSA keys before running the upgrade test.

As this code is meant to be here only in 2.16, it also includes a kill
switch in case it is merged up silently.

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Hrvoje Ribicic [Fri, 6 Nov 2015 01:53:00 +0000 (02:53 +0100)]
Fix typo

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Hrvoje Ribicic [Fri, 6 Nov 2015 01:35:51 +0000 (02:35 +0100)]
Fail early for invalid key type and size combinations

The ssh-keygen utility permits only some combinations of key types and
bit sizes. As many more things can go wrong late in the renewal
process, this patch introduces prerequisite checks mimicking those of
ssh-keygen.
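
A prerequisite check of this kind could be sketched as follows. This is
only an illustration: the function name is hypothetical, and the limits
shown (DSA exactly 1024 bits, RSA at least 1024 bits, ECDSA one of
256/384/521 bits) are assumptions based on common ssh-keygen behaviour,
which varies between OpenSSH versions.

```python
def check_key_params(key_type, bits):
    """Raise ValueError for key type/size pairs assumed to be invalid.

    Hypothetical helper; the limits below are assumptions, not the
    values used by Ganeti.
    """
    if key_type == "dsa":
        if bits != 1024:
            raise ValueError("DSA keys must be exactly 1024 bits")
    elif key_type == "rsa":
        if bits < 1024:
            raise ValueError("RSA keys must have at least 1024 bits")
    elif key_type == "ecdsa":
        if bits not in (256, 384, 521):
            raise ValueError("ECDSA keys must have 256, 384 or 521 bits")
    else:
        raise ValueError("Unknown key type: %s" % key_type)
```

Running such a check before any key is touched means a bad combination
is reported immediately rather than late in the renewal.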

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Hrvoje Ribicic [Thu, 5 Nov 2015 13:13:58 +0000 (14:13 +0100)]
Handle SSH key changes in upgrades and downgrades

When performing an upgrade of an old cluster, it is necessary to set
the SSH key parameters to the exact same values earlier versions
implicitly used - DSA with 1024 bits.

In the other direction, we simply do not permit downgrades if keys
other than DSA are being used. Triggering a gnt-cluster renew-crypto
might be time-consuming and surprising for the user, so we are simply
throwing out an error message, explaining that the downgrade cannot be
performed in the current state of the cluster.
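
The two directions can be sketched roughly as below. The dictionary
layout, the function names and the exception class are hypothetical, and
dropping the parameters on downgrade is an assumption. Only the
parameter names ssh_key_type and ssh_key_bits and the DSA/1024 defaults
come from this patch series.

```python
class DowngradeError(Exception):
    """Raised when the cluster state does not permit a downgrade."""


def upgrade_config(cluster):
    # Older versions implicitly used DSA with 1024 bits, so make that
    # explicit unless the parameters are already present.
    cluster.setdefault("ssh_key_type", "dsa")
    cluster.setdefault("ssh_key_bits", 1024)


def downgrade_config(cluster):
    # Refuse rather than silently renewing keys, which may be slow and
    # surprising for the user.
    if cluster.get("ssh_key_type", "dsa") != "dsa":
        raise DowngradeError("Cannot downgrade while non-DSA keys are in use")
    # Assumption: the older version does not know these parameters.
    cluster.pop("ssh_key_type", None)
    cluster.pop("ssh_key_bits", None)
```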

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Hrvoje Ribicic [Wed, 4 Nov 2015 13:24:03 +0000 (13:24 +0000)]
Allow SSH key property changes

By explicitly specifying the old and new SSH key type in the SSH key
renewal, this patch allows the switching of SSH key types to take place
during such an operation.

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Hrvoje Ribicic [Tue, 13 Oct 2015 16:05:18 +0000 (12:05 -0400)]
Use the SSH key parameters when generating keys

This patch makes sure that the parameters introduced in previous
patches propagate wherever SSH keys are generated and used, allowing
Ganeti to use different types of SSH keys. With this patch, the key type
can be set only at cluster initialization time.

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Hrvoje Ribicic [Wed, 18 Nov 2015 14:58:51 +0000 (14:58 +0000)]
Do not generate the ganeti_pub_keys file with --no-ssh-init

Prior to this patch, gnt-cluster renew-crypto still created the
ganeti_pub_keys file regardless of whether the cluster was initiated
with --no-ssh-init or not. Instead, query the matching config parameter
and build the file only if Ganeti manages SSH keys.

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Hrvoje Ribicic [Wed, 18 Nov 2015 14:53:53 +0000 (14:53 +0000)]
Add querying of ssh-related config values

To allow various command-line operations like renew-crypto and node
adds to know how to generate SSH keys, some config values need to be
queried outside of LUs. This patch adds the ssh_key_type and
ssh_key_bits to the config values that can be queried.

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Hrvoje Ribicic [Wed, 18 Nov 2015 14:49:19 +0000 (14:49 +0000)]
Add modify_ssh_setup to queryable config params

As this will be necessary for checking whether to create the
ganeti_pub_keys file.

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Hrvoje Ribicic [Tue, 13 Oct 2015 21:57:02 +0000 (21:57 +0000)]
Add helper function for querying cluster properties

As more and more configuration values will have to be made available via
queries, this patch adds a small helper method for these.

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Hrvoje Ribicic [Mon, 12 Oct 2015 20:42:35 +0000 (20:42 +0000)]
Show info about new params in gnt-cluster info

With this patch, gnt-cluster info shows both the ssh key type and the
key length.

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Hrvoje Ribicic [Mon, 12 Oct 2015 15:39:11 +0000 (11:39 -0400)]
Add the SSH key type and length to the config, and set them

This patch uses the previously added CLI options to allow the key
parameters to be specified at initialization time and saved in the
configuration.

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Hrvoje Ribicic [Sun, 11 Oct 2015 23:03:09 +0000 (19:03 -0400)]
Change SSH key types to a proper Haskell sum type

This will allow us to perform validation of opcode params that are SSH
key types.

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Hrvoje Ribicic [Fri, 9 Oct 2015 20:57:38 +0000 (20:57 +0000)]
Add the SSH key options

The two options added in this patch are ssh-key-bits and
ssh-key-type, which will control the length and type of key later.
They are added to the gnt-cluster init and renew-crypto submethods.

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Klaus Aehlig [Thu, 19 Nov 2015 13:27:03 +0000 (14:27 +0100)]
Document the decission why optimisation is turned off

Commit c22a35 removed an argument of readJSONWithDesc which
caused some versions of ghc to go too crazy in optimising,
so it had to be turned off for some files. Document that reason
in a comment.

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

Klaus Aehlig [Wed, 18 Nov 2015 13:59:36 +0000 (14:59 +0100)]
Don't keep input for error messages

When generating error messages, the raw JSValue is rarely
useful. However, keeping it for error messages---even if
only in the unused branch of an if statement---prevents this
value from going out of scope.

Note: with the smaller number of arguments in the readJSONWithDesc
function, newer versions of ghc try too fancy optimisations and thus
run out of memory; hence we have to reduce the ghc optimisation level
for some files.

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Oleg Ponomarev <oponomarev@google.com>

Helga Velroyen [Wed, 18 Nov 2015 08:44:43 +0000 (09:44 +0100)]
Use dict.copy instead of deepcopy

Due to a bug in python, deepcopy does not work on
the dictionaries we use for SSH updates. This patch
replaces the use of deepcopy by the built-in copy
function of dictionaries.
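
As a small illustration of what the built-in copy provides, consider the
sketch below. The dictionary contents are made-up placeholders, not the
real structures used for SSH updates. Note that dict.copy is shallow: it
yields a new dictionary that shares its value objects with the original,
which suffices as long as the values are not mutated in place.

```python
# Hypothetical stand-in for a per-node key dictionary.
node_keys = {"node1-uuid": "public-key-of-node1"}

# Shallow copy: a new dict object holding the same value objects.
updated = node_keys.copy()
updated["node2-uuid"] = "public-key-of-node2"

# The original dictionary is unaffected by additions to the copy.
print(sorted(node_keys))
print(sorted(updated))
```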

Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Oleg Ponomarev <oponomarev@google.com>

Helga Velroyen [Thu, 12 Nov 2015 12:48:59 +0000 (13:48 +0100)]
Use bulk-adding of keys in renew-crypto

This patch makes renew-crypto actually use the bulk-adding
function of SSH keys rather than adding each key
individually. This patch also adds a unit test where the
bulk-adding is tested with a diverse set of keys (master
candidates, potential master candidates, normal nodes).

Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

Helga Velroyen [Thu, 12 Nov 2015 10:21:45 +0000 (11:21 +0100)]
Make NodeSshKeyAdd use its *Bulk companion

Since the bulk version of adding keys subsumes the
functionality of adding a single key, this patch makes
NodeSshKeyAdd internally use the *Bulk version. The
unit tests in place make sure no functionality is
changed.
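
The delegation pattern can be sketched as follows. The function names
and the dictionary-of-sets data model are hypothetical and much simpler
than the real backend functions. The point is only that the single-key
variant becomes a one-element call of the bulk variant.

```python
def add_keys_bulk(authorized_keys, new_keys):
    """Add every (node, key) pair in new_keys to authorized_keys."""
    for node, key in new_keys:
        authorized_keys.setdefault(node, set()).add(key)


def add_key(authorized_keys, node, key):
    """Add a single key by delegating to the bulk variant."""
    add_keys_bulk(authorized_keys, [(node, key)])
```

With this structure there is one code path to test and maintain, and
existing callers of the single-key function keep working unchanged.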

Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

Helga Velroyen [Thu, 12 Nov 2015 10:06:26 +0000 (11:06 +0100)]
Unit test bulk-adding normal nodes

This patch adds a unit test that tests adding a bulk
of normal nodes' SSH keys to the cluster.

Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

Helga Velroyen [Thu, 12 Nov 2015 10:05:15 +0000 (11:05 +0100)]
Unit test for bulk-adding pot. master candidates

This patch adds a unit test for bulk-adding SSH keys
of potential master candidates.

Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

Helga Velroyen [Wed, 11 Nov 2015 15:54:31 +0000 (16:54 +0100)]
Introduce bulk-adding of SSH keys

This patch introduces a backend function to add a set of
SSH keys to the nodes (rather than one key at a time).
The bulk-adding function has the same structure
as the original one, but is adapted to work with a set
of keys rather than one key.

This patch also adds a unit test for testing the
bulk-adding of keys.

Note that this patch only adds the bulk-adding function
but does not use it yet. In the following patches of
this series, we will add more unit tests and at the
end integrate the bulk-adding function into
renew-crypto.

Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

Klaus Aehlig [Tue, 17 Nov 2015 14:16:28 +0000 (15:16 +0100)]
Pause watcher during performance QA

Our performance QA tests are intended to alert us if some common
task suddenly takes longer. To serve this purpose, they need to provide
reproducible results. Hence avoid any interference with watcher-submitted
jobs by pausing the watcher during performance QA tests.

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Oleg Ponomarev <oponomarev@google.com>

Klaus Aehlig [Mon, 16 Nov 2015 14:05:45 +0000 (15:05 +0100)]
Send answers strictly

When sending an answer over a domain socket, the recipient
won't process that answer anyway before it is complete. So
we can as well assemble one ByteString first and send it over
the wire all at once, thus saving a few system calls.
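
The patch itself concerns Haskell ByteStrings. The same idea, sketched
in Python as an analogy only, with a hypothetical send_answer helper: 
join the pieces into one buffer and hand it to the socket in a single
call instead of sending each piece separately.

```python
import socket


def send_answer(sock, chunks):
    """Assemble all chunks into one bytes object and send it at once."""
    sock.sendall(b"".join(chunks))


# Usage on a local socket pair.
left, right = socket.socketpair()
send_answer(left, [b'{"success": ', b"true", b"}"])
left.close()
received = right.recv(4096)
right.close()
```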

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>

Klaus Aehlig [Thu, 12 Nov 2015 13:51:16 +0000 (14:51 +0100)]
Store keys as ByteStrings

Keys to maps are only used to look up values, so
a compact representation does not impact flexibility.
However, it does save on memory usage; having more
locality in the keys also improves time when comparing
them.

While there, also refrain from linearly looking through
keys searching for partial matches where partial matches
are not desired (e.g., when looking up things by uuid).

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>

Helga Velroyen [Fri, 13 Nov 2015 08:56:32 +0000 (09:56 +0100)]
Mention disabling of '--no-node-setup' in NEWS file

Update the NEWS file regarding the disabling of
'--no-node-setup'.

Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

Helga Velroyen [Wed, 11 Nov 2015 12:02:13 +0000 (13:02 +0100)]
Show 'modify ssh setup' in cluster info

This shows the parameter 'modify ssh setup' in the
output of 'gnt-cluster info', to make the information
more accessible than only writing it in the configuration.

Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

Helga Velroyen [Tue, 10 Nov 2015 10:02:41 +0000 (11:02 +0100)]
Disable --no-node-setup

This patch disables the option --no-node-setup of
'gnt-node add'. The option is the equivalent of
--no-ssh-init for 'gnt-cluster init'. However, it
was rather cumbersome for users to remember whether
or not the cluster was initialized with --no-ssh-init,
and hence whether to use this option. Instead of making
the user pass --no-node-setup, Ganeti shall determine
whether or not to change the SSH setup by reading
the configuration parameter 'modify_ssh_setup', which
is set by --no-ssh-init.

Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

Helga Velroyen [Fri, 6 Nov 2015 12:46:44 +0000 (13:46 +0100)]
Make 'modify ssh setup' queryable

This enables the config to be queried for the configuration
parameter 'modify ssh setup'. This will later be used in
gnt-node add.

Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

Klaus Aehlig [Thu, 12 Nov 2015 10:45:16 +0000 (11:45 +0100)]
Merge branch 'stable-2.14' into stable-2.15

* stable-2.14
  Fix faulty iallocator type check
  Improve cfgupgrade output in case of errors

* stable-2.13
  Extend timeout for gnt-cluster renew-crypto
  Reduce flakyness of GetCmdline test on slow machines
  Remove duplicated words

* stable-2.12
  Revert "Also consider connection time out a network error"
  Clone lists before modifying
  Make lockConfig call retryable
  Return the correct error code in the post-upgrade script
  Make openssl refrain from DH altogether
  Fix upgrades of instances with missing creation time

* stable-2.11
  (no changes)

* stable-2.10
  Remove -X from hspace man page
  Make htools tolerate missing "dtotal" and "dfree" on luxi

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

Klaus Aehlig [Wed, 11 Nov 2015 11:07:03 +0000 (12:07 +0100)]
Encode UUIDs as ByteStrings

UUIDs are fixed-length strings that we either look at
completely or not at all. Moreover, we do not do any
computations on them. Therefore, we can choose a more
compact representation for them, resulting in a reduced
memory footprint.

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Klaus Aehlig [Wed, 11 Nov 2015 11:26:24 +0000 (12:26 +0100)]
Prefer the UuidObject type class over specific functions

The UuidObject type class provides a clean interface to
obtain the UUID of an object. Prefer this interface over
hard-coding the specific functions all over the place.

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>

Oleg Ponomarev [Wed, 11 Nov 2015 18:01:36 +0000 (19:01 +0100)]
Merge branch 'stable-2.13' into stable-2.14

* stable-2.13
  Extend timeout for gnt-cluster renew-crypto

* stable-2.12
  Revert "Also consider connection time out a network error"
  Clone lists before modifying
  Make lockConfig call retryable

* stable-2.11
  (no changes)

* stable-2.10
  Remove -X from hspace man page
  Make htools tolerate missing "dtotal" and "dfree" on luxi

Conflicts:
    tools/cfgupgrade
Resolution
    take the change into lib/tools/cfgupgrade

Signed-off-by: Oleg Ponomarev <oponomarev@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

Oleg Ponomarev [Wed, 11 Nov 2015 17:14:51 +0000 (18:14 +0100)]
Merge branch 'stable-2.12' into stable-2.13

* stable-2.12
  Revert "Also consider connection time out a network error"
  Clone lists before modifying
  Make lockConfig call retryable

* stable-2.11
  (no changes)

* stable-2.10
  Remove -X from hspace man page
  Make htools tolerate missing "dtotal" and "dfree" on luxi

Signed-off-by: Oleg Ponomarev <oponomarev@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

Oleg Ponomarev [Wed, 11 Nov 2015 16:04:40 +0000 (17:04 +0100)]
Merge branch 'stable-2.11' into stable-2.12

    * stable-2.11
      (no changes)

    * stable-2.10
      Remove -X from hspace man page
      Make htools tolerate missing "dtotal" and "dfree" on luxi

Signed-off-by: Oleg Ponomarev <oponomarev@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

Klaus Aehlig [Wed, 11 Nov 2015 15:51:42 +0000 (16:51 +0100)]
Merge branch 'stable-2.10' into stable-2.11

* stable-2.10
  Remove -X from hspace man page
  Make htools tolerate missing "dtotal" and "dfree" on luxi

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

Klaus Aehlig [Tue, 10 Nov 2015 16:47:44 +0000 (17:47 +0100)]
Revert "Also consider connection time out a network error"

This reverts commit 84c17185ad47070944c64ab64a8c7dfd60a260f9.
We use RetryOnNetworkError for basically every form of internal
communication. While it makes sense to retry---given that we
assume daemons might come and go at any time---we can only do
so safely, if we positively know that we did not cause any
side effect. Given that not all our requests are idempotent
(e.g., submitting jobs is not)---in fact, the majority is
not--, retrying on timeouts is not safe.
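
The distinction can be illustrated with a small Python sketch. The
helper name is hypothetical and this is not Ganeti's RetryOnNetworkError
implementation. A refused connection means the request never reached the
peer, so repeating it is safe. After a timeout the request may already
have been processed, so the error is passed on to the caller.

```python
import socket


def call_with_retry(request, attempts=3):
    """Call request(), retrying only when it provably had no effect."""
    for attempt in range(attempts):
        try:
            return request()
        except ConnectionRefusedError:
            # Nothing was delivered, so a retry cannot duplicate work.
            if attempt == attempts - 1:
                raise
        # socket.timeout is deliberately not caught: the peer might
        # already have acted on a non-idempotent request, such as a
        # job submission.
```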

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>

Klaus Aehlig [Tue, 10 Nov 2015 15:40:47 +0000 (16:40 +0100)]
Clone lists before modifying

When an opcode expands to a list of jobs, we extend the reason trail
of the new jobs with that of the original opcode that expanded to them.
Before modifying the reason trail, however, we should duplicate it to
avoid side effects on shared copies---like the default empty list.
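
A minimal sketch of the pitfall and the fix is given below. The function
and variable names are hypothetical and the reason-trail entry is a
made-up tuple. The list is copied before extending it, so a list shared
between several jobs stays untouched.

```python
def extend_reason_trail(job_trail, opcode_trail):
    """Return the job's trail extended by the opcode's trail."""
    # Copy first, so the caller's list (possibly a shared default)
    # is not modified in place.
    new_trail = list(job_trail)
    new_trail.extend(opcode_trail)
    return new_trail


# Stands in for a default empty list shared by many jobs.
SHARED_DEFAULT = []

trail = extend_reason_trail(SHARED_DEFAULT, [("source", "reason", 1)])
```

Calling SHARED_DEFAULT.extend(...) directly would instead have changed
the trail of every job that shares the default list.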

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

Oleg Ponomarev [Mon, 9 Nov 2015 16:28:38 +0000 (17:28 +0100)]
Assign the variables before use (bugfix for dee6adb9)

Signed-off-by: Oleg Ponomarev <oponomarev@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

Helga Velroyen [Fri, 6 Nov 2015 10:27:41 +0000 (11:27 +0100)]
Extend QA to detect autopromotion errors

The issue that was fixed with the previous patch would
have been detected earlier if the QA had actually
run a 'verify' after the modify operations. For 'verify'
to not raise false negatives, we need to first reduce
the candidate pool size, because otherwise QA fails
with a warning about the minimum pool size being
violated.

Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

Helga Velroyen [Fri, 6 Nov 2015 10:26:08 +0000 (11:26 +0100)]
Handle SSH key distribution on auto promotion

This fixes the missing SSH key distribution in case
a node gets autopromoted to master candidate.

Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

Helga Velroyen [Fri, 6 Nov 2015 09:11:19 +0000 (10:11 +0100)]
Do not remove authorized key of node itself

This fixes a small bug where, if a node was demoted
from master candidate, its own public key
was removed from its own authorized keys file.

Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

Lisa Velden [Thu, 5 Nov 2015 10:36:57 +0000 (11:36 +0100)]
Fix indentation

so that the method can be called correctly.

Signed-off-by: Lisa Velden <velden@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

Klaus Aehlig [Wed, 4 Nov 2015 13:52:16 +0000 (14:52 +0100)]
Make lockConfig call retryable

Locking the configuration is naturally idempotent. However,
the corresponding WConfD call had a check refusing to lock
the config, if the caller has already locked it, arguing that
this should not happen. That argument misses that we have the
built-in assumption that daemons might be restarted at any time,
including the moment where a request is processed, but the caller
did not get the answer yet. So allow retries, while logging that
they occurred (as this should only happen rarely).
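
The intended behaviour can be sketched in Python as follows. The class
is hypothetical and only models the idea; the actual call lives in the
Haskell WConfD code. A repeated lock request by the current owner
succeeds and is logged, while a request by anyone else is still refused.

```python
import logging


class ConfigLock:
    """Toy model of an idempotent, single-owner configuration lock."""

    def __init__(self):
        self.owner = None

    def lock(self, client_id):
        if self.owner is None:
            self.owner = client_id
            return True
        if self.owner == client_id:
            # A retry after a lost answer: succeed again, but log it,
            # as this should only happen rarely.
            logging.info("Client %s re-locked the configuration", client_id)
            return True
        return False
```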

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

Hrvoje Ribicic [Wed, 4 Nov 2015 13:01:38 +0000 (14:01 +0100)]
Extend timeout for gnt-cluster renew-crypto

With particularly large clusters, the renewal of SSH keys happening in
renew-crypto can take a long time to complete. While this should be
improved, an additional problem is that the RPC doing most of the work
has a default one-hour timeout. Given that it is preferable that the
operation completes, this patch bumps the timeout to four hours, which
should suffice even for 80+ node clusters.

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Oleg Ponomarev <oponomarev@google.com>

Hrvoje Ribicic [Mon, 2 Nov 2015 17:49:36 +0000 (17:49 +0000)]
Merge branch 'stable-2.12' into stable-2.13

* stable-2.12
  Return the correct error code in the post-upgrade script
  Make openssl refrain from DH altogether
  Fix upgrades of instances with missing creation time

Conflicts:
cfgupgrade_unittest.py: merge version tests
tools/post-upgrade: return the correct error code for SSH
                    renewal as well

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

Hrvoje Ribicic [Mon, 2 Nov 2015 17:19:22 +0000 (17:19 +0000)]
Return the correct error code in the post-upgrade script

While we want all the post-upgrade actions to be undertaken, should one
of these fail, the correct error code should be returned so that the
upgrade script can report issues.
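
The pattern described here, sketched with hypothetical names rather than
the actual post-upgrade code: every action is attempted, and any failure
is remembered and reflected in the exit code.

```python
def run_post_upgrade(actions):
    """Run all actions; return 0 if all succeeded, 1 otherwise."""
    exit_code = 0
    for action in actions:
        try:
            action()
        except Exception:
            # Keep going so the remaining actions still run, but
            # remember that something went wrong.
            exit_code = 1
    return exit_code
```

A script would pass the result to sys.exit so that the calling upgrade
script sees the failure.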

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Oleg Ponomarev <oponomarev@google.com>

Klaus Aehlig [Mon, 2 Nov 2015 10:44:34 +0000 (11:44 +0100)]
Make openssl refrain from DH altogether

As various ssl implementations have different ideas about
which dh key lengths are acceptable, refrain from standard
dh altogether (and not only from anonymous dh) to avoid
handshake problems.

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

Klaus Aehlig [Mon, 26 Oct 2015 12:34:17 +0000 (13:34 +0100)]
Remove -X from hspace man page

hspace never had such an option.

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Oleg Ponomarev <oponomarev@google.com>

Cherry-picked-from: fa36daf4
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Oleg Ponomarev <oponomarev@google.com>

Hrvoje Ribicic [Wed, 28 Oct 2015 17:56:23 +0000 (17:56 +0000)]
Fix faulty iallocator type check

Because the ignore-soft-errors parameter is optional rather than always
present, fix the type check in the iallocator request issuing code.

Signed-off-by: Gerard Oskamp <gjo@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>

5 years agoImprove cfgupgrade output in case of errors
Hrvoje Ribicic [Wed, 28 Oct 2015 14:21:06 +0000 (15:21 +0100)]
Improve cfgupgrade output in case of errors

Log with the exception function instead of the error function,
and show the error content without the stack trace unless explicitly
debugging.

Signed-off-by: Gerard Oskamp <gjo@google.com>
Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

5 years agoFix upgrades of instances with missing creation time
Hrvoje Ribicic [Tue, 27 Oct 2015 18:38:16 +0000 (18:38 +0000)]
Fix upgrades of instances with missing creation time

Some instances from very old Ganeti versions may not have any creation
time information embedded in the config. The upgrade code does not
expect this, and crashes horribly when trying to populate the newly
separated disk objects with the same creation time. This patch
fixes things by inserting a fake value: 0.

The value was chosen because the serialization and deserialization of
such an instance in Haskell yields a value of 0 for the ctime, making
the time consistent between instance and disk. While showing the epoch
time instead of N/A in gnt-instance info is suboptimal, due to the age
of the Ganeti version in which these instances must have been created,
they are at least still ordered correctly.
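
As a rough sketch of the idea, assuming instances are plain dicts with an
optional "ctime" key (the helper names and dict layout here are hypothetical
and not taken from cfgupgrade):

```python
def fill_missing_ctime(instance):
    """Return a copy of the instance dict whose ctime is always set.

    Instances without creation time information get the fake value 0.
    """
    upgraded = dict(instance)
    if upgraded.get("ctime") is None:
        upgraded["ctime"] = 0
    return upgraded


def disks_for(instance, disk_names):
    """Build separate disk dicts that inherit the instance's ctime."""
    inst = fill_missing_ctime(instance)
    return [{"name": name, "ctime": inst["ctime"]} for name in disk_names]
```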

Signed-off-by: Gerard Oskamp <gjo@google.com>
Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

5 years agoFix RPC signature of NodeVerify
Helga Velroyen [Wed, 28 Oct 2015 10:54:16 +0000 (11:54 +0100)]
Fix RPC signature of NodeVerify

This fixes the fact that the last patch was submitted
in the wrong version. It also fixes a bug where the node
was accidentally not looked up properly in the
ssh_port_map.

Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

5 years agoReduce flakiness of GetCmdline test on slow machines
Klaus Aehlig [Wed, 28 Oct 2015 10:54:18 +0000 (11:54 +0100)]
Reduce flakiness of GetCmdline test on slow machines

The GetCmdline test verifies that we can get the command line
of a running process via the procfs. To not have to care about
cleanup, the test creates an ephemeral process for this. While
two wall-clock seconds seem more than enough for a single read
from the procfs on today's machines, this is not true for some
of the public buildbot (virtual) machines, which are extremely
low on resources and can be under really heavy load; this causes
flakiness of that test there. Mitigate this by increasing the
lifetime of the process.

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>

5 years agoUse ssconf for SSH ports in NodeVerify
Helga Velroyen [Tue, 27 Oct 2015 14:17:06 +0000 (15:17 +0100)]
Use ssconf for SSH ports in NodeVerify

This fixes issue 773. SSH ports have been available via
ssconf for a while already. This enables us to simplify the
RPC signature of NodeVerify.

Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

5 years agoRemove duplicated words
Lisa Velden [Tue, 27 Oct 2015 15:43:13 +0000 (16:43 +0100)]
Remove duplicated words

Signed-off-by: Lisa Velden <velden@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

5 years agoSupport force option for deactivate disks on RAPI
Klaus Aehlig [Tue, 27 Oct 2015 14:32:26 +0000 (15:32 +0100)]
Support force option for deactivate disks on RAPI

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

5 years agoMerge branch 'stable-2.15' into stable-2.16
Lisa Velden [Mon, 26 Oct 2015 09:01:12 +0000 (10:01 +0100)]
Merge branch 'stable-2.15' into stable-2.16

* stable-2.15
  (no changes)

* stable-2.14
  (no changes)

* stable-2.13
  Renew-crypto: stop daemons on master node first
  Mention manual creation of {shared,}file paths in UPGRADE
  Don't warn about broken SSH setup of offline nodes

* stable-2.12
  Fix inconsistency in python and haskell objects
  Add notSerializeDefault default field option
  Move design-disks.rst to drafts

* stable-2.11
  Fix default for --default-iallocator-params

Conflicts:
    doc/design-draft.rst

Resolution:
    doc/design-draft.rst: take the stable-2.16 version and
      add design-disks.rst

Signed-off-by: Lisa Velden <velden@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

5 years agoMerge branch 'stable-2.14' into stable-2.15
Klaus Aehlig [Fri, 23 Oct 2015 11:32:06 +0000 (13:32 +0200)]
Merge branch 'stable-2.14' into stable-2.15

* stable-2.14
  (no changes)

* stable-2.13
  Renew-crypto: stop daemons on master node first
  Mention manual creation of {shared,}file paths in UPGRADE
  Don't warn about broken SSH setup of offline nodes

* stable-2.12
  Fix inconsistency in python and haskell objects
  Add notSerializeDefault default field option
  Move design-disks.rst to drafts

* stable-2.11
  Fix default for --default-iallocator-params

Conflicts:
doc/design-draft.rst
Resolution:
take all additions

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

5 years agoMerge branch 'stable-2.13' into stable-2.14
Klaus Aehlig [Fri, 23 Oct 2015 07:52:51 +0000 (09:52 +0200)]
Merge branch 'stable-2.13' into stable-2.14

* stable-2.13
  Renew-crypto: stop daemons on master node first
  Mention manual creation of {shared,}file paths in UPGRADE
  Don't warn about broken SSH setup of offline nodes

* stable-2.12
  Fix inconsistency in python and haskell objects
  Add notSerializeDefault default field option
  Move design-disks.rst to drafts

* stable-2.11
  Fix default for --default-iallocator-params

Conflicts:
src/Ganeti/THH.hs
Resolution:
take all additions

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Oleg Ponomarev <oponomarev@google.com>

5 years agoMake htools tolerate missing "dtotal" and "dfree" on luxi
Klaus Aehlig [Tue, 16 Jun 2015 09:15:48 +0000 (11:15 +0200)]
Make htools tolerate missing "dtotal" and "dfree" on luxi

If a cluster allows sharedfile as only disk template, the amount of
total and free disk space might not be available. This is perfectly
normal, hence make the luxi backend handle it gracefully and just report
0 available disk on 0 total disk.
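
The fallback can be illustrated with a short Python sketch. The real change is
in the Haskell luxi backend of htools; the function below and its dict input
are hypothetical and only show the intended behaviour.

```python
def disk_space(node_fields):
    """Return (total, free) disk space for a node.

    If either "dtotal" or "dfree" is unavailable, report 0 free disk on
    0 total disk instead of failing.
    """
    dtotal = node_fields.get("dtotal")
    dfree = node_fields.get("dfree")
    if dtotal is None or dfree is None:
        return (0, 0)
    return (dtotal, dfree)
```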

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>

Cherry-picked-from: 49644203
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>

5 years agoMerge branch 'stable-2.12' into stable-2.13
Klaus Aehlig [Thu, 22 Oct 2015 08:51:36 +0000 (10:51 +0200)]
Merge branch 'stable-2.12' into stable-2.13

* stable-2.12
  Fix inconsistency in python and haskell objects
  Add notSerializeDefault default field option
  Move design-disks.rst to drafts

* stable-2.11
  Fix default for --default-iallocator-params

Conflicts:
doc/design-draft.rst
doc/index.rst
lib/cli.py

Resolution:
for lib/cli.py follow the code move
for the rest, take all additions.

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Oleg Ponomarev <oponomarev@google.com>

5 years agoMerge branch 'stable-2.11' into stable-2.12
Klaus Aehlig [Thu, 22 Oct 2015 07:13:23 +0000 (09:13 +0200)]
Merge branch 'stable-2.11' into stable-2.12

* stable-2.11
  Fix default for --default-iallocator-params

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

5 years agoFix default for --default-iallocator-params
Klaus Aehlig [Wed, 21 Oct 2015 15:36:23 +0000 (17:36 +0200)]
Fix default for --default-iallocator-params

We need to distinguish between the option not being provided
(i.e., no change requested) and the option being empty (i.e.,
a request to reset the value). Therefore, use None as a default,
not {}.
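
A minimal sketch of why the default matters, using a hypothetical helper
rather than the actual option-handling code: with None as the default, "not
provided" and "provided but empty" become distinguishable.

```python
def apply_iallocator_params(current, requested=None):
    """Return the new default iallocator parameters.

    requested is None  -> option not given, keep the current value
    requested == {}    -> option given but empty, reset the value
    """
    if requested is None:
        return current
    return dict(requested)
```

With {} as the default, the first case would be indistinguishable from the
second, and every call without the option would look like a reset request.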

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>

5 years agoRenew-crypto: stop daemons on master node first
Helga Velroyen [Wed, 21 Oct 2015 10:51:37 +0000 (12:51 +0200)]
Renew-crypto: stop daemons on master node first

Otherwise, this can create problems when restarting
the nodes due to voting issues.

Signed-off-by: Gerard Oskamp <gjo@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

5 years agoMention manual creation of {shared,}file paths in UPGRADE
Helga Velroyen [Thu, 15 Oct 2015 14:11:33 +0000 (16:11 +0200)]
Mention manual creation of {shared,}file paths in UPGRADE

This fixes Issue 653. It was unclear whether or not
'ensure-dirs' creates the directories for file and
sharedfile storage. This patch extends the documentation
to clarify this.

Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Oleg Ponomarev <oponomarev@google.com>

5 years agokvm: Introduce kvm_pci_reservations hvparam
Dimitris Aragiorgis [Thu, 8 Oct 2015 09:40:28 +0000 (12:40 +0300)]
kvm: Introduce kvm_pci_reservations hvparam

In order to support the theoretical maximum of devices (16 disks and
8 NICs) we introduce kvm_pci_reservations hvparam that denotes the
number of PCI slots that QEMU will implicitly use, e.g. if
kvm_pci_reservations is set to 5, QEMU will manage PCI slots 0, 1,
2, 3, 4. By default this will be 12 and Ganeti will start adding
disks and NICs from the 13th slot (or whatever the next slot is)
onwards.
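
The slot selection can be sketched as below. The function name and the
total_slots value of 32 are assumptions made for this illustration; only the
reservation semantics (slots 0 to reservations-1 belong to QEMU) come from the
description above.

```python
def next_free_pci_slot(used_slots, reservations=12, total_slots=32):
    """Return the first PCI slot that is neither reserved nor in use.

    Slots 0 .. reservations-1 are left to QEMU; with the default of 12,
    the first candidate is slot index 12, i.e. the 13th slot.
    """
    for slot in range(reservations, total_slots):
        if slot not in used_slots:
            return slot
    raise ValueError("no free PCI slot left")
```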

Signed-off-by: Dimitris Aragiorgis <dimitris.aragiorgis@gmail.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>

5 years agokvm: Introduce scsi_controller_type hvparam
Dimitris Aragiorgis [Thu, 8 Oct 2015 09:40:27 +0000 (12:40 +0300)]
kvm: Introduce scsi_controller_type hvparam

This will allow the user to explicitly set the type of SCSI
controller to use. The available types are: lsi, virtio-scsi-pci,
and megasas. QEMU uses lsi by default and so does Ganeti.

Signed-off-by: Dimitris Aragiorgis <dimitris.aragiorgis@gmail.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>

5 years agokvm: Use the new interface during hotplug actions
Dimitris Aragiorgis [Thu, 8 Oct 2015 09:40:26 +0000 (12:40 +0300)]
kvm: Use the new interface during hotplug actions

Use the new interface and hvinfo during hotplug actions. This means
that _GenerateKVMDeviceID(), _GetBusSlots() and
_GenerateDeviceHVInfo() will be used to obtain the proper device ID
and bus position.

Add an extra check in _VerifyHotplugSupport() that allows hotplug
only for paravirtual devices (virtio-blk-pci, virtio-net-pci) and
SCSI devices (scsi-cd, scsi-hd, scsi-block, scsi-generic) that
use the latest device model in QEMU (via the -device option).

Signed-off-by: Dimitris Aragiorgis <dimitris.aragiorgis@gmail.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>

5 years agokvm: Work around QEMU commit 48f364dd
Dimitris Aragiorgis [Thu, 8 Oct 2015 09:40:25 +0000 (12:40 +0300)]
kvm: Work around QEMU commit 48f364dd

QEMU commits 48f364dd and da2cf4e8 included in version 2.2 and later
state the following:

 * HMP's drive_del is not supported any more on a drive added
   via QMP's blockdev-add

 * Stay away from immature blockdev-add unless you want to help
   with development.

To this end, currently we cannot hot-remove a disk that was
previously hot-added. Here, we work around this constraint by using
HMP's drive_add instead of blockdev-add. This must be done via a
callback in HotAddDisk() wrapper method, due to the fact that if a
QMP connection terminates before a drive keeps a reference to the fd
passed via the add-fd QMP command, then the fd gets closed and
cannot be used later. So we open a QMP connection, pass the fd, then
invoke the drive_add HMP command referencing the corresponding
fdset, then remove the fdset, then add the device and then terminate
the QMP connection.

Signed-off-by: Dimitris Aragiorgis <dimitris.aragiorgis@gmail.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>

5 years agomonitor: Use hvinfo in QMP methods
Dimitris Aragiorgis [Thu, 8 Oct 2015 09:40:24 +0000 (12:40 +0300)]
monitor: Use hvinfo in QMP methods

Change hotplug-related methods (i.e., HotAddNic() and HotAddDisk())
to use slot 'hvinfo' instead of slot 'pci'.

Change HasPCIDevice() to HasDevice(). This will parse the output of
the 'query-block' and 'query-pci' QMP commands and try to match the
given device ID.

Make _VerifyHotplugCommand() depend only on the device ID and not
on the whole device object.

Signed-off-by: Dimitris Aragiorgis <dimitris.aragiorgis@gmail.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>

5 years agokvm: Refactor device option handling
Dimitris Aragiorgis [Thu, 8 Oct 2015 09:40:23 +0000 (12:40 +0300)]
kvm: Refactor device option handling

Replace pci slot with hvinfo in Disk and NIC config Objects. This
will contain all the necessary info to construct a -device option.
Specifically, for PCI devices it will have driver, id, bus, and addr
fields. SCSI devices will have driver, id, bus, channel, scsi-id,
and lun fields.

Change the way we generate the device ID so that it gets derived
only from the type and the UUID of the device (e.g.
disk-932df160-7a22-4067). We ensure seamless migration by upgrading
the existing runtime files in _UpgradeSerializedRuntime(). All
devices found with a 'pci' slot will get an 'hvinfo' dict with the
old id, bus 'pci.0', and addr based on the old value of 'pci'.
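
A rough sketch of both steps follows. The helper names, the full example UUID,
the choice of keeping the first three UUID groups (inferred from the example
ID above) and the hexadecimal addr format are all assumptions of this
illustration, not the actual _GenerateKVMDeviceID() or
_UpgradeSerializedRuntime() code.

```python
def device_id(dev_type, uuid):
    """Derive a device ID from the device type and (part of) its UUID."""
    # Assumption: keep the first three dash-separated groups of the UUID.
    return "%s-%s" % (dev_type, "-".join(uuid.split("-")[:3]))


def upgrade_device(dev, old_id):
    """Give a device with an old-style 'pci' slot an 'hvinfo' dict."""
    upgraded = dict(dev)
    if "pci" in upgraded and "hvinfo" not in upgraded:
        upgraded["hvinfo"] = {
            "id": old_id,                   # keep the old device ID
            "bus": "pci.0",
            "addr": hex(upgraded["pci"]),   # assumed address format
        }
    return upgraded
```

Whether the old 'pci' key is dropped after the upgrade is not stated here, so
the sketch leaves it in place.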

Change the way we put devices into buses. Guessing the default
reserved PCI slots of an instance has proven to be error-prone,
since they may change from version to version, and also because the
PCI bus state may be modified with a specific option passed by the
user with the kvm_extra hvparam. Therefore, we decide to start
adding PCI devices from slot 12 onwards. The SCSI bus does not need
to have default reservations.

Let QEMU decide the PCI slot of spice, balloon and SCSI controller
devices. Finally, QEMU adds by default an LSI controller in case a
SCSI disk is attached to the instance. Here, we set it explicitly,
and name the created bus 'scsi.0' (as QEMU does by default).

Until now, only SCSI emulated disks were supported. Allow
scsi-generic, scsi-block, scsi-hd, and scsi-cd for the disk_type
hvparam. The first two do SCSI pass-through and thus require a
physical SCSI device on the host. The third is the equivalent of
if=scsi, which means that the virtual disk of the instance will be an
emulated SCSI device.

Signed-off-by: Dimitris Aragiorgis <dimitris.aragiorgis@gmail.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>

5 years agoAdd design doc for SCSI support in KVM
Dimitris Aragiorgis [Thu, 8 Oct 2015 09:40:22 +0000 (12:40 +0300)]
Add design doc for SCSI support in KVM

This is a design document detailing the refactoring of device
handling in the KVM Hypervisor. More specifically, it will use the
latest QEMU device model and modify the current hotplug implementation
so that both PCI and SCSI devices can be managed.

Signed-off-by: Dimitris Aragiorgis <dimitris.aragiorgis@gmail.com>
Reviewed-by: Hrvoje Ribicic <riba@google.com>

5 years agoDon't warn about broken SSH setup of offline nodes
Helga Velroyen [Wed, 14 Oct 2015 08:24:33 +0000 (10:24 +0200)]
Don't warn about broken SSH setup of offline nodes

This fixes issue 1131. 'gnt-cluster verify' should stop
complaining about broken SSH setups of offline nodes.

Additionally, this fixes a problem when readding nodes.
In some cases, Ganeti complained about a possible attack
in a situation that is valid when readding a node (namely,
if a key renewal took place between offlining and readding
the node).

Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

5 years agoFix inconsistency in python and haskell objects
Oleg Ponomarev [Mon, 12 Oct 2015 14:25:33 +0000 (16:25 +0200)]
Fix inconsistency in python and haskell objects

Currently, hv/disk_state_static parameters are properly supported only
for the cluster object. For node groups and nodes they were introduced in
2da9f556, but only on the python side. This could cause problems
during upgrades from old versions.

This patch adds the hv and disk state fields to the haskell objects as a
notSerializedDefaultField, which fixes the problem without changing
behaviour. It also modifies the corresponding haskell arbitrary instances.

The patch is inspired by e78fb0d6 and 553363a3.

Signed-off-by: Oleg Ponomarev <oponomarev@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

5 years agoAdd notSerializeDefault default field option
Oleg Ponomarev [Mon, 12 Oct 2015 14:25:32 +0000 (16:25 +0200)]
Add notSerializeDefault default field option

A default field with the notSerializedDefault flag set is a default field
which will be serialized only if its value differs from the default one. This
flag can be set by using notSerializedDefaultField field type instead of
defaultField field type.
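
The serialization rule can be illustrated with a short Python sketch. The real
implementation is Template Haskell code; the field triples and the serialize
function below are hypothetical and only model the behaviour.

```python
def serialize(fields, values):
    """Serialize values according to a list of field descriptions.

    `fields` is a list of (name, default, not_serialize_default) triples.
    A field flagged as not_serialize_default is omitted from the output
    whenever its value equals the default.
    """
    out = {}
    for name, default, not_serialize_default in fields:
        value = values.get(name, default)
        if not_serialize_default and value == default:
            continue
        out[name] = value
    return out
```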

This field type is introduced in order to fix an inconsistency between
the haskell and python config modules, which leads to an inconsistent
config after a ganeti upgrade.

Signed-off-by: Oleg Ponomarev <oponomarev@google.com>
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

Cherry-picked from: c0a2c62b9ad96c3e35cae0ffdcdf63a09164f537

Signed-off-by: Oleg Ponomarev <oponomarev@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

5 years agoAdd test for desired locations accounting in ialloc
Oleg Ponomarev [Thu, 8 Oct 2015 09:59:58 +0000 (11:59 +0200)]
Add test for desired locations accounting in ialloc

The test demonstrates a simple allocation request in which desired
locations affect the instance placement.

Signed-off-by: Oleg Ponomarev <oponomarev@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

5 years agoFix requested instance desired location tags in IAllocator
Oleg Ponomarev [Thu, 8 Oct 2015 09:53:05 +0000 (11:53 +0200)]
Fix requested instance desired location tags in IAllocator

Currently, instance desired location tags are not set for the
instance(s) in the iallocator request. This patch fixes the issue.

Signed-off-by: Oleg Ponomarev <oponomarev@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

5 years agoMerge branch 'stable-2.15' into stable-2.16
Hrvoje Ribicic [Mon, 12 Oct 2015 15:11:38 +0000 (17:11 +0200)]
Merge branch 'stable-2.15' into stable-2.16

* stable-2.15
  For queries, take the correct base address of an IP block
  Fix computation in network blocks

* stable-2.14
  Add test for tags accounting in hail
  Set node tags in iallocator htools backend

* stable-2.13
  Improve xl socat migrations

* stable-2.12
  QA: Retrieve only the RAPI certificate
  QA: Allow usage of specific RAPI certificates and files
  QA: Reload certificates only when renew-crypto has been run
  QA: Restart Ganeti after adding the RAPI users file
  QA: Add reading the RAPI password from a file
  QA: Allow the RAPI user to be set
  QA: Do not remove nodes from cluster without destroying it
  QA: Refactor RAPI handling
  Increase default disk size of burnin to 1G
  break line with more than 80 characters
  Only search for Python-2 interpreters
  Fix faulty comments / indentation
  Handle Xen 4.3 states better

* stable-2.11
  (no changes)

* stable-2.10
  Add a test for parsing of admin_state in IAlloc backend
  At IAlloc backend guess state from admin state

* stable-2.9
  Update harep's man page to notify users of its limitations

Conflicts:
htools-hail.test: merged tests

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

5 years agoFor queries, take the correct base address of an IP block
Klaus Aehlig [Fri, 9 Oct 2015 16:15:02 +0000 (18:15 +0200)]
For queries, take the correct base address of an IP block

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

5 years agoFix computation in network blocks
Klaus Aehlig [Fri, 9 Oct 2015 15:58:26 +0000 (17:58 +0200)]
Fix computation in network blocks

...by differentiating between the provided address and
the base address of the block. E.g., 10.0.0.1/29 and 10.0.0.0/29
contain the same IP addresses; in particular, the first address is
10.0.0.0.
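
The distinction can be checked with Python's standard ipaddress module. This
is only an illustration of the arithmetic, not the code changed by this
commit; the two helper names are made up for the example.

```python
import ipaddress


def base_address(cidr):
    """Return the base (network) address of the block containing cidr."""
    return ipaddress.ip_interface(cidr).network.network_address


def block_addresses(cidr):
    """Return all addresses of the block, starting at the base address."""
    return list(ipaddress.ip_interface(cidr).network)
```

Both 10.0.0.1/29 and 10.0.0.0/29 yield the base address 10.0.0.0 and the same
eight addresses.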

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

5 years agoMerge branch 'stable-2.14' into stable-2.15
Hrvoje Ribicic [Mon, 12 Oct 2015 14:01:02 +0000 (16:01 +0200)]
Merge branch 'stable-2.14' into stable-2.15

* stable-2.14
  Add test for tags accounting in hail
  Set node tags in iallocator htools backend

* stable-2.13
  Improve xl socat migrations

* stable-2.12
  QA: Retrieve only the RAPI certificate
  QA: Allow usage of specific RAPI certificates and files
  QA: Reload certificates only when renew-crypto has been run
  QA: Restart Ganeti after adding the RAPI users file
  QA: Add reading the RAPI password from a file
  QA: Allow the RAPI user to be set
  QA: Do not remove nodes from cluster without destroying it
  QA: Refactor RAPI handling
  Increase default disk size of burnin to 1G
  break line with more than 80 characters
  Only search for Python-2 interpreters
  Fix faulty comments / indentation
  Handle Xen 4.3 states better

* stable-2.11
  (no changes)

* stable-2.10
  Add a test for parsing of admin_state in IAlloc backend
  At IAlloc backend guess state from admin state

* stable-2.9
  Update harep's man page to notify users of its limitations

Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>

5 years agoMerge branch 'stable-2.13' into stable-2.14
Klaus Aehlig [Mon, 12 Oct 2015 12:54:00 +0000 (14:54 +0200)]
Merge branch 'stable-2.13' into stable-2.14

* stable-2.13
  Improve xl socat migrations

* stable-2.12
  QA: Retrieve only the RAPI certificate
  QA: Allow usage of specific RAPI certificates and files
  QA: Reload certificates only when renew-crypto has been run
  QA: Restart Ganeti after adding the RAPI users file
  QA: Add reading the RAPI password from a file
  QA: Allow the RAPI user to be set
  QA: Do not remove nodes from cluster without destroying it
  QA: Refactor RAPI handling
  Increase default disk size of burnin to 1G
  break line with more than 80 characters
  Only search for Python-2 interpreters
  Fix faulty comments / indentation
  Handle Xen 4.3 states better

* stable-2.11
  (no changes)

* stable-2.10
  Add a test for parsing of admin_state in IAlloc backend
  At IAlloc backend guess state from admin state

* stable-2.9
  Update harep's man page to notify users of its limitations

Conflicts:
src/Ganeti/HTools/Backend/IAlloc.hs
Resolution:
manually apply 9c1704a5 to stable-2.13

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

5 years agoMove design-disks.rst to drafts
Klaus Aehlig [Mon, 12 Oct 2015 12:15:07 +0000 (14:15 +0200)]
Move design-disks.rst to drafts

When, in commit 2676f31, the design for stand-alone disks
was added, it was not added to the list of draft designs,
but accidentally to the list of designs not shown in the index;
the latter, however, is only for implemented designs. As this
design still isn't fully implemented, fix this now.

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Lisa Velden <velden@google.com>

5 years agoMake CommitTemporaryIPs call out to WConfD
Klaus Aehlig [Thu, 8 Oct 2015 11:32:24 +0000 (13:32 +0200)]
Make CommitTemporaryIPs call out to WConfD

...instead of only doing changes locally. Doing changes
locally used to be fine---and even necessary---as long
as all calls to CommitTemporaryIPs used to be under
full configuration synchronisation.

Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Lisa Velden <velden@google.com>